zhuang-li / FactualSceneGraph

The FACTUAL benchmark dataset and the pre-trained textual scene graph parser trained on FACTUAL.
https://arxiv.org/pdf/2305.17497.pdf

Not able to edit files for debugging purposes #7

Closed atharvadeore999 closed 1 month ago

atharvadeore999 commented 2 months ago

I am not able to add print statements in the files under src/factual_scene_graph for debugging purposes; I can only make changes in the test_evaluator.py file. Please help me by suggesting how I can add print statements in the files under src/factual_scene_graph.

zhuang-li commented 2 months ago

For parsing results, print at

https://github.com/zhuang-li/FactualSceneGraph/blob/6c5d90c78861a880722f1b8c45fcb2a1827652d1/src/factual_scene_graph/parser/scene_graph_parser.py#L42C1-L42C148

For evaluation results, print at

https://github.com/zhuang-li/FactualSceneGraph/blob/6c5d90c78861a880722f1b8c45fcb2a1827652d1/src/factual_scene_graph/evaluation/evaluator.py#L68
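
If it is easier, you can also print the parsed graphs from your own calling code without touching the package source. A minimal sketch, assuming parse() takes a list of sentences and returns the graph strings when return_text=True:

from factual_scene_graph.parser.scene_graph_parser import SceneGraphParser

parser = SceneGraphParser('lizhuang144/flan-t5-base-VG-factual-sg', device='cpu')
graphs = parser.parse(['a cat is in a bag'], beam_size=1, return_text=True)
print(graphs)  # inspect the parsed scene-graph strings directly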

Feel free to ask me questions if you are still having trouble.

atharvadeore999 commented 2 months ago

If I add a print statement at https://github.com/zhuang-li/FactualSceneGraph/blob/6c5d90c78861a880722f1b8c45fcb2a1827652d1/src/factual_scene_graph/evaluation/evaluator.py#L277 it won't get printed. Also, if both the candidate and reference graphs are ['( cat , in , bag )'] I get a soft_spice score of 99.99998211860657, but if both the candidate and reference graphs are ['( man , surf in , water )'] I get 95.02027630805969. Why do I get a slightly lower score even though both graphs are the same?

zhuang-li commented 2 months ago

Interesting. I obtained [100.00001192092896] for both of your graphs. Here is my code:

from factual_scene_graph.evaluation.evaluator import Evaluator
from factual_scene_graph.parser.scene_graph_parser import SceneGraphParser

device = 'mps'  # Apple Silicon backend; use 'cuda' or 'cpu' as appropriate
parser = SceneGraphParser('lizhuang144/flan-t5-base-VG-factual-sg', device=device, lemmatize=False)
evaluator = Evaluator(parser=parser, text_encoder_checkpoint='all-MiniLM-L6-v2', device=device, lemmatize=False)

# Identical candidate and reference graphs should give a (near-)perfect soft_spice score.
evaluator.evaluate(candidates=['( cat , in , bag )'], references=[['( cat , in , bag )']], method='soft_spice', beam_size=1)
# output: [100.00001192092896]
evaluator.evaluate(candidates=['( man , surf in , water )'], references=[['( man , surf in , water )']], method='soft_spice', beam_size=1)
# output: [100.00001192092896]

The debugging issue might be related to some fundamental Python knowledge. Are you trying to modify the code in the cloned repo? If you installed the package using pip install FactualSceneGraph, modifying the code in the local cloned repo won't change the code in the installed package. Maybe you could install the local cloned repo via pip install -e . and then put your print statements in the local repo? But I am not sure. This is just my guess.
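
A quick, general Python way to check which copy of the module is actually being imported (so you know where your print statements have to go):

import factual_scene_graph.parser.scene_graph_parser as sgp
print(sgp.__file__)  # if this points into site-packages, edits in the cloned repo are not being picked up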

zhuang-li commented 2 months ago

And I see you are using compute_scores(encoded_cands, encoded_refs, cand_lengths, ref_lengths).

This calculates dot products between graph embeddings rather than comparing graph strings.

For the soft_spice evaluation function, I flatten each graph first so the embedding process can be more efficient. The cand_lengths and ref_lengths record the original number of items in each graph, where an item is a single node, an attribute + node, or a subject node + predicate + object node.
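
Roughly, the flattening works like the sketch below. This is just an illustration of the idea (not the library's actual code), and it only handles plain subject-predicate-object triples, not attribute items:

def flatten_graph(triples):
    # triples: list of (subject, predicate, object) tuples for one graph
    items = []
    for subj, pred, obj in triples:
        items.append(subj)                    # single node
        items.append(obj)                     # single node
        items.append(f"{subj} {pred} {obj}")  # subject + predicate + object
    # keep each item once, preserving order
    unique = []
    for it in items:
        if it not in unique:
            unique.append(it)
    return unique

cand_graphs = [[('man', 'surf in', 'water')]]
flat_items = [flatten_graph(g) for g in cand_graphs]
cand_lengths = [len(items) for items in flat_items]  # original item count per graph
# All items across graphs can then be encoded in one batch and split back per graph
# using cand_lengths before computing the dot products between embeddings.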

atharvadeore999 commented 2 months ago

If possible, can you please share your code for this part? ("For the soft_spice evaluation function, I flatten each graph first so the embedding process can be more efficient. The cand_lengths and ref_lengths record the original number of items in each graph.") I will also try installing the local cloned repo via pip install -e . and putting my print statements there. Thank you so much for your help.