Coobiw opened 1 month ago
With my own DIY implementation (code below), ChartGemma reaches 74.64 on ChartQA (64.0 human + 85.28 aug). Is there anything wrong with my implementation? Thanks for your advice and reply!
```python
import io
import sys

def execute_python_code(code):
    """Execute model-generated code, capturing its stdout.

    Returns (output, status): the captured stdout on success,
    or (None, False) if execution raised an exception.
    """
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout
    status = True
    try:
        exec(code)
    except Exception:
        status = False
    finally:
        # Always restore stdout, even if exec raised.
        sys.stdout = old_stdout
    output = new_stdout.getvalue() if status else None
    return output, status

response, status = execute_python_code(response)
if status:
    answer = response
    print(answer)
else:
    answer = ""
    print("error running...")
```
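To make the capture logic easy to try in isolation, here is a self-contained sketch (the helper is repeated so the snippet runs standalone; the sample PoT-style program is a hypothetical example, not from the benchmark):

```python
import io
import sys

def execute_python_code(code):
    # Capture stdout while exec-ing model-generated code.
    old_stdout, sys.stdout = sys.stdout, io.StringIO()
    try:
        exec(code)
        output, status = sys.stdout.getvalue(), True
    except Exception:
        output, status = None, False
    finally:
        sys.stdout = old_stdout
    return output, status

# Hypothetical PoT-style program a model might emit for a chart question.
sample = "values = [12.5, 30.0, 42.8]\nprint(max(values))"
output, ok = execute_python_code(sample)  # ok is True, output is "42.8\n"
```

A program that raises (e.g. referencing an undefined name) returns `(None, False)`, which maps to the empty-answer branch above.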
Hi @Coobiw
I am cleaning the remaining codebase and will try to release it when I get some time. However, here are some ideas that we used to optimize the performance on the validation set before running the model on the testing set:
Also, we used the following implementation of the evaluation metric: https://github.com/vis-nlp/UniChart/blob/bd6004bc8fe9ef8ce9a6cdfd88712f845d78b918/model/chartqa_model.py#L36
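For intuition, ChartQA's relaxed accuracy accepts numeric answers within a 5% tolerance and requires an exact (case-insensitive) match otherwise. A minimal sketch of that idea follows; the function name and the handling of `%` signs are my assumptions, so see the linked UniChart code for the actual implementation:

```python
def relaxed_accuracy(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Relaxed match: numeric answers within `tolerance`, strings exact (case-insensitive)."""
    prediction, target = prediction.strip(), target.strip()
    try:
        # Treat "85.28" and "85.28%" alike (assumption).
        pred_num = float(prediction.rstrip("%"))
        target_num = float(target.rstrip("%"))
        if target_num == 0.0:
            return pred_num == 0.0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    except ValueError:
        # Non-numeric answers: exact match, ignoring case.
        return prediction.lower() == target.lower()
```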
Using the following code, it can reach 76.56 (67.84 human + 85.28 aug):
```python
answer = answer.replace("True", "Yes").replace("False", "No")
answer = answer.strip()
```
- Clean the output string from unnecessary characters (`\n`, `'`).

Is `strip()` enough to do this?
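If `strip()` alone is not enough, a small helper that also drops surrounding quotes and applies the True/False mapping might look like this (a sketch; the function name and the exact character set are my guesses, not the authors' code):

```python
def clean_answer(raw: str) -> str:
    # Strip whitespace/newlines, then surrounding quote characters (assumption).
    answer = raw.strip().strip("'\"")
    # Map Python booleans to the Yes/No strings ChartQA expects.
    answer = answer.replace("True", "Yes").replace("False", "No")
    return answer.strip()
```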
I will clean up and share the code that you can use to reproduce the results by this weekend. Sorry, I am a bit busy today and tomorrow.
OK! Thanks for your help!!
Hi, thanks for your great work! Will you release your PoT evaluation code or share some details about it on ChartQA test split? I want to reproduce this result. Thanks for your advice and reply!