Questions about performance metrics on Slake and PathVQA datasets.

The paper reports two F1 scores for Slake, 85.24% and 86.1%. Could you please clarify the difference between these two values? The paper also states that the overall F1 is a weighted sum of the F1 scores for each class. Since I cannot obtain the per-class question counts for the Slake and PathVQA test sets, could you explain how you derived the overall score of 56.9 for PathVQA from the scores of 28.0 on closed questions and 88.0 on open questions during development?
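To make the question concrete, here is a minimal sketch of the arithmetic I am trying to reproduce. It assumes the overall score is a weighted average over just two classes (closed and open), with weights proportional to the number of questions in each class; that is my reading of the paper, not something I have confirmed. The function names `weighted_overall` and `implied_weight` are my own and do not come from the BiomedGPT code.

```python
def weighted_overall(scores, counts):
    """Weighted average of per-class scores, weighted by per-class counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return sum(s * c for s, c in zip(scores, counts)) / total


def implied_weight(overall, score_a, score_b):
    """Solve overall = w * score_a + (1 - w) * score_b for w."""
    if score_a == score_b:
        raise ValueError("weight is undetermined when both scores are equal")
    return (overall - score_b) / (score_a - score_b)


# Numbers quoted above for PathVQA (assumption: only two classes contribute).
closed, open_ended, overall = 28.0, 88.0, 56.9

# Fraction of closed questions that would make the weighted sum equal 56.9.
w_closed = implied_weight(overall, closed, open_ended)
print(f"implied closed-question fraction: {w_closed:.4f}")

# Sanity check: plugging the implied weights back in recovers the overall score.
print(f"recovered overall: {weighted_overall([closed, open_ended], [w_closed, 1 - w_closed]):.1f}")
```

Under these assumptions the reported numbers would require roughly 52% of the questions to be closed. Knowing the actual per-class counts would let me check whether that matches the split you used.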

taokz / BiomedGPT, issue #12