tingxueronghua / ChartLlama-code

MIT License

Are there results compared to Qwen-VL? #2

Closed · zengxingchen closed this issue 9 months ago

zengxingchen commented 9 months ago

Personally, I think the main reason for LLaVA's poor performance on ChartQA is the lack of training data, which makes LLaVA unsuitable as a representative multimodal LLM for chart understanding. Why not compare the proposed model with Qwen-VL and show results before and after fine-tuning Qwen-VL? Also, PALI-X shows great performance on ChartQA.

tingxueronghua commented 9 months ago

Good question. I compared more MLLMs, including Qwen-VL, and the results are shown in Table 5 of the Appendix. It is apparent that Qwen-VL performs well on ChartQA; however, its performance is much lower when facing special types of charts.

I do not mean to emphasize that its performance is worse than ChartLlama's. In fact, I fully agree with you that the poor performance of a series of MLLMs on ChartQA is due to the lack of corresponding training data, and the method for constructing such data is the main contribution of our work.

tingxueronghua commented 9 months ago

> Personally, I think the main reason for LLaVA's poor performance on ChartQA is the lack of training data, which makes LLaVA unsuitable as a representative multimodal LLM for chart understanding. Why not compare the proposed model with Qwen-VL and show results before and after fine-tuning Qwen-VL? Also, PALI-X shows great performance on ChartQA.

I notice that you asked why we did not fine-tune Qwen-VL. In fact, as the experimental results show, I think Qwen-VL has already been trained on ChartQA, and I do not think it would be a good idea to fine-tune it again, which might harm its language-understanding ability. (I am also not sure about the best way to further fine-tune Qwen-VL on a specific data source.) Besides PALI-X, there are other works showing impressive performance on ChartQA. For example, combining chart-extraction methods with an online LLM can achieve much higher performance, e.g., DePlot with Codex. But we focus on MLLMs that can run inference end-to-end on our own machines, which makes it more convenient to test performance on other datasets.
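For readers unfamiliar with that two-stage setup, here is a minimal sketch of a chart-extraction-plus-LLM pipeline in the spirit of DePlot + Codex. It uses the public `google/deplot` checkpoint via Hugging Face `transformers`; the `call_llm` helper is hypothetical, standing in for whatever LLM API you have access to (Codex itself has since been deprecated), and is an assumption rather than anything from this repo.

```python
# Minimal sketch of a two-stage "chart extraction + LLM" pipeline (DePlot-style).
# Stage 1 converts the chart image into a linearized data table;
# Stage 2 hands that table plus the question to a strong text-only LLM.
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

def chart_to_table(image_path: str) -> str:
    """Stage 1: translate the chart image into a textual data table."""
    image = Image.open(image_path)
    inputs = processor(
        images=image,
        text="Generate underlying data table of the figure below:",
        return_tensors="pt",
    )
    out = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(out[0], skip_special_tokens=True)

def answer_question(image_path: str, question: str) -> str:
    """Stage 2: reason over the extracted table with any capable LLM."""
    table = chart_to_table(image_path)
    prompt = (
        "Read the following table and answer the question.\n\n"
        f"{table}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)  # hypothetical helper: any OpenAI/local LLM call
```

The appeal of this design is that stage 1 turns the chart into plain text, so all the reasoning can be delegated to a text-only LLM; the trade-off, as noted above, is that it is no longer a single end-to-end model you can run locally as one unit.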

tingxueronghua commented 9 months ago

I think the questions are well answered :)

Feel free to re-open this issue if you have any further questions.