Closed lijinginfo closed 1 month ago
Hi @lijinginfo! Thanks for your interest, and sorry it took so long to get to this. I was waiting for us to release our latest report so I could share more solid evidence as to why MINT-1T (HTML) is closer to OBELICS.
One thing I will say is that the results in the v1 paper were flawed in a few ways: 1) we did not ablate demonstration separators for the in-context learning prompts, and 2) we did not include VQAv2 evals in the "a" plot.
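For anyone reading along, here is a rough sketch of what ablating demonstration separators means: the same few-shot demonstrations are joined with different separator strings, and each resulting prompt is evaluated separately. This is a text-only illustration, not the code used for the paper. The function names, the `Question:`/`Answer:` template, and the candidate separators are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate separators; the ones actually tested are not listed in this thread.
CANDIDATE_SEPARATORS: List[str] = ["\n", "\n\n", " "]


def build_prompt(demos: List[Tuple[str, str]], query: str, separator: str) -> str:
    """Join few-shot demonstrations and the final query with the given separator."""
    # The "Question: ... Answer: ..." template is an assumption for illustration only.
    shots = [f"Question: {q} Answer: {a}" for q, a in demos]
    shots.append(f"Question: {query} Answer:")
    return separator.join(shots)


def separator_variants(
    demos: List[Tuple[str, str]], query: str, separators: List[str]
) -> Dict[str, str]:
    """Build one prompt per separator so each variant can be scored on its own."""
    return {sep: build_prompt(demos, query, sep) for sep in separators}


if __name__ == "__main__":
    demos = [("What color is the sky?", "blue"), ("How many legs does a cat have?", "4")]
    variants = separator_variants(demos, "What animal barks?", CANDIDATE_SEPARATORS)
    for sep, prompt in variants.items():
        print(repr(sep), "->", repr(prompt))
```

Scoring each variant with the same model and the same demonstrations shows how much of a measured gap comes from the prompt format rather than from the training data.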
We reran results for MINT-1T (HTML) and OBELICS using both the XGen-MM and Idefics2 architectures and found that performance is very close. Here are some numbers:
We also reran experiments for the HTML + PDF data from MINT-1T and now see a much bigger performance gap! (Note that this is only for the XGen-MM architecture experiments.)
Thanks!
Closing this but feel free to reopen if you have any more questions!
Why is OBELICS generally better than MINT-1T (HTML)? Is the main advantage of MINT-1T over OBELICS primarily related to PDFs?