patronus-ai / financebench


About format of doc #3

Closed: zhouzihao501 closed this issue 4 months ago

zhouzihao501 commented 4 months ago

Hi, when you feed a document into a model like GPT-4, what format is the document in (Markdown?), and what tool do you use to convert the PDF into that format?

zhouzihao501 commented 4 months ago

thanks!

ninodimontalcino commented 4 months ago

Hi @zhouzihao501,

Thanks for your interest in FinanceBench! We've just released an evaluation playground script that shows how we parse PDFs and call LLMs with the relevant text. For parsing, we use pymupdf. Note that different parsers may produce different results, since each one handles tables differently. Please let us know if you have any further questions.
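For readers who want a rough idea of the parse-then-prompt flow described above, here is a minimal sketch. It is not the FinanceBench playground script. It assumes plain-text extraction with PyMuPDF's `page.get_text("text")`, and the helper names `extract_pages` and `build_prompt`, the page-label format, and the `max_chars` budget are all hypothetical choices made for illustration. The actual LLM call is left out because it depends on which provider you use.

```python
from typing import List


def extract_pages(pdf_path: str) -> List[str]:
    """Return the plain text of each page in the PDF, using PyMuPDF."""
    # Imported lazily so build_prompt() below works without PyMuPDF installed.
    import fitz  # PyMuPDF; recent releases can also be imported as `pymupdf`

    with fitz.open(pdf_path) as doc:
        # "text" mode yields plain text; tables are flattened into lines,
        # which is one reason different parsers give different results.
        return [page.get_text("text") for page in doc]


def build_prompt(question: str, pages: List[str], max_chars: int = 20_000) -> str:
    """Assemble a hypothetical prompt: page-labelled text followed by the question."""
    parts: List[str] = []
    used = 0
    for number, text in enumerate(pages, start=1):
        block = f"[Page {number}]\n{text.strip()}\n"
        if used + len(block) > max_chars:
            break  # stop before exceeding the assumed context budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer using only the document below.\n\n"
        f"{context}\nQuestion: {question}\nAnswer:"
    )
```

A typical call would be `build_prompt(question, extract_pages(path))`, with `path` pointing at a filing PDF. Keeping extraction and prompt assembly separate lets you swap in a different parser and compare how its table output changes the model's answers.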