nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License

How can I check about 'block_class'? #30

Open JinSeoung-Oh opened 7 months ago

JinSeoung-Oh commented 7 months ago

First, I want to say thank you for your great work. I was surprised that this API can recognize Korean. Very impressive.

Everything works perfectly except for one specific form of table, which LayoutPDFReader does not recognize properly. I want to remove this type of table from the LayoutPDFReader result, but I cannot find a way to pick out these tables.

However, I noticed that I can detect them with 'block_class'. So I would like to know what 'block_class' is and how I can check it.

Thanks, and sorry for my poor English.

ansukla commented 5 months ago

Hello, the parser may not be able to parse some tables correctly. Fixing this would need a change in the parser code, which we just open sourced: https://github.com/nlmatics/nlm-ingestor
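Until the parser itself is changed, the filtering described in the question could be done after parsing. The following is only a minimal sketch under stated assumptions: it assumes each parsed block is available as a plain dict carrying a `tag` key and a `block_class` key, and the sample data, the class names such as `cls_7`, and the helper functions are all hypothetical. It does not call llmsherpa, so the blocks from a real LayoutPDFReader result would need to be checked against this shape first.

```python
from collections import defaultdict

def group_by_block_class(blocks):
    """Group block dicts by their 'block_class' value (None if missing)."""
    groups = defaultdict(list)
    for block in blocks:
        groups[block.get("block_class")].append(block)
    return dict(groups)

def drop_tables_with_class(blocks, unwanted_classes):
    """Return blocks, minus table blocks whose 'block_class' is unwanted."""
    unwanted = set(unwanted_classes)
    return [
        b for b in blocks
        if not (b.get("tag") == "table" and b.get("block_class") in unwanted)
    ]

# Hypothetical sample data; real blocks may carry more keys.
sample_blocks = [
    {"tag": "header", "block_class": "cls_0", "text": "Report"},
    {"tag": "table", "block_class": "cls_3", "text": "well-parsed table"},
    {"tag": "table", "block_class": "cls_7", "text": "badly-parsed table"},
    {"tag": "para", "block_class": "cls_7", "text": "ordinary paragraph"},
]

# Step 1: inspect which classes occur and which tags use them.
for cls, members in group_by_block_class(sample_blocks).items():
    print(cls, [m["tag"] for m in members])

# Step 2: remove only the tables that belong to the problematic class.
kept_blocks = drop_tables_with_class(sample_blocks, ["cls_7"])
print([b["text"] for b in kept_blocks])
```

Grouping first makes it easier to see which `block_class` values belong to the badly parsed tables before anything is removed. Matching on both `tag` and `block_class` keeps non-table blocks that happen to share the same class.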