nlmatics / nlm-ingestor

This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0
1.11k stars, 160 forks

How to handle PPT format? #11

Open mengmeng0320 opened 9 months ago

mengmeng0320 commented 9 months ago

Hello. After converting a PPT to PDF and parsing it with LayoutPDFReader, the results are not satisfactory. How can I perform structural analysis on the PPT directly?

ansukla commented 9 months ago

You can send the PPT directly to the API. PPTs are the hardest to handle because the graphics generated by the PDF writer are not written in a logical order, and the content is often in a multi-column layout.
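
For anyone looking for a starting point, here is a minimal sketch of uploading the presentation file itself instead of a converted PDF. It uses only the Python standard library. The endpoint URL (a locally running nlm-ingestor server on port 5010) and the multipart field name `file` are assumptions modelled on how the llmsherpa client is usually configured, and `build_multipart` and `parse_presentation` are hypothetical helper names, so adjust them to your deployment.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumption: nlm-ingestor is running locally; change host/port as needed.
API_URL = "http://localhost:5010/api/parseDocument?renderFormat=all"

# Explicit MIME types so the result does not depend on the system's mimetypes table.
MIME_TYPES = {
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def build_multipart(filename, data, field="file"):
    """Return (body, content_type) for a single-file multipart/form-data upload.

    The field name "file" is an assumption about what the server expects.
    """
    boundary = uuid.uuid4().hex
    mime = MIME_TYPES.get(Path(filename).suffix.lower(), "application/octet-stream")
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_presentation(path, api_url=API_URL):
    """POST the PPT/PPTX file to the parsing endpoint and return the decoded JSON."""
    p = Path(path)
    body, content_type = build_multipart(p.name, p.read_bytes())
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `parse_presentation("slides.pptx")` (a placeholder file name) needs a reachable server. The structure of the returned JSON is not described in this thread, so inspect the response before relying on particular keys.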