nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License
1.15k stars, 113 forks

Get repeated headers and footers #48

Open simon-cerda opened 5 months ago

simon-cerda commented 5 months ago

Hi, is there a way to get the footers and repeated headers, maybe on a different object or as an option?

ansukla commented 5 months ago

It is not available via the API. The headers and footers are stripped here: https://github.com/nlmatics/nlm-ingestor/blob/main/nlm_ingestor/ingestor/visual_ingestor/visual_ingestor.py#L369. You can retain what is stripped and then return it via the API -- it is a bit of work, but if you are interested in putting together a PR, happy to review.
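
To illustrate what "retain what is stripped" could look like, here is a minimal, standalone sketch. It is not the nlm-ingestor code and does not use any llmsherpa or nlm-ingestor API: the function name `split_repeated_lines`, the `min_fraction` threshold, and the list-of-lines page representation are all assumptions made for illustration. It treats any line that recurs on enough pages as a repeated header or footer and returns those lines alongside the body instead of discarding them.

```python
from collections import Counter


def split_repeated_lines(pages, min_fraction=0.6):
    """Hypothetical helper: split each page into body lines and repeated lines.

    `pages` is a list of pages, each a list of text lines. A line counts as
    repeated when it appears on at least `min_fraction` of the pages (and on
    at least two pages). Nothing is thrown away; both parts are returned.
    """
    # Count each distinct non-empty line once per page.
    page_counts = Counter()
    for page in pages:
        page_counts.update({line.strip() for line in page if line.strip()})

    threshold = max(2, min_fraction * len(pages))
    repeated = {text for text, n in page_counts.items() if n >= threshold}

    body_pages, stripped_pages = [], []
    for page in pages:
        body_pages.append([l for l in page if l.strip() not in repeated])
        # Retain the stripped lines so a caller could expose them later.
        stripped_pages.append([l for l in page if l.strip() in repeated])
    return body_pages, stripped_pages


# Made-up sample input, not taken from a real document.
pages = [
    ["ACME Report", "Intro text", "Confidential"],
    ["ACME Report", "More text", "Confidential"],
    ["ACME Report", "Final text", "Confidential"],
]
body, stripped = split_repeated_lines(pages)
```

Exact text matching is the simplest possible heuristic. Headers or footers that include page numbers would not match this way, and an actual change in the ingestor would also need to carry the retained lines through to the API response.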