nlmatics / nlm-ingestor

This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0
1.11k stars, 160 forks

Update to nlm 2.9.2 v2 #70

Closed: jamesvillarrubia closed 4 months ago

jamesvillarrubia commented 5 months ago

Description of the change

Type of change

Testing

The changes have been tested manually to ensure the Tika server works correctly with the new configuration and headers. Additional testing is required for edge cases.

jamesvillarrubia commented 5 months ago

@ansukla As expected, there are some critical fixes in this version that I missed in the first. If you get a sec to approve, that would be great. Cheers!

JSv4 commented 4 months ago

FWIW, I've been testing some PDFs that are broken under the current main branch. Three of them are working great with these changes.

bzhr commented 4 months ago

Can this get published?

bzhr commented 4 months ago

Very nice. If somebody can provide an example of a docker build command that builds from the main branch so that I can test, that would be wonderful.

Ebraheem-Alrabeea commented 3 months ago

Hi @jamesvillarrubia,

I hope you're doing well! I wanted to ask whether the new v2 contains any changes other than those listed in the comparison you posted here. It's unclear what changed inside the v2 Jar. I plan to make some RTL fixes on top of NLM 2.9.2 and would appreciate your confirmation. Also, which Apache Tika version or commit did you use?

Thank you!