Open BainMcKay opened 1 month ago
Found the issue.
BUG: The style parser bug needs to be fixed for the parser to work.
You may need to download the most recent jar file 2.9.2_v2, tika-server-standard-nlm-modified-2.9.2_v2.jar or downgrade to 2.4.1v6. There was a big update to bring nlm-ingestor in line with Apache Tika's most recent updates, but modifications to Tika's jars had to be done too. Bugs were introduced in 2.9.2_v1 regarding the style parser that may be fixed in v2.
Using Python, I am downloading GoogleDrive files to a local server, and caching them in a server tmp folder for failsafe-restart at checkpoint. I load the file into a Document Object which I then parse with semantic parsing. I want to submit the document object to local nlm-ingestor server for processing as well. But If I submit filename and document object, if fails on 404. I don't want to create a publicly available downloads folder on the mlm-ingestor server. Is there a way to submit the document objects, vs the url, to [self.parse_pdf(pdf_file)] in [file_reader.py]?