Open abhishekJadhav opened 1 month ago
I get the error below when I try to use llmsherpa's LayoutPDFReader on my local machine with the Docker image.
Traceback (most recent call last):
  File "/app/nlm_ingestor/ingestion_daemon/main.py", line 48, in parse_document
    return_dict, _ = ingestor_api.ingest_document(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/nlm_ingestor/ingestor/ingestor_api.py", line 37, in ingest_document
    ingestor = pdf_ingestor.PDFIngestor(doc_location, parse_options)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/nlm_ingestor/ingestor/pdf_ingestor.py", line 35, in __init__
    blocks, _block_texts, _sents, _file_data, result, page_dim, num_pages = parse_blocks(
                                                                            ^^^^^^^^^^^^^
  File "/app/nlm_ingestor/ingestor/pdf_ingestor.py", line 176, in parse_blocks
    parsed_doc = visual_ingestor.Doc(pages, ignore_blocks, render_format)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/nlm_ingestor/ingestor/visual_ingestor/visual_ingestor.py", line 117, in __init__
    self.parse(pages)
  File "/app/nlm_ingestor/ingestor/visual_ingestor/visual_ingestor.py", line 198, in parse
    p["style"], p.text, page_width
    ~^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bs4/element.py", line 1573, in __getitem__
    return self.attrs[key]
KeyError: 'style'
172.17.0.1 - - [11/Jul/2024 07:11:00] "POST /api/parseDocument?renderFormat=all HTTP/1.1" 500 -
Related nlm-ingestor issue: https://github.com/nlmatics/nlm-ingestor/issues/72
Switching to the docker image mentioned in this comment worked for me.
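For anyone trying to understand the error itself: the last frames of the traceback show `p["style"]` being evaluated on a BeautifulSoup tag that has no `style` attribute. The sketch below reproduces that behaviour in isolation and shows a tolerant lookup with `Tag.get`. It is only an illustration, not the nlm-ingestor fix: the HTML snippet and the `style_of` helper are made up for this example, and whether an empty style string would be acceptable to the downstream parsing code is an assumption I have not verified.

```python
from bs4 import BeautifulSoup

# Made-up input: one <p> with a style attribute and one without.
html = '<p style="top:10px">styled</p><p>unstyled</p>'
soup = BeautifulSoup(html, "html.parser")
styled, unstyled = soup.find_all("p")


def style_of(tag):
    """Hypothetical helper: return the style attribute, or "" if it is absent."""
    return tag.get("style", "")


# Subscript access raises KeyError when the attribute is missing,
# which matches the last frame of the traceback above.
try:
    unstyled["style"]
    raised = False
except KeyError:
    raised = True

print(raised)                     # whether the subscript lookup failed
print(repr(style_of(styled)))     # style of the first paragraph
print(repr(style_of(unstyled)))   # fallback for the second paragraph
```

`Tag.get` behaves like `dict.get`, so the missing attribute falls back to the default instead of raising.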