pdf_table_extraction.ipynb notebook (demo) error

os-climate / aicoe-osc-demo

This repository is the central location for the demos the ET data science team is developing within the OS-Climate project. This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL and to create training and inference pipelines.

Apache License 2.0


pdf_table_extraction.ipynb notebook (demo) error #241

Open ashuYB opened 1 year ago

ashuYB commented 1 year ago

Hello team,

I am trying to execute the demo notebook (pdf_data_extraction) and am getting an error while importing:

from src.data.s3_communication import S3Communication

ImportError: cannot import name 'PDFTableExtractor' from 'src.components.preprocessing' (/opt/app-root/lib64/python3.8/site-packages/src/components/preprocessing/__init__.py)
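An error of this kind usually means the installed package does not expose the name the notebook asks for. A minimal sketch for checking that, assuming it is run in the same Python environment as the notebook: the helpers `public_names` and `has_name` are hypothetical names introduced here for illustration, not part of the demo repository.

```python
import importlib


def public_names(module_name):
    """Return the sorted public names a module exposes, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well: the package is not installed or not on sys.path.
        return None
    return sorted(name for name in dir(module) if not name.startswith("_"))


def has_name(module_name, attr):
    """Return True only if the module imports cleanly and exposes `attr`."""
    names = public_names(module_name)
    return names is not None and attr in names


# Module and class names are taken from the traceback above; the result is
# only meaningful inside the demo environment where `src` is installed.
print(has_name("src.components.preprocessing", "PDFTableExtractor"))
```

If this prints False while the package itself imports, the installed version of the package likely predates or postdates the notebook, which would match a notebook that has not been kept in sync with the code.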

I was speaking with Ryan Day and he suggested that I get your counsel.

Thanks,
Ashu

PS: I have been following the instructions in https://github.com/os-climate/aicoe-osc-demo/blob/master/README.md

HeatherAck commented 1 year ago

@Shreyanand are the instructions and the branch still accurate?

Shreyanand commented 1 year ago

@HeatherAck The instructions are accurate, but since we are not using the NLP models for table extraction, that notebook has not been updated and has errors. The pdf text extraction notebook and the other notebooks in the inference.pipeline and training.pipeline should work fine.

ashuYB commented 1 year ago

Hello team, @Shreyanand and I made progress, but the pdf_text_extraction notebook doesn't create the extracted output in ~/data/extraction.
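A quick way to confirm whether anything was written is to list the output directory after the notebook finishes. This is a minimal sketch, assuming the output is expected as plain files directly under ~/data/extraction; `list_outputs` is a hypothetical helper, not something the demo provides.

```python
from pathlib import Path


def list_outputs(directory):
    """Return sorted file names in `directory`, or None if the directory does not exist."""
    path = Path(directory).expanduser()  # resolves a leading "~" to the home directory
    if not path.is_dir():
        return None
    return sorted(p.name for p in path.iterdir() if p.is_file())


# Path taken from the comment above.
files = list_outputs("~/data/extraction")
if files is None:
    print("Output directory does not exist")
elif not files:
    print("Output directory exists but is empty")
else:
    print(f"{len(files)} file(s) found:", files)
```

The three outcomes point to different causes: a missing directory suggests the extraction step never ran or wrote somewhere else, while an empty directory suggests it ran but produced nothing.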