Open asleroid opened 3 months ago
Thank you for reaching out.
The PdfFeatures class is inside the pdf-tokens-type-labeler package. You can install this package using the following command:
pip install git+https://github.com/huridocs/pdf-tokens-type-labeler@1c12c368887372164ab4981c3277a49e9dc43b9a
Let us know if this solves your problem.
Even though pdf_features is in the installed libraries within venv, running 'pip list' does not return the library.
As a result, when running the following command, the script errors out:
(venv) asleroid@Aslis-MBP pdf_paragraphs_extraction % python src/create_paragraph_extractor_model.py /Users/asleroid/Code/pdf-labeled-data/labeled_data/paragraph_extraction
loading one_column_test from /Users/asleroid/Code/pdf-labeled-data/labeled_data/paragraph_extraction/one_column_test
Traceback (most recent call last):
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/create_paragraph_extractor_model.py", line 25, in <module>
    train_model()
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/create_paragraph_extractor_model.py", line 12, in train_model
    pdf_paragraph_tokens_list = load_labeled_data(PDF_LABELED_DATA_ROOT_PATH)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/paragraph_extraction_trainer/load_labeled_data.py", line 34, in load_labeled_data
    pdf_paragraph_tokens = PdfParagraphTokens.from_labeled_data(pdf_labeled_data_root_path, dataset, pdf_name)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/paragraph_extraction_trainer/PdfParagraphTokens.py", line 29, in from_labeled_data
    pdf_features = PdfFeatures.from_labeled_data(pdf_labeled_data_root_path, dataset, pdf_name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/venv/lib/python3.11/site-packages/pdf_features/PdfFeatures.py", line 126, in from_labeled_data
    pdf_features.set_token_types(token_type_labels)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'set_token_types'