Open dejanmarkovic opened 2 days ago
Are there 2 separate issues here?
1.
pypdf_table_extraction/camelot does not recognize the table on pages after page 1 with the lattice flavor.
This could be a bug.
Have you tried the output with the Network parser?
With this code

```python
import pypdf_table_extraction

file_path = r"C:\Projects\temp123\attachments\test\er\er3.pdf"
flavors = ["hybrid", "lattice", "network", "stream"]

for flavor in flavors:
    print(f"\nTrying {flavor} flavor:")
    try:
        tables = pypdf_table_extraction.read_pdf(
            file_path,
            pages="all",
            flavor=flavor,  # Use the current flavor
        )
        print(f"Number of tables found: {len(tables)}")
        for i, table in enumerate(tables):
            print(f"\nTable {i} data:")
            print(table.df)
            csv_path = f"{flavor}_table_{i}.csv"
            table.df.to_csv(csv_path, index=False)
            print(f"Table {i} saved to {csv_path}")
        for i, table in enumerate(tables):
            print(f"\nParsing report for {flavor} Table {i}:")
            print(table.parsing_report)
    except Exception as e:
        print(f"An error occurred with {flavor} flavor: {str(e)}")
        continue

print("\nTable extraction process completed.")
```

I am getting the following errors:
NOTE: I have uninstalled both Camelot and pypdf_table_extraction and then reinstalled only the pypdf_table_extraction library, so there should be no conflicts or other installation issues.
Can you please help/advise?
Based on the following error message:
2. An error occurred with network flavor: Unknown flavor specified. Use either 'lattice' or 'stream'
It looks like you are somehow running an old code base. As of v0.0.2 the error message changed to:
    raise NotImplementedError(
        "Unknown flavor specified."
        " Use either 'lattice', 'stream', 'network' or 'hybrid'"
    )
Maybe uninstall both again, then reinstall pypdf_table_extraction.
What is the output of `pip show pypdf_table_extraction`
or `camelot --version`?
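The same check can be made from inside the Python interpreter that actually runs the script, which helps when several environments are installed. The following is only a sketch: `describe_install` is a hypothetical helper, and the distribution names `pypdf-table-extraction` and `camelot-py` are assumed to be the names pip uses for the two packages. It reports the installed version and the file a module would be imported from, so a stale copy on the path becomes visible.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name, module_name):
    """Return (version, location) for a package; None marks a missing piece.

    dist_name is the name pip knows the package by, module_name is the
    name used in the import statement. Both are supplied by the caller.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates the module without importing it.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return version, location


if __name__ == "__main__":
    # Assumed distribution/module name pairs; adjust if yours differ.
    for dist, module in [
        ("pypdf-table-extraction", "pypdf_table_extraction"),
        ("camelot-py", "camelot"),
    ]:
        print(dist, describe_install(dist, module))
```

If the reported location points somewhere other than the environment you expect, or the version is older than 0.0.2, that would be consistent with the old error message still being raised.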
conda list
beautifulsoup4 4.12.3 pypi_0 pypi
bzip2 1.0.8 hcfcfb64_5 conda-forge
ca-certificates 2024.8.30 h56e8100_0 conda-forge
cachetools 5.5.0 pypi_0 pypi
certifi 2024.8.30 pyhd8ed1ab_0 conda-forge
cffi 1.17.1 pypi_0 pypi
chardet 5.2.0 pypi_0 pypi
charset-normalizer 3.4.0 pypi_0 pypi
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
cryptography 43.0.3 pypi_0 pypi
cssselect 1.2.0 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
ghostscript 0.7 pypi_0 pypi
google 3.0.0 pypi_0 pypi
google-api-core 2.19.2 pypi_0 pypi
google-api-python-client 2.143.0 pypi_0 pypi
google-auth 2.34.0 pypi_0 pypi
google-auth-httplib2 0.2.0 pypi_0 pypi
google-auth-oauthlib 1.2.1 pypi_0 pypi
googleapis-common-protos 1.65.0 pypi_0 pypi
httplib2 0.22.0 pypi_0 pypi
icu 75.1 he0c23c2_0 conda-forge
idna 3.8 pypi_0 pypi
libabseil 20240116.2 cxx17_h63175ca_0 conda-forge
libexpat 2.6.2 h63175ca_0 conda-forge
libffi 3.4.2 h8ffe710_5 conda-forge
libprotobuf 4.25.3 h503648d_0 conda-forge
libsqlite 3.46.0 h2466b09_0 conda-forge
libzlib 1.3.1 h2466b09_1 conda-forge
lxml 5.2.2 pypi_0 pypi
lz4-c 1.9.4 hcfcfb64_0 conda-forge
mysql 9.0.1 h9c18f36_0 conda-forge
mysql-client 9.0.1 h809f9c2_0 conda-forge
mysql-common 9.0.1 h2224204_0 conda-forge
mysql-connector-python 9.0.0 py312h275cf98_0 conda-forge
mysql-devel 9.0.1 h2224204_0 conda-forge
mysql-libs 9.0.1 h809f9c2_0 conda-forge
mysql-server 9.0.1 h63c2bd3_0 conda-forge
numpy 2.1.2 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencv-python 4.10.0.84 pypi_0 pypi
openpyxl 3.1.5 pypi_0 pypi
openssl 3.3.2 h2466b09_0 conda-forge
pandas 2.2.3 pypi_0 pypi
pdfminer-six 20240706 pypi_0 pypi
pdfplumber 0.11.4 pypi_0 pypi
pdfquery 0.4.3 pypi_0 pypi
pillow 11.0.0 pypi_0 pypi
pip 24.0 pyhd8ed1ab_0 conda-forge
proto-plus 1.24.0 pypi_0 pypi
protobuf 5.28.0 pypi_0 pypi
pyasn1 0.6.0 pypi_0 pypi
pyasn1-modules 0.4.0 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pymupdf 1.24.7 pypi_0 pypi
pymupdfb 1.24.6 pypi_0 pypi
pypdf 4.3.1 pypi_0 pypi
pypdf-table-extraction 0.0.2 pypi_0 pypi
pypdf2 2.11.1 pyhd8ed1ab_0 conda-forge
pypdfium2 4.30.0 pypi_0 pypi
pyquery 2.0.0 pypi_0 pypi
python 3.12.4 h889d299_0_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.12 4_cp312 conda-forge
pytz 2024.2 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
roman 4.2 pypi_0 pypi
rsa 4.9 pypi_0 pypi
setuptools 70.1.1 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
soupsieve 2.6 pypi_0 pypi
tabula-py 2.9.3 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tk 8.6.13 h5226925_1 conda-forge
tzdata 2024.2 pypi_0 pypi
ucrt 10.0.22621.0 h57928b3_0 conda-forge
uritemplate 4.1.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
vc 14.3 h8a93ad2_20 conda-forge
vc14_runtime 14.40.33810 ha82c5b3_20 conda-forge
vs2015_runtime 14.40.33810 h3bf8584_20 conda-forge
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
xz 5.2.6 h8d14728_0 conda-forge
zstd 1.5.6 h0ea2cb4_0 conda-forge
pypdf_table_extraction/camelot does not recognize the table on pages after page 1 with the lattice flavor.
With the stream method, I get a messed-up output like this one
This is the output from the lattice from page one which looks great
The document is a PDF bank statement. NOTE: I have randomized the numbers in the output for privacy and security purposes.
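To pin down which pages lose their table with a given flavor, it can help to count detected tables per page rather than read through the full output. This is a rough sketch, not part of the library: `tables_per_page` and `pages_without_tables` are hypothetical helpers, and they assume each table's `parsing_report` is a dict with a `"page"` key, as it is in Camelot.

```python
from collections import Counter


def tables_per_page(tables):
    """Count detected tables per page, keyed by page number.

    Assumes every table exposes a parsing_report dict with a "page" entry.
    """
    return Counter(table.parsing_report["page"] for table in tables)


def pages_without_tables(tables, page_numbers):
    """Return the pages from page_numbers on which no table was detected."""
    counts = tables_per_page(tables)
    return [page for page in page_numbers if counts[page] == 0]


# Intended use, assuming the same read_pdf call as in the script above:
#   tables = pypdf_table_extraction.read_pdf(file_path, pages="all", flavor="lattice")
#   print(tables_per_page(tables))
#   print(pages_without_tables(tables, range(1, total_pages + 1)))
# total_pages is a placeholder for the page count of the PDF.
```

Running this once per flavor gives a compact per-page comparison that is easy to paste into the issue alongside the parsing reports.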