Open harshkhandeparkar opened 1 year ago
I'm working on using tabula-py as the alternative library to fix this, with pandas to export to Excel.
@shikharish
@anuraganand92 Go ahead! And do share your progress. How is tabula working on the pdf?
I discarded tabula as it wasn't good or fast enough for parsing. I tried pdfplumber, which gave results similar to camelot. I ran it on test.pdf, but I am not sure whether the parsing format in test.xls is the correct one, because some cells have multiple entries or a different arrangement of entries: test.xlsx
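For reference, here is a minimal sketch of what a pdfplumber-to-Excel pass could look like. This is not the exact script used on test.pdf. The function names `normalize_rows` and `pdf_to_excel`, the `" / "` separator, and the `table_N` sheet names are all made up for illustration, and it assumes multi-entry cells come back from pdfplumber as strings with line breaks in them.

```python
from typing import List, Optional

Row = List[Optional[str]]


def normalize_rows(rows: List[Row], sep: str = " / ") -> List[List[str]]:
    """Clean raw table rows: empty cells become "", multi-line cells are joined.

    Hypothetical helper; the separator is an arbitrary choice.
    """
    cleaned = []
    for row in rows:
        cleaned_row = []
        for cell in row:
            # Split on line breaks, drop blank fragments, rejoin with `sep`.
            parts = [p.strip() for p in (cell or "").splitlines() if p.strip()]
            cleaned_row.append(sep.join(parts))
        cleaned.append(cleaned_row)
    return cleaned


def pdf_to_excel(pdf_path: str, xlsx_path: str) -> int:
    """Write each table found in the PDF to its own sheet; return the table count."""
    # Third-party imports are kept local so normalize_rows works without them.
    # Writing .xlsx through pandas also needs an engine such as openpyxl.
    import pandas as pd
    import pdfplumber

    frames = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                rows = normalize_rows(table)
                if rows:
                    frames.append(pd.DataFrame(rows))

    # Only create the workbook if at least one table was found.
    if frames:
        with pd.ExcelWriter(xlsx_path) as writer:
            for i, frame in enumerate(frames):
                frame.to_excel(
                    writer, sheet_name=f"table_{i}", index=False, header=False
                )
    return len(frames)
```

Calling `pdf_to_excel("test.pdf", "test.xlsx")` would then produce one sheet per detected table. Whether joining multi-entry cells with a separator is the right format for the timetable is still the open question here; splitting them into separate rows would be the other option.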
Yes, I myself tried tabula, pdfplumber and a few others. None of them were as good as camelot. If we can't find an alternative, forking and updating camelot seems like the only option.
The PDF library used to read timetables, camelot-py, only supports Python versions 3.6, 3.7, and 3.8. Support for Python 3.10+ will become mandatory within a year, since Python 3.8 will stop receiving security updates in October 2024.
Possible solutions: