smallzhao closed this issue 4 months ago.
Your PDF is constructed unusually: almost every line is a separate text block. This currently confuses the column detection algorithm.
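One way to check whether a PDF has this structure is to measure how many of its text blocks contain only a single line. The sketch below is a hypothetical helper, not part of pymupdf4llm. It assumes the block layout of PyMuPDF's `page.get_text("dict")["blocks"]` (a `"type"` of 0 for text blocks, each with a `"lines"` list), and the 0.8 threshold is an arbitrary choice.

```python
from typing import Any, Dict, List


def single_line_block_ratio(blocks: List[Dict[str, Any]]) -> float:
    """Return the fraction of text blocks that contain exactly one line.

    Assumes `blocks` has the shape of PyMuPDF's
    page.get_text("dict")["blocks"]: each block has a "type" key
    (0 = text, 1 = image) and text blocks carry a "lines" list.
    """
    text_blocks = [b for b in blocks if b.get("type") == 0]
    if not text_blocks:
        # No text on the page, so nothing to judge.
        return 0.0
    single = sum(1 for b in text_blocks if len(b.get("lines", [])) == 1)
    return single / len(text_blocks)


def looks_line_per_block(blocks: List[Dict[str, Any]], threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: True when most text blocks hold a single line.

    The default threshold of 0.8 is an assumption, not a tested value.
    """
    return single_line_block_ratio(blocks) >= threshold
```

On a real page this would be called as `single_line_block_ratio(page.get_text("dict")["blocks"])`; a ratio close to 1.0 matches the one-block-per-line layout described above.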
I have developed a fix which will be published with the next version.
Partly solved in version 0.0.6.
The fix resolves some cases, but as with table recognition, there will always be layouts that escape complete detection.
Attached file: 0B3168BDCDA63212BC25EDF6681AE1EF.pdf. Figures: src_pdf (source PDF), dst_md (generated Markdown).
I am using pymupdf4llm==0.0.5 and cannot get the two columns of this PDF extracted separately. Above are the code and file I used, along with the generated output. Do I need to set additional parameters to have the two columns separated?