Closed zahrarsl closed 4 hours ago
Can you upload this PDF file? We will fix it as soon as possible.
10.1016@j.is.2016.03.004.pdf This file is sent as an example. These cases have been observed in several places. Is it because of the new version? How accurate is the model?
We have noticed issues with reading order and character loss in areas dense with formulas in text-based PDFs. We will thoroughly fix this problem in the next version.
Thank you very much.
We are pleased that our new code performed well in this sample. In the coming days, we will release a new version to thoroughly address this issue.
You are really great.
10.1016@j.aci.2014.05.001_origin.pdf I'm sorry. This file contains a series of algorithm sections; some of them are recognized as images, while others are not and are read as text. I suggest you consider this in the new version. Thanks.
fixed
Description of the bug
I converted a number of PDF files to Markdown with this method, but every output file contained errors: the reading order of the text is not preserved, as in the photos below.
How to reproduce the bug
I expected the text in the Markdown file to be exactly the same as in the PDF file.
Operating system
Windows
Python version
3.10
Software version (magic-pdf --version)
0.9.x
Device mode
cuda