Open qniksefat opened 6 months ago
Yep, me too. Parsing academic documents is really unreliable with any parser currently. If you're also working with academic documents, many conferences publish an HTML version as well, which is straightforward to use as input instead.
10-K 2023, 09.30.2023-2023-11-02-08-16.json
10-K 2023, 09.30.2023-2023-11-02-08-16.pdf
There is a company whose tool can handle multiple vertical columns of text, and it works particularly well on tables. It is also fast: 100 pages are processed in 5 s or less.
@PowerOwner what is this tool, btw?
Hey,
I'm having a hard time parsing PDF files with two vertical columns filled with text. It sometimes captures the right order, but often does not. I'm parsing into Markdown.
For example, it takes one sentence from the left column and then one from the right column. It does not break in the middle of a sentence.
Thanks!
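For anyone hitting the same interleaving, here is a minimal sketch of one possible workaround: reorder text blocks by column before emitting Markdown. It is not tied to any particular parser. The `Block` type, the `two_column_order` function, and the `page_width` parameter are all hypothetical names for illustration, and it assumes you can get bounding boxes for each text block, that the origin is top-left (y grows downward), and that the page has exactly two columns split at the midline. Full-width elements such as titles or wide tables are not handled.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """Hypothetical text block: bounding box plus its text."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def two_column_order(blocks: List[Block], page_width: float) -> List[Block]:
    """Return blocks in reading order for a simple two-column page.

    Assumes a top-left origin and a column split at the page midline.
    """
    mid = page_width / 2
    left: List[Block] = []
    right: List[Block] = []
    for block in blocks:
        # Assign each block to a column by its horizontal centre.
        center = (block.x0 + block.x1) / 2
        (left if center < mid else right).append(block)
    # Within each column, read top to bottom.
    left.sort(key=lambda b: b.y0)
    right.sort(key=lambda b: b.y0)
    # Read the whole left column first, then the right column.
    return left + right


# Blocks arriving row by row, i.e. interleaved across the two columns.
interleaved = [
    Block(50, 100, 280, 120, "left 1"),
    Block(320, 100, 550, 120, "right 1"),
    Block(50, 130, 280, 150, "left 2"),
    Block(320, 130, 550, 150, "right 2"),
]
ordered = [b.text for b in two_column_order(interleaved, page_width=600)]
```

Whether this helps depends on the parser exposing block coordinates at all; if it only returns a flat text stream, the order cannot be recovered this way.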