Open mrtj opened 7 months ago
Thanks for raising this issue! We are actively improving this. Should be a lot better in the next few releases!
This is a big problem for me as well. If the PDF has a bad or incorrect reading order, it really limits what you can do with the output. Even allowing simple templates, something more manual, or a classifier based on font, text block positioning, or standard heading titles could really help make this useful.
Hello, the attached information leaflet has a somewhat complex multi-column layout, and LlamaParse gets completely confused about the reading order: xanax-uk.pdf
I would expect it to read the whole upper-left column first, starting with the title, then the second column to its right, and so on through all columns in the first row, then all columns from left to right in the second row. Instead, after the first line of the first column, the returned text jumps to the second column, then back to the first, and completely garbles the meaning of the text.
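To make the expected order concrete, here is a rough Python sketch of the ordering I have in mind. It is not how LlamaParse works internally, and everything in it is hypothetical: the `Block` type, the `reading_order` function, the `row_breaks` parameter (y coordinates where one row of columns ends and the next begins, supplied by hand here), and the `col_tol` threshold for deciding that two blocks share a column. It assumes text blocks with top-left coordinates are already available from some extractor and that y grows downward.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left edge of the text block
    y0: float  # top edge of the text block (y grows downward)


def reading_order(blocks, row_breaks, col_tol=20.0):
    """Order blocks row by row, column by column, top to bottom within a column."""

    def row_of(block):
        # Index of the horizontal band that contains the block's top edge.
        return sum(block.y0 >= y for y in row_breaks)

    ordered = []
    for row in sorted({row_of(b) for b in blocks}):
        band = sorted((b for b in blocks if row_of(b) == row), key=lambda b: b.x0)
        # Group blocks into columns: a new column starts whenever the left
        # edge jumps by more than col_tol from the previous block.
        columns = []
        for b in band:
            if columns and b.x0 - columns[-1][-1].x0 <= col_tol:
                columns[-1].append(b)
            else:
                columns.append([b])
        # Read each column fully before moving to the next one.
        for col in columns:
            ordered.extend(sorted(col, key=lambda b: b.y0))
    return ordered
```

A plain sort by vertical position and then horizontal position would interleave the columns, which looks like the jumping I am seeing. Grouping by column first within each row avoids that, at the cost of needing the row boundaries up front.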