ESultanik opened this issue 4 years ago
Hi @ESultanik, I’ve been promoting something similar (a “heat map” of file reads) for a while… I see the following benefits:
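A file-read heat map like the one described above can be sketched as follows. This assumes the instrumentation yields a flat list of byte offsets that the parser read. The input format and the names `read_heatmap` and `render_strip` are hypothetical, not PolyTracker's API. Offsets are grouped into equal-width buckets, and each bucket is shaded by its read count relative to the busiest bucket.

```python
# Characters ordered from "never read" to "hottest bucket".
SHADES = " .:-=+*#%@"


def read_heatmap(offsets: list[int], file_size: int, num_bins: int) -> list[int]:
    """Count reads per equal-width byte bucket.

    `offsets` is assumed (hypothetically) to be the byte offsets a parser read,
    e.g. exported from a taint-tracking run.
    """
    if file_size <= 0 or num_bins <= 0:
        raise ValueError("file_size and num_bins must be positive")
    counts = [0] * num_bins
    for off in offsets:
        if not 0 <= off < file_size:
            raise ValueError(f"offset {off} is outside the file")
        # Integer scaling maps [0, file_size) onto [0, num_bins).
        counts[off * num_bins // file_size] += 1
    return counts


def render_strip(counts: list[int]) -> str:
    """Render bucket counts as a one-line text heat strip."""
    peak = max(counts, default=0)
    if peak == 0:
        return SHADES[0] * len(counts)
    top = len(SHADES) - 1
    return "".join(SHADES[c * top // peak] for c in counts)


if __name__ == "__main__":
    reads = [0, 1, 2, 3, 50, 99]  # hypothetical offsets read from a 100-byte file
    print(repr(render_strip(read_heatmap(reads, file_size=100, num_bins=4))))
```

With four buckets over a 100-byte file, the first bucket is read four times and renders as `@`, while the untouched second bucket stays blank. Shading relative to the peak keeps the strip readable regardless of how many reads a parser performs.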
Hi Peter, thank you for the feedback.
Could you point me in the direction of a PDF that is malformed in a "fixable" way, and to a parser that will automatically perform this repair?
Sure (I'm thinking of something like a missing xref table). Do you have a short-list of parsers you prefer?
Yes!
Parser short-list: MuPDF, QPDF, and Poppler.
Thank you.
Attached: polytracker.zip. I hand-made some samples for you by hex-editing a PDF down to remove the xref, the startxref, the trailer, etc. The filenames describe each malformation. QPDF certainly outputs different warning messages, so you should be able to use PolyTracker to capture the additional recovery mechanisms that fire. MuPDF and Poppler also support some (but not all!) of these malformations. I also included the baseline PDF (...-original.pdf) to make diffing the processing easier for you. Let me know if you want more samples.
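Hand-editing works, but similar samples could also be generated from the baseline PDF with a short script. This is a sketch assuming a simple, non-linearized PDF with a classic `xref` table (not a cross-reference stream) and `\n` or `\r\n` line endings. `remove_between`, `make_malformed_variants` and `write_variants` are hypothetical helper names. Each cut removes the last occurrence of a section. Values such as the `startxref` offset are not rewritten, so they may point at the wrong place afterwards, just as in a hand-edited file.

```python
from pathlib import Path


def remove_between(data: bytes, start: bytes, end: bytes) -> bytes:
    """Drop the last occurrence of `start` up to (not including) the next `end`."""
    i = data.rfind(start)
    if i < 0:
        raise ValueError(f"{start!r} not found")
    j = data.find(end, i + len(start))
    if j < 0:
        raise ValueError(f"{end!r} not found after {start!r}")
    return data[:i] + data[j:]


def make_malformed_variants(pdf: bytes) -> dict[str, bytes]:
    """Build one malformed copy per removed section (assumes a classic xref table)."""
    return {
        # The leading b"\n" keeps b"\nxref" from matching inside b"startxref".
        "no-xref": remove_between(pdf, b"\nxref", b"\ntrailer"),
        "no-trailer": remove_between(pdf, b"\ntrailer", b"\nstartxref"),
        "no-startxref": remove_between(pdf, b"\nstartxref", b"\n%%EOF"),
    }


def write_variants(original: Path) -> None:
    """Write <stem>-no-xref.pdf etc. next to the original file."""
    for name, data in make_malformed_variants(original.read_bytes()).items():
        original.with_name(f"{original.stem}-{name}.pdf").write_bytes(data)
```

Keeping the untouched original alongside the generated variants preserves the baseline for diffing the parsers' behavior.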
Now that we maintain temporal information for when specific bytes are operated on, it would be interesting (although perhaps not useful) to visualize it as an animated GIF.
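One way to turn the temporal information into animation frames is sketched below. This assumes each recorded access is available as a `(timestamp, offset)` pair. The event format and the names `heatmap_frames` and `to_grayscale` are hypothetical, not PolyTracker's API. Accesses are split into equal-length time windows and binned by offset as in a static heat map. They can optionally be accumulated so that later frames show everything read so far, and are then scaled to 0 to 255 grayscale.

```python
def heatmap_frames(
    events: list[tuple[float, int]],
    file_size: int,
    num_bins: int,
    num_frames: int,
    cumulative: bool = True,
) -> list[list[int]]:
    """Bin (timestamp, offset) access events into per-frame byte-bucket counts."""
    if file_size <= 0 or num_bins <= 0 or num_frames <= 0:
        raise ValueError("file_size, num_bins and num_frames must be positive")
    frames = [[0] * num_bins for _ in range(num_frames)]
    if not events:
        return frames
    t_min = min(t for t, _ in events)
    span = max(t for t, _ in events) - t_min
    for t, off in events:
        if not 0 <= off < file_size:
            raise ValueError(f"offset {off} is outside the file")
        # Map time onto [0, num_frames); the last instant lands in the final frame.
        f = 0 if span == 0 else min(int((t - t_min) * num_frames / span), num_frames - 1)
        frames[f][off * num_bins // file_size] += 1
    if cumulative:
        # Each frame shows every access up to and including its own window.
        for i in range(1, num_frames):
            frames[i] = [a + b for a, b in zip(frames[i - 1], frames[i])]
    return frames


def to_grayscale(frames: list[list[int]]) -> list[list[int]]:
    """Scale counts to 0..255 against the global peak so frames are comparable."""
    peak = max((c for row in frames for c in row), default=0)
    if peak == 0:
        return [[0] * len(row) for row in frames]
    return [[255 * c // peak for c in row] for row in frames]
```

Each row of grayscale values could then be drawn as one frame with an imaging library. With Pillow, for example, calling `save` on the first `Image` with `save_all=True` and `append_images` set to the remaining frames writes an animated GIF. Scaling against the global peak rather than per frame keeps brightness comparable across the animation.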