Open justinschoeman opened 4 months ago
We are encountering a large number of files in the wild with junk at the end (usually HTML from buggy download pages).
The current open() function in Basic/PDF/File.pm only searches the last 1 kB of the file for the startxref / %%EOF trailer, so it gives up on these files.
The change below keeps scanning all the way back to the beginning of the file (in a horribly inefficient way, with a sliding window of up to 1 kB), but it seems to work:
    #foreach my $offset (1..64) {
    #    $fh->seek($end - 16 * $offset, 0);
    #    $fh->read($buffer, 16 * $offset);
    #    last if $buffer =~ m/startxref($cr|\s*)\d+($cr|\s*)\%\%eof.*?/i;
    #}
    my $scan_length = 16;
    my $scan_start = $end - $scan_length;
    for(;;) {
        $fh->seek($scan_start, 0);
        $fh->read($buffer, $scan_length);
        last if $buffer =~ m/startxref($cr|\s*)\d+($cr|\s*)\%\%eof.*?/i;
        last if $scan_start < 16;
        $scan_start -= 16;
        if($scan_length < 1024) {
            $scan_length += 16;
        }
    }
Actually, start with $scan_length = 32. The initial 16 is pointless.
my $scan_length = 32;
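If the 16-byte stepping turns out to be too slow on large files, one possible alternative is to walk backwards in bigger chunks with a small overlap, so that a trailer straddling a chunk boundary is still seen. The following is only a rough sketch, not a tested patch against File.pm: `find_trailer_offset` is a hypothetical standalone helper, the 4096-byte chunk and 64-byte overlap are assumed values, and the regex uses plain `\s*` in place of the `$cr` pattern from File.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the library): returns the byte offset of the
# last "startxref <digits> %%EOF" trailer in the file, or undef if none exists.
sub find_trailer_offset {
    my ($fh, $chunk, $overlap) = @_;
    $chunk   ||= 4096;    # bytes newly covered per step (assumed value)
    $overlap ||= 64;      # assumed to be longer than any realistic trailer

    binmode $fh;
    seek($fh, 0, 2) or return;    # 2 = SEEK_END
    my $end = tell($fh);
    my $pos = $end;               # everything at or after $pos is already scanned

    while ($pos > 0) {
        my $start = $pos > $chunk ? $pos - $chunk : 0;

        # Read the new chunk plus a little of the previously scanned region.
        my $want = $pos - $start + $overlap;
        $want = $end - $start if $start + $want > $end;

        seek($fh, $start, 0) or return;    # 0 = SEEK_SET
        my $buffer = '';
        my $got = read($fh, $buffer, $want);
        return unless defined $got;

        # Keep the last match in this window, i.e. the one closest to the end.
        my $found;
        while ($buffer =~ /startxref\s*\d+\s*%%eof/gi) {
            $found = $start + $-[0];
        }
        return $found if defined $found;

        $pos = $start;
    }
    return;    # no trailer anywhere in the file
}
```

This reads each byte of the file roughly once instead of re-reading up to 1 kB for every 16-byte step. The overlap only needs to be at least as long as the longest trailer you expect to match.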