Open vadim-160102 opened 4 months ago
I'll try to supply a fix
The fragment from this line: (https://github.com/ssimms/pdfapi2/blob/32af5b86539b92b8e4333168b6ce05244613e72e/lib/PDF/API2.pm#L1275) and till line 1309 is to be replaced with:
sub _transform_box {
my ( $tmatrix, @box ) = @_;
my ( $a, $b, $c, $d, $e, $f ) = @$tmatrix;
@box = ( $a * $box[ 0 ] + $c * $box[ 1 ] + $e,
$b * $box[ 0 ] + $d * $box[ 1 ] + $f,
$a * $box[ 2 ] + $c * $box[ 3 ] + $e,
$b * $box[ 2 ] + $d * $box[ 3 ] + $f );
@box[ 0, 2 ] = @box[ 2, 0 ] if $box[ 2 ] < $box[ 0 ];
@box[ 1, 3 ] = @box[ 3, 1 ] if $box[ 3 ] < $box[ 1 ];
return @box
}
my $trans = '';
if ( not $page-> {' opened' } and not $self-> default( 'nounrotate' )) {
my $rotate = $page-> find_prop( 'Rotate' );
$rotate = $rotate ? $rotate-> val % 360 : 0;
$page-> { 'Rotate' } = PDFNum( 0 ) if $rotate;
my $mb = $page-> find_prop( 'MediaBox' );
my @mb = $mb ? map { $_-> val } $mb-> elements
: ( 0, 0, 612, 792 );
if ( $rotate or $mb[ 0 ] or $mb[ 1 ]) {
my $tmatrix =
$rotate == 0 ? [ 1, 0, 0, 1, -$mb[ 0 ], -$mb[ 1 ]] :
$rotate == 90 ? [ 0, -1, 1, 0, -$mb[ 1 ], $mb[ 2 ]] :
$rotate == 180 ? [ -1, 0, 0, -1, $mb[ 2 ], $mb[ 3 ]] :
[ 0, 1, -1, 0, $mb[ 3 ], -$mb[ 0 ]];
$page-> { 'MediaBox' } = PDFArray( map PDFNum( $_ ),
_transform_box( $tmatrix, @mb ));
for my $mediatype ( qw( CropBox BleedBox TrimBox ArtBox )) {
next unless my $media = $page-> find_prop( $mediatype );
my @box = map { $_-> val } $media-> elements;
$page-> { $mediatype } = PDFArray( map PDFNum( $_ ),
_transform_box( $tmatrix, @box ))
}
$trans = join ' ', map( PDF::API2::Util::float( $_ ), @$tmatrix ), 'cm'
}
}
(And the helper _transform_box()
subroutine definition should better (of course) be moved out of the enclosing open_page()
body). Though the use of $a, $b
is discouraged, I think it's OK for this small scope and helps to easier match with description and formulae in "The PDF Reference". Declaring $trans
with my
assumes line 1255 at the top of the sub only contains $page
declaration, others aren't really required. For further hygiene, line 1314 could use postcondition ( "... if $trans;
"), though improvements/cleaning in general are gargantuan task, and strict focus is only on the bug at hand. The fix above enforces lower left of MediaBox
to become 0, 0
, including the case of 0 degrees rotation. So then 'nounrotate'
becomes misnomer, but it's probably OK because it's no longer publicly documented. I don't know of producers which massively create MB's with other than 0, 0
origin. This (enforced by PDF::API2
nowadays) "unrotation" breaks, in general, files with annotations anyway (the least which comes to mind, and which you are aware perhaps). I wouldn't like to increase this share of broken PDF files with my making of "unrotation" even more intrusive. + Sorry I didn't follow your formatting style preference.
Initial PDF, serialized to string, simulates a file from external source. Boxes (e.g.
CropBox
here) is non-symmetrical relative toMediaBox
, and page was rotated. The content is just chosen to be clearly seen if deformed/cropped. Comment/uncomment the star (*) marked line and inspect the output. This affects opening files and then adding content to a page, as well as importing a page to another document. Gaps between(Crop|Trim|Bleed|Art)Box
andMediaBox
should stay the same on top/left/bottom/right edges of un-rotated page, as they were on rotated page. If I get it right. Looks like this bug is primordial i.e. 20+ y.o. I'll try to supply a fix unless someone comes with better idea.Edit: Actually,
MediaBox
on its own is enough to observe a bug, e.g. replace in the above with:The "un-rotating" code in
PDF/API2.pm
does at least something to boxes boundaries in case of 90 and 270 degrees, and nothing for 180. Here we see that the bug is not because nothing was done for 180, but the whole logic appears to be simplistic and flawed.