ssimms / pdfapi2

Create, modify, and examine PDF files in Perl
Other
15 stars 20 forks source link

Page boundaries should be adjusted after content is "un-rotated" on open_page() #84

Open vadim-160102 opened 1 month ago

vadim-160102 commented 1 month ago
use strict;
use warnings;
use feature 'say';
use PDF::API2;

my $pdf = PDF::API2-> new;

my $p = $pdf-> page;
$p-> boundaries( 
    media => [ 0, 0, 600, 400 ], 
    crop  => [ 50, 50, 600, 400 ],
);
$p-> rotation( 180 );

my $f = $pdf-> font( 'Times-Roman' );
$p-> text
  -> font( $f, 96 )
  -> position( 100, 300 )
  -> text( 'Hello, PDF!' );

$pdf = PDF::API2-> from_string( $pdf-> to_string );

$pdf-> open_page( 1 );          # no-op? nope   (*)

$pdf-> save( 'visual_test.pdf' );

Initial PDF, serialized to string, simulates a file from external source. Boxes (e.g. CropBox here) is non-symmetrical relative to MediaBox, and page was rotated. The content is just chosen to be clearly seen if deformed/cropped. Comment/uncomment the star (*) marked line and inspect the output. This affects opening files and then adding content to a page, as well as importing a page to another document. Gaps between (Crop|Trim|Bleed|Art)Box and MediaBox should stay the same on top/left/bottom/right edges of un-rotated page, as they were on rotated page. If I get it right. Looks like this bug is primordial i.e. 20+ y.o. I'll try to supply a fix unless someone comes with better idea.

Edit: Actually, MediaBox on its own is enough to observe a bug, e.g. replace in the above with:

my $p = $pdf-> page;
$p-> boundaries( 
    media => [ 50, 50, 600, 400 ], 
);
$p-> rotation( 90 );

The "un-rotating" code in PDF/API2.pm does at least something to boxes boundaries in case of 90 and 270 degrees, and nothing for 180. Here we see that the bug is not because nothing was done for 180, but the whole logic appears to be simplistic and flawed.

vadim-160102 commented 1 month ago

I'll try to supply a fix

The fragment from this line: (https://github.com/ssimms/pdfapi2/blob/32af5b86539b92b8e4333168b6ce05244613e72e/lib/PDF/API2.pm#L1275) and till line 1309 is to be replaced with:

        sub _transform_box {
            my ( $tmatrix, @box ) = @_;
            my ( $a, $b, $c, $d, $e, $f ) = @$tmatrix;

            @box = ( $a * $box[ 0 ] + $c * $box[ 1 ] + $e,
                     $b * $box[ 0 ] + $d * $box[ 1 ] + $f,
                     $a * $box[ 2 ] + $c * $box[ 3 ] + $e,
                     $b * $box[ 2 ] + $d * $box[ 3 ] + $f );

            @box[ 0, 2 ] = @box[ 2, 0 ] if $box[ 2 ] < $box[ 0 ];
            @box[ 1, 3 ] = @box[ 3, 1 ] if $box[ 3 ] < $box[ 1 ];

            return @box
        }

        my $trans = '';

        if ( not $page-> {' opened' } and not $self-> default( 'nounrotate' )) {
            my $rotate = $page-> find_prop( 'Rotate' );
            $rotate = $rotate ? $rotate-> val % 360 : 0;
            $page-> { 'Rotate' } = PDFNum( 0 ) if $rotate;

            my $mb = $page-> find_prop( 'MediaBox' );
            my @mb = $mb ? map { $_-> val } $mb-> elements
                         : ( 0, 0, 612, 792 );

            if ( $rotate or $mb[ 0 ] or $mb[ 1 ]) {
                my $tmatrix = 
                    $rotate ==   0 ? [  1,  0,  0,  1, -$mb[ 0 ], -$mb[ 1 ]] :
                    $rotate ==  90 ? [  0, -1,  1,  0, -$mb[ 1 ],  $mb[ 2 ]] :
                    $rotate == 180 ? [ -1,  0,  0, -1,  $mb[ 2 ],  $mb[ 3 ]] :
                                     [  0,  1, -1,  0,  $mb[ 3 ], -$mb[ 0 ]];

                $page-> { 'MediaBox' } = PDFArray( map PDFNum( $_ ), 
                    _transform_box( $tmatrix, @mb ));

                for my $mediatype ( qw( CropBox BleedBox TrimBox ArtBox )) {
                    next unless my $media = $page-> find_prop( $mediatype );
                    my @box = map { $_-> val } $media-> elements;
                    $page-> { $mediatype } = PDFArray( map PDFNum( $_ ), 
                        _transform_box( $tmatrix, @box ))
                }
                $trans = join ' ', map( PDF::API2::Util::float( $_ ), @$tmatrix ), 'cm'
            }
        }

(And the helper _transform_box() subroutine definition should better (of course) be moved out of the enclosing open_page() body). Though the use of $a, $b is discouraged, I think it's OK for this small scope and helps to easier match with description and formulae in "The PDF Reference". Declaring $trans with my assumes line 1255 at the top of the sub only contains $page declaration, others aren't really required. For further hygiene, line 1314 could use postcondition ( "... if $trans;"), though improvements/cleaning in general are gargantuan task, and strict focus is only on the bug at hand. The fix above enforces lower left of MediaBox to become 0, 0, including the case of 0 degrees rotation. So then 'nounrotate' becomes misnomer, but it's probably OK because it's no longer publicly documented. I don't know of producers which massively create MB's with other than 0, 0 origin. This (enforced by PDF::API2 nowadays) "unrotation" breaks, in general, files with annotations anyway (the least which comes to mind, and which you are aware perhaps). I wouldn't like to increase this share of broken PDF files with my making of "unrotation" even more intrusive. + Sorry I didn't follow your formatting style preference.