mikehaertl / php-pdftk

A PDF conversion and form utility based on pdftk
MIT License
952 stars 128 forks source link

Pdf::updateInfo() -- make it understand the output of Pdf::getData() #292

Closed rotdrop closed 2 years ago

rotdrop commented 2 years ago

See #291

mikehaertl commented 2 years ago

I had not time to study your changes but I think it should be much easier: In the constructor right before the conversion logic you could check if $data is an instance of InfoFields. Then you can get the original output from dump_data by casting it to a string ((string) $data) and simply skip the conversion part and write the data string to the temp file.

A change to the conversion logic should not be required.

rotdrop commented 2 years ago

I had not time to study your changes but I think it should be much easier: In the constructor right before the conversion logic you could check if $data is an instance of InfoFields. Then you can get the original output from dump_data by casting it to a string ((string) $data) and simply skip the conversion part and write the data string to the temp file.

A change to the conversion logic should not be required.

I do not agree (although I realized that I can work around it). Why then is it that updateInfo() accepts this InfoFile input at all? Why not the entire part of the array which is provided by getData()? Also: changing the formatted array data of getData() is somewhat easier then fiddling with the string output.

Anyhow: the purpose of the PR was just to make the usage of getData() and updateInfo() more convenient. It felt incomplete.

Thanks again for your work, I'm just doing these PRs and comment because I'm now using it. Just take them as comments and suggestions.

mikehaertl commented 2 years ago

Why then is it that updateInfo() accepts this InfoFile input at all?

It's pdfk that needs an infofile for update_info. As PHP arrays are usually the most convenient format to work with we used this as input for updateInfo() - and use the InfoFile class to create the (temporary) infofile for pdftk from that array data.

Why not the entire part of the array which is provided by getData()?

The entire array is nothing else than the parsed output of dump_data. That's what the InfoFields class does. So both, array and string, should be absolutely equivalent. And the InfoFields instance keeps a copy of the original output from dump_data which you can get via (string) $data (or $data->__toString()).

Also: changing the formatted array data of getData() is somewhat easier then fiddling with the string output.

Well, you pointed out that pdftk expects the input for update_info to be in the same format as it returns from dump_data.

I agree that this would make sense. I missed that part in the man page. But if that is true then we really want to create a temporay infofile with the original unmodified string as it is returned by dump_data. There should be no conversion required whatsoever.

If you're still not convinced I could also try a quick implementation so you could try it. Would you find time to test it then?

EDIT: Changed typo. And just to make this more clear again: The two classes are complements:

mikehaertl commented 2 years ago

I'm sorry, forget what I wrote above.

I didn't remember that we somehow grouped the output of getData() into sub arrays with keys like Info, etc.:

        "Info" => array(
            "CreationDate" => "D:20140709121536+02'00'",
            "Creator" => "Writer",
            "Producer" => "LibreOffice 4.2"
        ),
        "PdfID0" => "8b93f76a0b28b720d0dee9a6eb2a780a",
        "PdfID1" => "8b93f76a0b28b720d0dee9a6eb2a780a",
        "NumberOfPages" => "5",
        "PageMedia" => array(
            array(
                "Number" => "1",
                "Rotation" => "0",
                "Rect" => "0 0 595 842",
                "Dimensions" => "595 842"
            ),

It's true, that you can't feed this into updateInfo(). And moreover as I see it the current implemenation only allows to set values from the Info section - nothing else. This sounds like yet another issue. So there are a couple of different issues here.

But before we continue here, maybe you could first provide an example what you actually try to achieve.

rotdrop commented 2 years ago

Well, I actually want to add bookmarks, it is more convenient to add those to the data array provided by getData(), so I then changed the InfoFile class to not only understand the 'Info' part of the array provided by getData() but just the entire output, including the bookmark data.

A little bit more context: while coding an app for that cloud-software "Nextcloud" I try to generate a single-file PDF from a file-system tree:

This way the bookmark structure reflects the original folder structure. Existing bookmarks are kept (luckily pdftk's "cat" command keeps and adjust already existing bookmarks)

While doing this I just use the file and directory names as bookmark titles.Those may contain non-ASCII characters (see #294)

All this works quite nicely with the proposed changes to your php-pdftk. It would be nice if you could accept the merge request. Of course, my other option is to just derive a sub-class and apply the changes there and just live with the current state of php-pdftk

mikehaertl commented 2 years ago

My main problem is really that your changes break BC (see #291).

But in any case it would be good if you could include a test case for your change in tests/InfoFileTest.php. Preferably as addition to the existing test in tests/InfoFileTest (which will break now unless we find a solution to keep BC).

It's much easier to discuss changes if we can see what is the expected input/output.

mikehaertl commented 2 years ago

Closing this in favour of #297