Closed: jojo2357 closed this issue 1 month ago.
Thanks for your report. According to section 7.9.4 of the PDF 2.0 specification, your datetime format does not follow the standard (ISO 8824-1).
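For reference, section 7.9.4 describes dates of the form D:YYYYMMDDHHmmSSOHH'mm, where O is +, - or Z, and each field after the year may be omitted as long as all preceding fields are present. The following is a minimal, lenient sketch of a validator for that shape, not the library's own parser. `parse_pdf_date` is a hypothetical name. The sketch also accepts the timezone part after any date field and tolerates a trailing apostrophe after the minute offset, which older files often contain.

```python
import re
from datetime import datetime, timedelta, timezone

# Lenient pattern for D:YYYYMMDDHHmmSSOHH'mm. The nesting makes each date/time
# field optional only when the field before it is present.
PDF_DATE = re.compile(
    r"""
    D:
    (?P<year>\d{4})
    (?:(?P<month>\d{2})
      (?:(?P<day>\d{2})
        (?:(?P<hour>\d{2})
          (?:(?P<minute>\d{2})
            (?P<second>\d{2})?
          )?
        )?
      )?
    )?
    (?:(?P<tz>[+\-Z])
      (?:(?P<tz_hour>\d{2})
        (?:'(?:(?P<tz_minute>\d{2})'?)?)?
      )?
    )?
    """,
    re.VERBOSE,
)


def parse_pdf_date(text: str) -> datetime:
    """Convert a PDF date string to a datetime, raising ValueError if malformed."""
    m = PDF_DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    g = m.groupdict()
    tzinfo = None
    if g["tz"] == "Z":
        tzinfo = timezone.utc
    elif g["tz"] in ("+", "-"):
        offset = timedelta(hours=int(g["tz_hour"] or 0), minutes=int(g["tz_minute"] or 0))
        tzinfo = timezone(offset if g["tz"] == "+" else -offset)
    # Missing fields default to the start of the period (month 1, day 1, 00:00:00);
    # datetime() itself raises ValueError for out-of-range values such as month 13.
    return datetime(
        int(g["year"]),
        int(g["month"] or 1),
        int(g["day"] or 1),
        int(g["hour"] or 0),
        int(g["minute"] or 0),
        int(g["second"] or 0),
        tzinfo=tzinfo,
    )


print(parse_pdf_date("D:199812231952-08'00"))  # 1998-12-23 19:52:00-08:00
```

Any string that does not fit this shape raises ValueError. That is the situation described in the report.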
I am not sure whether supporting arbitrary date formats really makes sense: there are tons of variations.
As a first step, you might want to get in touch with the author/creator of the PDF file to inform them of the standard violation. Otherwise, you should still be able to implement your own logic based upon creation_date_raw if the exception is raised.
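A minimal sketch of that approach follows. It assumes the metadata object exposes `creation_date` and `creation_date_raw` as in the thread, and that the strict parser signals a nonstandard date with ValueError (an assumption, since the traceback is not reproduced here). `creation_date_or_fallback` and `my_parser` are hypothetical names.

```python
from datetime import datetime
from typing import Any, Callable, Optional


def creation_date_or_fallback(
    metadata: Any, fallback: Callable[[str], datetime]
) -> Optional[datetime]:
    """Return metadata.creation_date, or parse creation_date_raw with `fallback`."""
    if metadata is None:  # the reader may expose no metadata at all
        return None
    try:
        return metadata.creation_date
    except ValueError:
        # Assumption: the strict parser reports a nonstandard date with ValueError.
        raw = metadata.creation_date_raw
        return fallback(raw) if raw is not None else None


# Usage with pypdf (assumed to be the library in question; not run here):
# from pypdf import PdfReader
# reader = PdfReader("main.pdf")
# created = creation_date_or_fallback(reader.metadata, my_parser)  # my_parser is hypothetical
```

Only the custom parser has to be written. The library's internal parsing function stays untouched.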
Perhaps adding a parameter like "fallback datetime formats" would work? That way I don't need to re-implement the whole method if I know a file might have a bad format.
In the meantime, I copied the source and am parsing the raw value with my extra format.
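The "fallback datetime formats" idea could look roughly like this sketch. `parse_date_with_fallbacks` is a hypothetical helper, not part of any library. The two formats are illustrative placeholders, because the nonstandard format from the report is not reproduced in this thread.

```python
from datetime import datetime
from typing import Iterable


def parse_date_with_fallbacks(raw: str, fallback_formats: Iterable[str]) -> datetime:
    """Try each strptime format in order; raise ValueError if none matches."""
    for fmt in fallback_formats:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no fallback format matched {raw!r}")


# Illustrative formats only; replace them with the ones your files actually use.
formats = ["D:%Y%m%d%H%M%S", "%Y-%m-%d %H:%M:%S"]
print(parse_date_with_fallbacks("2024-01-02 03:04:05", formats))  # 2024-01-02 03:04:05
```

The helper only needs the raw string, so it can run on creation_date_raw without copying the library's internal function.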
You could just use dateutil:
import dateutil.parser
dateutil.parser.parse(reader.metadata.creation_date_raw)
I'm personally not inclined to add a dependency on this library just to cope with invalid formats. For the same reason, adding a format parameter does not seem like a good idea, as you have an easy solution to your issue.
Catching the exception is still a valid approach, and it does not require copying the whole function. As the parsing function is internal, simply adding a fallback datetime format parameter does not really work.
As already mentioned, the format violates the specification considerably, and simple workarounds already exist, so I am going to close this issue as not planned.
Environment
Code + PDF
This is a minimal, complete example that shows the issue:
The original PDF in which I discovered this stupid format contains sensitive information and was large, so instead I re-created it with the LaTeX below. I verified that the data is exactly the same between the original problem file and the generated one: main.pdf
Traceback
This is the complete traceback I see:
Thoughts
While this date is obviously not in the correct format, it would be nice if other formats were checked automatically, just in case someone sends me a borked PDF.