Closed divergentdave closed 10 years ago
This is ready to go now, let me know what you think.
Fixed a merge conflict caused by #87, and this is good to go. Thanks, @divergentdave!
This exciting new CreationDate format from a Dept of Education IG report caused a crash:
CreationDate: D:00000101000000Z
ModDate: Tue Sep 25 07:26:39 2001
The crash happens because, when every time format fails to parse, my_datetime
is never assigned before it is checked:
File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 171, in parse_pdf_datetime
if my_datetime:
UnboundLocalError: local variable 'my_datetime' referenced before assignment
I fixed the crash in https://github.com/unitedstates/inspectors-general/commit/f98044d62d3b0472aa7d99a41a895d05d3e9e73d, but am re-opening to see what you make of this new format -- I have no idea how to parse it, or if it's invalid.
Er, well, can't re-open a PR, but I'm going to assume it's a bug in the metadata, and should just be ignored.
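For anyone reading later, here is a minimal sketch of the failure mode described above and one common way to avoid it. This is not the code from the linked commit: the function name `parse_pdf_datetime_sketch` and the `CANDIDATE_FORMATS` list are assumptions made for illustration, and the real `parse_pdf_datetime` may try different formats.

```python
from datetime import datetime

# Hypothetical format list; the real function may try other formats.
CANDIDATE_FORMATS = [
    "%a %b %d %H:%M:%S %Y",  # pdfinfo-style, e.g. "Tue Sep 25 07:26:39 2001"
    "%m/%d/%Y %H:%M:%S",
]


def parse_pdf_datetime_sketch(text):
    """Return a datetime, or None when no candidate format matches."""
    # Bind the name before the parse attempts, so the check at the end
    # cannot raise UnboundLocalError when every attempt fails.
    my_datetime = None
    for fmt in CANDIDATE_FORMATS:
        try:
            my_datetime = datetime.strptime(text, fmt)
            break
        except ValueError:
            continue
    if my_datetime:
        return my_datetime
    return None
```

With this shape, an unparseable value such as `D:00000101000000Z` falls through to `None` instead of crashing.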
Whoops, good catch. Dates in PDFs are supposed to follow ASN.1 apparently,
but of course tons of them don't. The D: prefix is normal, but the rest of
it specifies midnight, Universal Time, January 1, 0000, so I'm going to
assume it's garbage. pdfinfo is already doing some formatting for PDFs
that have valid ASN.1 format dates, so they are human-readable by the time
they get to us.
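To illustrate why the value above reads as garbage, here is a rough sketch of parsing a raw `D:`-prefixed date directly. It assumes the usual `D:YYYYMMDDHHmmSS` layout with trailing fields optional, ignores any timezone suffix, and uses a hypothetical helper name, `parse_raw_pdf_date`, that is not part of this repo. Python's `datetime` does not accept year 0, so the all-zero date is rejected without a special case.

```python
import re
from datetime import datetime

# Year is required; month, day, hour, minute and second are optional.
# Anything after the seconds (such as "Z" or an offset) is ignored here.
RAW_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
)


def parse_raw_pdf_date(value):
    """Return a naive datetime for a raw PDF date, or None if unusable."""
    match = RAW_PDF_DATE.match(value)
    if not match:
        return None
    groups = match.groups()
    year = int(groups[0])
    # Missing month/day default to 1; missing time fields default to 0.
    month, day = (int(g) if g else 1 for g in groups[1:3])
    hour, minute, second = (int(g) if g else 0 for g in groups[3:6])
    try:
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        # Out-of-range fields, including year 0000, end up here.
        return None
```

Since pdfinfo already reformats valid dates, a helper like this would only matter for values that pdfinfo passes through unchanged.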
WIP, still testing this. Addresses #76. There will probably be a merge conflict with #84.