Open Niedzwiedzw opened 3 years ago
@Niedzwiedzw which approach are you using? I will try to give you instructions on how to get the relevant information without leaking the sensitive data tomorrow.
I've switched to the master branch to be able to use the named enum-style Ops, but now it doesn't load the document at all
thread 'parser::lotos::test_parser::test_example_files_parse' panicked at 'bad page?:
Try { file: "/home/niedzwiedz/.cargo/git/checkouts/pdf-3ef1c528a9b91eec/9e56f00/pdf/src/file.rs", line: 94, column: 19, source:
Try { file: "/home/niedzwiedz/.cargo/git/checkouts/pdf-3ef1c528a9b91eec/9e56f00/pdf/src/object/types.rs", line: 22, column: 42,
source: FromPrimitive {
typ: "Option < Content >",
field: "contents",
source: TryContext { file: "/home/niedzwiedz/.cargo/git/checkouts/pdf-3ef1c528a9b91eec/9e56f00/pdf/src/content.rs",
line: 237, column: 21, context: [("op.as_str()", "Ok(\"BI\")")],
source: MissingEntry { typ: "InlineImage", field: "ColorSpace" } } } } }',
invoices/src/parser.rs:155:47
oh wow. an inline image. Will look into that as well.
I'm not creating the documents, and I can imagine standards compliance for PDF is a MESS. For some context, I'm trying to salvage what I can from some government-generated documents :D
@s3bk https://github.com/sbeckeriv/lopdf/blob/master/src/nom_parser.rs would this be useful to you at all?
I don't think we are going to switch to nom. It is great, but PDF is a mess and we already have a handwritten parser.
The PDF Reference lists `ColorSpace` as a non-optional field of inline images.
And I have no intention of allowing various deviations from the specification, as that is a bottomless pit.
@Niedzwiedzw you are in luck. The `color_space` field is an `Option`, so I went ahead and made it optional in inline images.
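For anyone following along, here is a rough sketch of what "optional in inline images" means in practice. This is not the crate's actual code: `InlineImage`, `MissingEntry` and `parse_inline_image` below are hypothetical stand-ins, and the dictionary is simplified to a string map. The keys `W`, `H` and `CS` are the abbreviated inline-image names for Width, Height and ColorSpace.

```rust
use std::collections::HashMap;

// Hypothetical error type, named after the `MissingEntry` variant
// that shows up in the panic output; not the crate's real definition.
#[derive(Debug, PartialEq)]
struct MissingEntry {
    typ: &'static str,
    field: &'static str,
}

// Hypothetical, heavily simplified inline image dictionary.
#[derive(Debug, PartialEq)]
struct InlineImage {
    width: u32,
    height: u32,
    // Optional: absent in some real-world files, so a missing key
    // becomes `None` instead of an error.
    color_space: Option<String>,
}

// Assumption for this sketch: the dictionary has already been reduced
// to a map from abbreviated key names to raw string values.
fn parse_inline_image(dict: &HashMap<&str, &str>) -> Result<InlineImage, MissingEntry> {
    // Required numeric entry: a missing or unparsable value is an error.
    let number = |field: &'static str| -> Result<u32, MissingEntry> {
        dict.get(field)
            .and_then(|v| v.parse::<u32>().ok())
            .ok_or(MissingEntry { typ: "InlineImage", field })
    };

    Ok(InlineImage {
        width: number("W")?,
        height: number("H")?,
        // Optional entry: no error when the key is absent.
        color_space: dict.get("CS").map(|v| v.to_string()),
    })
}
```

With this shape, a dictionary that only has `W` and `H` parses with `color_space == None`, while a missing `W` still fails with a `MissingEntry` error. The same pattern would apply to any other entry that turns out to be absent in practice.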
so cool thank you so much @s3bk
thread 'parser::lotos::test_parser::test_example_files_parse' panicked at 'bad page?:
Try { file: "/home/niedzwiedz/.cargo/git/checkouts/pdf-3ef1c528a9b91eec/d09d20e/pdf/src/file.rs",
line: 94, column: 19, source:
Try { file: "/home/niedzwiedz/.cargo/git/checkouts/pdf-3ef1c528a9b91eec/d09d20e/pdf/src/object/types.rs",
line: 22, column: 42, source: FromPrimitive { typ: "Option < Content >", field: "contents", source:
TryContext { file: "/home/niedzwiedz/.cargo/git/checkouts/pdf-3ef1c528a9b91eec/d09d20e/pdf/src/content.rs", line: 236,
column: 21, context: [("op.as_str()", "Ok(\"BI\")")], source: MissingEntry { typ: "InlineImage", field: "Decode" } } } } }',
invoices/src/parser.rs:155:47
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
hmm
I'm unable to provide an example PDF because it contains sensitive data though :(