msiemens / onenote.rs

A Rust OneNote file parser
Mozilla Public License 2.0

parsing failure for pages containing pictures #11

Open: flxzt opened this issue 1 year ago

flxzt commented 1 year ago

Either the latest version of the OneNote web app inserts images without a "last modified" time, or the parser reads this property incorrectly. In both cases, parsing the downloaded .one file with this crate fails.

The error I am encountering is "Malformed OneNote file data: image has no last modified time", which seems to originate from image_node.rs#L50.

Would it be better to make this property fall back to the epoch or a similar default?
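A minimal sketch of the two behaviours being compared, strict (fail when the timestamp is missing) and lenient (fall back to the epoch). `RawImage`, `Image`, `parse_image_strict`, and `parse_image_lenient` are hypothetical names, not the onenote.rs API, and the raw value is assumed to be seconds since the Unix epoch purely for illustration.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical raw image record (not the real onenote.rs type).
/// Assumption: `last_modified` holds seconds since the Unix epoch.
pub struct RawImage {
    pub last_modified: Option<u32>,
}

/// Hypothetical parsed image that always carries a timestamp.
pub struct Image {
    pub last_modified: SystemTime,
}

/// Strict: a missing timestamp is a hard error, similar to the error reported above.
pub fn parse_image_strict(raw: &RawImage) -> Result<Image, String> {
    let secs = raw
        .last_modified
        .ok_or_else(|| "image has no last modified time".to_string())?;
    Ok(Image {
        last_modified: UNIX_EPOCH + Duration::from_secs(u64::from(secs)),
    })
}

/// Lenient: a missing timestamp falls back to the Unix epoch.
pub fn parse_image_lenient(raw: &RawImage) -> Image {
    let last_modified = raw
        .last_modified
        .map(|secs| UNIX_EPOCH + Duration::from_secs(u64::from(secs)))
        .unwrap_or(UNIX_EPOCH);
    Image { last_modified }
}
```

One trade-off: with the epoch as a sentinel, callers cannot tell "missing" apart from a genuine 1970 timestamp. Exposing `Option<SystemTime>` in the public type would keep that information.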

Thanks for writing this crate!

Edit: after applying this fix in my fork, I keep running into more issues. The last modified property is either missing or not parsed correctly for every object in the file, and I am also getting a "Malformed OneNote data: paragraph styling is missing" error for rich text objects. There will probably be more once that is resolved.

I am not sure whether the file format changed incompatibly or whether the parser is simply not robust enough in certain cases.
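On the robustness side, one possible approach (a hypothetical sketch, not how onenote.rs currently works) is to treat missing optional properties such as paragraph styling as warnings: substitute a default and record a diagnostic instead of aborting the whole page. `ParagraphStyle`, `RawRichText`, `RichText`, `Diagnostics`, and `parse_rich_text` are invented for illustration. This does not settle whether the file format itself changed.

```rust
/// Hypothetical paragraph style (not the real onenote.rs type).
/// The derived Default (empty font, size 0.0) is a placeholder; real
/// defaults would have to come from the OneNote file format spec.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ParagraphStyle {
    pub font: String,
    pub font_size_pt: f32,
}

/// Hypothetical raw rich-text record as read from the file.
pub struct RawRichText {
    pub text: String,
    pub styles: Vec<ParagraphStyle>,
}

/// Hypothetical parsed rich text that always has a style.
pub struct RichText {
    pub text: String,
    pub style: ParagraphStyle,
}

/// Collects non-fatal problems so parsing can continue.
#[derive(Default)]
pub struct Diagnostics {
    pub warnings: Vec<String>,
}

/// Uses the first style if present; otherwise records a warning and
/// falls back to the default style instead of returning an error.
pub fn parse_rich_text(raw: &RawRichText, diag: &mut Diagnostics) -> RichText {
    let style = match raw.styles.first() {
        Some(style) => style.clone(),
        None => {
            diag.warnings
                .push("paragraph styling is missing; using default".to_string());
            ParagraphStyle::default()
        }
    };
    RichText {
        text: raw.text.clone(),
        style,
    }
}
```

The caller can then inspect `diag.warnings` after parsing a page to see which fallbacks were applied.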

msiemens commented 1 year ago

Hey @flxzt, is there a chance you could create an example notebook that exhibits these issues and upload it here?

flxzt commented 1 year ago

Sure, here is a note containing some handwritten strokes, rich text, and a PNG image:

Screenshot from 2023-08-15 14-49-26

Schnelle Notizen.zip

einsJannis commented 11 months ago

Is anyone working on this?