threeplanetssoftware / apple_cloud_notes_parser

Parser for Apple Notes data stored on the Cloud as seen on Apple handsets
MIT License
411 stars, 26 forks

Feature discussion: Diff-able exports #109

Open forthrin opened 7 months ago

forthrin commented 7 months ago

It would be really nice if exported data were in a format that makes it easy to diff two dumps.

For example, a JSON format organized as a tree that mirrors the folders and notes, as outlined below:

Notes
  Daily
    Pick up groceries

    Had a funny thought when waking up this morning.
    Blah blah
  Weekly
Home
  Kitchen
  Bedroom
Later
  Holiday
  ...

This could probably be done today with some HTML parsing, but I'd like to hear your thoughts. The current JSON format stores each note's entire HTML in a single field, which doesn't take advantage of JSON's ability to represent structure.
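A rough sketch of the HTML-parsing route, assuming each note's HTML separates paragraphs with block elements such as `<div>` or `<p>` (the tag set below is a guess, not taken from the parser's actual output). It uses Python's standard `html.parser` to split a note into paragraphs, nests them under folder and note names borrowed from the outline above, and dumps the result with sorted keys and indentation so each paragraph lands on its own line:

```python
import json
from html.parser import HTMLParser

# Hypothetical: tags assumed to mark paragraph boundaries in a note's HTML.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "br"}


class ParagraphSplitter(HTMLParser):
    """Collects the text of an HTML fragment as a list of paragraphs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = []

    def _flush(self):
        # Close the current paragraph, skipping whitespace-only ones.
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()  # delivers any trailing text to handle_data
        self._flush()


def split_paragraphs(html):
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def build_tree(notes):
    """notes: iterable of (folder, title, html) tuples -> nested dict."""
    tree = {}
    for folder, title, html in notes:
        tree.setdefault(folder, {})[title] = split_paragraphs(html)
    return tree


# Made-up sample input shaped like the outline above.
notes = [
    ("Notes", "Daily",
     "<div>Pick up groceries</div><div><br></div>"
     "<div>Had a funny thought when waking up this morning.</div>"
     "<div>Blah blah</div>"),
    ("Home", "Kitchen", ""),
]

# indent + sort_keys gives one value per line in a stable order, so a
# line-based diff of two dumps only shows what actually changed.
print(json.dumps(build_tree(notes), indent=2, sort_keys=True))
```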

The HTML output could also match how notes look in Notes itself more closely, so that copying from it into Pages gives exactly the same result as copying from Notes into Pages.

threeplanetssoftware commented 7 months ago

That's an interesting thought. I'm wondering whether the CSV output would suit this better than JSON, since it essentially guarantees one row per entry (although I haven't added much to the CSV output lately, so it may be missing a few fields).
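A minimal sketch of how a CSV export could be made diff-friendly: sort rows by a stable key and escape newlines so each note occupies exactly one line. The column names (`folder`, `title`, `plaintext`) are hypothetical and not necessarily those of the existing CSV output:

```python
import csv
import io

# Hypothetical column names; the parser's real CSV export may differ.
FIELDS = ["folder", "title", "plaintext"]


def notes_to_csv(notes):
    """Serialize notes as CSV, one physical line per note, sorted for stable diffs."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for folder, title, text in sorted(notes):
        # Escape backslashes, then newlines, so a multi-line note body stays
        # on one line and does not confuse a line-based diff.
        flat = text.replace("\\", "\\\\").replace("\n", "\\n")
        writer.writerow([folder, title, flat])
    return out.getvalue()


notes = [
    ("Notes", "Weekly", ""),
    ("Notes", "Daily", "Pick up groceries\nBlah blah"),
    ("Home", "Kitchen", ""),
]
print(notes_to_csv(notes), end="")
```

Escaping newlines keeps a line-based diff aligned per note, at the cost of having to unescape `\n` when reading the file back.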

The JSON does include each note's plaintext, which should already have the formatting stripped. The JSON isn't meant to break notes down into paragraphs and the like; it is meant to represent each object (i.e., a note has subordinate objects for each embedded image, etc.). I'm not keen to change the output significantly until I'm done with the test refactoring, but this may have an easy answer.

I'll give this some thought to see if there's an easy way to make something diff-able (or at least find a link to something that answers it). Please check whether the plaintext field does what you're looking for. Thanks for raising this.
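One way to try the plaintext field for this, sketched with Python's standard `difflib`: given two mappings from note to plaintext (how you pull them out of the JSON export is left open, and keying by title is a simplifying assumption), it emits a unified diff for each note that changed:

```python
import difflib


def diff_exports(old, new):
    """Compare two {note_title: plaintext} mappings; return unified-diff lines.

    Keying by title is an assumption for illustration; a stable note
    identifier would be a safer key in practice.
    """
    lines = []
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key, "").splitlines()
        after = new.get(key, "").splitlines()
        if before == after:
            continue  # unchanged (or empty on both sides): nothing to report
        lines.extend(difflib.unified_diff(
            before, after,
            fromfile=f"old/{key}", tofile=f"new/{key}",
            lineterm="",
        ))
    return lines


old = {"Daily": "Pick up groceries\nBlah blah"}
new = {
    "Daily": "Pick up groceries\nHad a funny thought when waking up this morning.\nBlah blah",
    "Weekly": "",
}
print("\n".join(diff_exports(old, new)))
```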

forthrin commented 7 months ago

I suggested JSON because it's easy to parse in code, but Markdown might be better, or even HTML (if laid out sensibly, one block per line), since either would allow an immediate diff with the intended results and no preprocessing.
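For the Markdown idea, a hypothetical renderer might turn a folder/note/paragraph tree into headings and blank-line-separated paragraphs, so two renderings can be compared with plain `diff` and no preprocessing. The nested-dict input shape is an assumption, and the sketch does not escape Markdown syntax inside note text:

```python
def tree_to_markdown(tree):
    """Render {folder: {note_title: [paragraphs]}} as Markdown.

    Folders become level-1 headings and notes level-2 headings; each
    paragraph is followed by a blank line. Sorting folders and notes keeps
    the output order stable between dumps. Paragraph text is not escaped.
    """
    out = []
    for folder in sorted(tree):
        out.append(f"# {folder}")
        out.append("")
        for title in sorted(tree[folder]):
            out.append(f"## {title}")
            out.append("")
            for paragraph in tree[folder][title]:
                out.append(paragraph)
                out.append("")
    return "\n".join(out)


tree = {
    "Notes": {"Daily": ["Pick up groceries", "Blah blah"], "Weekly": []},
    "Home": {"Kitchen": [], "Bedroom": []},
}
print(tree_to_markdown(tree))
```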

Side note: TBH, something like Markdown is what Apple should have used in the first place. Keep content and formatting separate. And don't use binary formats. And don't compress things (until you have to send them over a network, anyway). For all the simplicity on the surface, what lies underneath is often really messy. Oh, well...