threeplanetssoftware / apple_cloud_notes_parser

Parser for Apple Notes data stored on the Cloud as seen on Apple handsets
MIT License
396 stars 24 forks source link

`ZIDENTIFIER` == `LocalAccount` issue & two convenience patches #59

Closed vszakats closed 1 year ago

vszakats commented 1 year ago
  1. In the HTML output strikethrough style is translated to <del>. This tag means that the text is marked as deleted, and it's not necessarily styled with a strikethrough. According to the MDN. IMO the striketrough font style in Notes may not always mean that, so a better fit could be <s> here, which just means striketrough.

    Patch: [1]

  2. In the JSON output, it'd help processing if we could include a conforms_to (type) value. It would make it unnecessary for external processors to maintain a parallel list of possible types with the same set of properties. A new conforms_to property with image, audio, video, document, inline values would solve this. While looking at it, I've found an old/duplicate elsif branch for conforms_to_document, which can be deleted.

    Patch: [2] (there is probably a more elegant way)

  3. FInally the critical, but fuzzy one: On some systems, the on-disk (--mac <path>) database (a one-to-one copy of the ~/Library/Group Containers/group.com.apple.notes/ tree, Notes version 15), says that the local Note attachments are in LocalAccount (via ZICCLOUDSYNCINGOBJECTZIDENTIFIER). apple_cloud_notes_parser interprets this as any other iCloud UUID and looks for the files inside <path>/Accounts/LocalAccount/{Media,Previews,Thumbnails}, but, there is no such directory on disk. Instead, these are present in bare <path>/{Media,Previews,Thumbnails}. There is no error while running, but the output is different and of course the files output directory is not populated. My solution is to symlink these three directories to where the script expect them to be, and it is also necessary to strip Accounts/LocalAccount/ from paths in the generated output (notably JSON). I'm not sure how to special-case this. The most naïve option is to omit Accounts/ if @note.account.identifier equals to LocalAccount. There are two other records with non-UUID identifiers: TrashFolder-CloudKit and TrashFolder-LocalAccount, though these are not touched by the export.

[1]:

diff --git a/lib/AppleNotesEmbeddedObject.rb b/lib/AppleNotesEmbeddedObject.rb
index 85ed073..8a87ae9 100644
--- a/lib/AppleNotesEmbeddedObject.rb
+++ b/lib/AppleNotesEmbeddedObject.rb
@@ -46,6 +46,22 @@ class AppleNotesEmbeddedObject < AppleCloudKitRecord
       add_cryptographic_settings
     end

+    # Fill conforms_to for convenience
+    tmp_uti = AppleUniformTypeIdentifier.new(@type)
+    if tmp_uti.conforms_to_inline_attachment
+      @conforms_to = "inline"
+    elsif tmp_uti.conforms_to_image
+      @conforms_to = "image"
+    elsif tmp_uti.conforms_to_audiovisual
+      @conforms_to = "video"
+    elsif tmp_uti.conforms_to_audio
+      @conforms_to = "audio"
+    elsif tmp_uti.conforms_to_document
+      @conforms_to = "document"
+    else
+      @conforms_to = "other"
+    end
+
     @logger.debug("Note #{@note.note_id}: Created a new Embedded Object of type #{@type}")

     # Create an Array to hold Thumbnails and add them
@@ -333,12 +349,6 @@ class AppleNotesEmbeddedObject < AppleCloudKitRecord
                                                             row["ZTYPEUTI"],
                                                             note,
                                                             backup)
-          elsif tmp_uti.conforms_to_document
-            tmp_embedded_object = AppleNotesEmbeddedPDF.new(row["Z_PK"],
-                                                            row["ZIDENTIFIER"],
-                                                            row["ZTYPEUTI"],
-                                                            note,
-                                                            backup)
           elsif tmp_uti.uti == "public.url"
             tmp_embedded_object = AppleNotesEmbeddedPublicURL.new(row["Z_PK"],
                                                                   row["ZIDENTIFIER"],
@@ -475,6 +485,7 @@ class AppleNotesEmbeddedObject < AppleCloudKitRecord
     to_return[:note_id] = @note.note_id
     to_return[:uuid] = @uuid
     to_return[:type] = @type
+    to_return[:conforms_to] = @conforms_to
     to_return[:filename] = @filename if (@filename != "")
     to_return[:filepath] = @filepath if (@filepath != "")
     to_return[:backup_location] = @backup_location if @backup_location

[2]:

diff --git a/lib/ProtoPatches.rb b/lib/ProtoPatches.rb
index 837c7cb..e59c1e8 100644
--- a/lib/ProtoPatches.rb
+++ b/lib/ProtoPatches.rb
@@ -227,7 +227,7 @@ class AttributeRun

     # Add in strikethrough
     if strikethrough == 1
-      html += "<del>" if (initial_run or previous_run.strikethrough != 1)
+      html += "<s>" if (initial_run or previous_run.strikethrough != 1)
     end

     # Add in superscript
@@ -315,7 +315,7 @@ class AttributeRun

     # Add in strikethrough
     if strikethrough == 1
-      html += "</del>" if (final_run or next_run.underlined != 1)
+      html += "</s>" if (final_run or next_run.underlined != 1)
     end

     # Add in underlined
threeplanetssoftware commented 1 year ago
  1. Thanks for flagging this. I've always just used <del> on my sites, but I'm fine with changing it to better conform to the spec. (d59288fec6184bea2592cd3c14a6a81c2a8c6790)

  2. I put this logic into AppleUniformTypeIdentifier so that it is in the same place as the conforms_to methods. I also added a few more and made other types explicit even if they're only being added in AppleNotesEmbeddedObject due to being handled separately. Let me know what you think of the bins. (e869efe6fac0927eb9a7c5327c67415765b3a6ec)

  3. I'm going to need to play on my test system to reproduce this. Can you narrow down at all which "some systems" this occurs on? Does it have to do with the MacOS version? I am thinking of a fairly hacky elegant solution, but want to make sure I can effectively test it before I try.

vszakats commented 1 year ago

@threeplanetssoftware Thanks so much for your updates, they look good indeed and worked perfectly.

This LocalAccount is present on macOS Monterey. This installation has been carried over from the times when Notes.app appeared in OS X Mountain Lion, so one suspect is that this has been added by an earlier Notes version and remained there ever since, for the "On My Mac" notes. The group.com.apple.notes.plist Preferences file has two properties that seem related: DidChooseToMigrateLocalAccount = true and DidMigrateLocalAccount = true. Overall, other than being old and used throughout OS versions and multiple machines, I can't report anything particular.

threeplanetssoftware commented 1 year ago

Thanks, I'm looking into this and believe I have a backup from Monterey to check. Can I ask if there is any iCloud account enabled on that version, or is it all purely the LocalAccount?

vszakats commented 1 year ago

Yes, there is also an iCloud account enabled. That one has an UUID and its objects in the expected directories.

threeplanetssoftware commented 1 year ago

Ok, I need to generate some test data, but believe I at least have an appropriate set up to do so. Will push once I have a good solution.

threeplanetssoftware commented 1 year ago

The good news is I have a MacOS computer set up the same as yours and I think I know how I would solve this if it was worth the time investment. The bad news is I can't recreate this issue even with a testbed very close to your setup. My Library/Preferences/group.com.apple.notes.plist has the same settings checked as yours:

{"CloudKitAccountStatus"=>1,
 "ThrottlingPolicyStartTime"=>692587188.601988,
 "DidMigrateLocalAccount"=>true,
 "DidChooseToMigrateLocalAccount"=>true,
 "ThrottlingPolicyCurrentBatchCount"=>0,
 "searchIndexingVersion"=>5,
 "CloudConfigurationPath"=>
  "/System/Library/[snip_long_path]/CloudConfigurations/Normal.plist",
 "DefaultNotesAccount"=>"DeviceLocalAccount",
 "ThrottlingPolicyCurrentLevelIndex"=>0,
 "ClientStateData"=>"[cut]"}

I added a few notes with images and embedded files, made sure that ZICCLOUDSYNCINGOBJECT.ZIDENTIFIER is LocalAccount for the local folder and... a 'LocalAccount' folder showed up in the Accounts folder right where it is supposed to be. It has the expected Previews, Media and FallbackImages folders within it:

[notta@cuppa LocalAccount]$ pwd && ls
/Users/notta/Library/Group Containers/group.com.apple.notes/Accounts/LocalAccount
FallbackImages  Media  Paper  Previews

Admittedly my MacOS version does not go back as far as yours, so this could be an issue faced by people with older installations carrying things forward. I don't like using "Won't Fix," but I'm also not sure how prevalent this issue is. Your symlinks might be the easiest answer, what do you think?

Edit: I'm willing to leave this issue open for a bit with "Need to Reproduce" on it in case others face the same issue and can help us bound how big of an issue it might be.

vszakats commented 1 year ago

Thank your for taking the time going after this.

As for symlinks: Using them means that the backup (or live) folder needs to be written into. This is fragile, messes with timestamps and would fail on a read-only folder. They also cannot be left there, to avoid risking another tool (or Notes.app itself) getting confused by them. For these downsides, I'd happier with a different solution. If you haven't seen another system like this, it must be fairly rare indeed, but they certainly do exist. Here is a screenshot from a 2020 article that resembles the "bare" layout: https://osxdaily.com/2020/01/15/where-notes-stored-locally-mac/ (This one lacks an Accounts folder altogether.)

Peeking into the Notes binary, there are some hints about "modern" and "legacy" LocalAccounts, though without reversing the code, that did not lead anywhere.

What do you think of a solution that, when encountering a LocalAccount UUID, first checks if there is such a directory under Accounts, and if there isn't one, resets the account root to the backup root? This seems fairly safe to do, at least for --mac inputs. Or, a command-line option to explicitly set the LocalAccount root. I can contribute a patch for these.

[ Found two more, possibly related settings via a defaults read dump: LocalNoteMigrationCompleted = 1, MailAccountMigrationCompleted = 1. Now I remember that Notes used to live inside the Mail.app before ML. ]

threeplanetssoftware commented 1 year ago

The legacy refers to when Notes were stored in HTML, whereas modern is the format currently seen. The migration settings you see are also present in NoteStore.sqlite files, but I don't think they are the key here. I suspect it is more order of operations where you had an account that was local before it was iCloud enabled, but I could be wrong. That is likely an edge case that will get more and more rare as time goes on, but given the additional URL you've provided, I'll support this case.

No need to submit a patch, I know how I'll implement it, just need a bit of spare time to do so. My hesitancy was that I'll end up touching (all?) of the embedded objects and will need to do a bit of testing to make sure I don't inject any regression bugs.

I'll leave it in a feature branch and let you know when it is ready so that you can affirm it works in your case, since I can't recreate the test.

Thanks for digging into this.

vszakats commented 1 year ago

Sounds great and thank you for supporting this. Agreed on the circumstances of it, sadly I have no recollection on the order of these past events, but what you wrote is plausible. Possibly Apple wanted to avoid moving around a whole bunch of interconnected files throughout a past upgrade.

I'm happy to make tests (or provide any more info), and looking forward to your updates.

threeplanetssoftware commented 1 year ago

Can you please try the 59-local-account-issue feature branch? I think it should work. The way I implemented this is, upon AppleNotesAccount creation, the AppleNoteStore checks to see if the account's folder exists. If not, it defaults to the folders being in the root of the app. This is not incredibly defensive programming, I don't check to make sure at least one of those folders DOES exist, but if the folders weren't needed, they'd be empty anyhow in the Accounts folder.

Please let me know the results of your tests.

vszakats commented 1 year ago

I just run my tests and can confirm that your update does work correctly on this "odd" Notes folder (after deleting the local symlink hack and the /Accounts/LocalAccount stripping from the external JSON parser). Also thanks for the explanation, this scenario could very well be what actually happened here. Agreed that the extra directory check would be too defensive, depending on how Apple are creating them (on demand, for example.)

TL;DR: All good from my end.

Thanks again for your work on this!

threeplanetssoftware commented 1 year ago

Fixed in fde24c6d2d58b9096c2bd9dc01e6c97b7e8e5165.