threeplanetssoftware / apple_cloud_notes_parser

Parser for Apple Notes data stored on the Cloud as seen on Apple handsets
MIT License

Thumbnails Occasionally Missing on MacOS Backups #95

Closed huyz closed 7 months ago

huyz commented 8 months ago

ZTYPEUTI to add: [such as "com.apple.thingy"]

I don't know why, but I'm getting empty-string ZTYPEUTI:

 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues
 is unrecognized ZTYPEUTI, please submit a bug report to this project's GitHub repo to report this: https://github.com/threeplanetssoftware/apple_cloud_notes_parser/issues

Anticipated type of file: [Such as "image", "document", etc]

threeplanetssoftware commented 8 months ago

Thanks for reporting this. Without more information I would guess it is some symptom of the specific NoteStore.sqlite you are looking at. Often I have found the culprit for these odd missing fields to be items that were deleted, but never completely cleaned up.

If you look in debug_log.txt you should see lines like:

is unrecognized ZTYPEUTI1, check ZICCLOUDSYNCINGOBJECT Z_PK: [a number]

You can use something like SQLite Browser to see if those rows in ZICCLOUDSYNCINGOBJECT look substantially different from other embedded objects. In addition, I would check to make sure the note doesn't appear deleted (i.e. for the device you're looking at, can you still see that note?)

I just pushed an update to the logger code (bb157cfaf3fef32f23ceb8a567d86fd2f1ae7074) which will prepend Note: [note_id] to the front of those log lines so that you can look for any patterns with the Note being the same, for example.

D, [2024-03-04T05:08:23.024230 #42302] DEBUG -- : Note 24:  is unrecognized ZTYPEUTI, check ZICCLOUDSYNCINGOBJECT Z_PK: 23
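To look for patterns such as every line pointing at the same note, the log lines can be tallied per note. The following is a minimal Ruby sketch, not part of the project: `group_blank_uti_lines` and `LINE_PATTERN` are hypothetical names, and the pattern assumes the exact "Note [note_id]: ... Z_PK: [a number]" layout of the example line above.

```ruby
# Hypothetical helper (not project code): tally "unrecognized ZTYPEUTI"
# debug lines by note ID. Assumes the log layout shown in the example line.
LINE_PATTERN = /Note (\d+):\s+(.*?)\s*is unrecognized ZTYPEUTI, check ZICCLOUDSYNCINGOBJECT Z_PK: (\d+)/

def group_blank_uti_lines(lines)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    match = LINE_PATTERN.match(line)
    next unless match # skip unrelated log lines

    note_id, uti, z_pk = match.captures
    grouped[note_id.to_i] << { uti: uti, z_pk: z_pk.to_i }
  end
  grouped
end

sample = [
  "D, [2024-03-04T05:08:23.024230 #42302] DEBUG -- : Note 24:  is unrecognized ZTYPEUTI, check ZICCLOUDSYNCINGOBJECT Z_PK: 23"
]

group_blank_uti_lines(sample).each do |note_id, entries|
  z_pks = entries.map { |entry| entry[:z_pk] }.join(", ")
  puts "Note #{note_id}: #{entries.length} object(s), Z_PK #{z_pks}"
end
```

The Z_PK values collected this way are the rows to inspect in ZICCLOUDSYNCINGOBJECT. In practice the lines would come from something like `File.foreach("debug_log.txt")`.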

huyz commented 8 months ago

Thanks for the debugging help.

Yes they're all for the same note, one written by someone else who shares the folder with me.

And the note is in the Recently Deleted folder but it looks like it was probably deleted back in 2023-08 (doesn't show in the GUI).

The export shows:

Embedded Object : 5932A86D-288D-4D83-B16D-750353A3BC6D
Embedded Object : 3D5AC920-726D-4854-B9BF-631C187997A7
Embedded Object : D68C948B-D235-4C07-A05B-1057A8FB8BE6

Embedded Object : D086440E-64E1-4E43-ACCD-928B4444FC86

Embedded Object : E74795F2-EBF1-48D9-A2D1-F90F68114F25
Embedded Object : 80BD4A97-B1DC-46A6-9F03-37F7B2440FAB
Embedded Object : B987E847-D282-4351-BD35-D4F06FD453AA
Embedded Object : A2DF5EDF-7B28-4A60-9FB9-E67CAE0E58ED
Embedded Object : 646F4AC2-C84C-47FB-8FA4-286251B0CE55
Embedded Object : 917D4E48-828F-4492-B3BD-7E7AF1475A53
Embedded Object : 3D4D1635-C1F3-4E34-B945-ACAA41607092

Embedded Object : E08C6911-6EEA-43BB-937C-030E8787F9A1

Now, here may be a new issue: interestingly, it looks like a copy of this note was made around the same day in 2023-08 and not edited in any way that I can tell. That copy is live and has PDF attachments (from the built-in Document scanner), which work fine in Apple Notes but do not show up in the export:

{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}

{Image missing due to not having file reference point}

{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}

{Image missing due to not having file reference point}{Image missing due to not having file reference point}

{Image missing due to not having file reference point}

{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}

{Image missing due to not having file reference point}

{Image missing due to not having file reference point}

{Image missing due to not having file reference point}{Image missing due to not having file reference point}

{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}

{Image missing due to not having file reference point}

{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}{Image missing due to not having file reference point}

This is the same number of lines: 12, one for each attachment. Multiple {Image…} on the same line seem to represent different JPEG thumbnail previews for each page.

Other notes with PDFs do work.

Since the PDFs show up in Apple Notes, but don't in the export, I assume this is a bug?

threeplanetssoftware commented 8 months ago

Thanks for the additional information. If the note is deleted, I don't think I can shed any further light on the blank ZTYPEUTI. I've seen all manner of combinations of columns that were kept or deleted on rows which slipped past the auto-nuke date in the recycle bin, especially if there is sharing going on.

To make sure I understand the second issue: There is another note (which happened to start as a copy of this deleted one), which is not deleted today. It is in a folder shared to you by another user. In (I assume based on the other issue) Apple Notes on the Mac you can see this note and it has 12 PDF attachments which work fine. However, when you export the database using --mac, you're not seeing PDFs, but these Image missing warnings. Is that correct? Was the note created by you or the other user?

Appreciate your help narrowing down how to reproduce this.

huyz commented 8 months ago

Thanks for the additional information. If the note is deleted, I don't think I can shed any further light on the blank ZTYPEUTI. I've seen all manner of combinations of columns that were kept or deleted on rows which slipped past the auto-nuke date in the recycle bin, especially if there is sharing going on.

Makes sense. But would it be possible to detect that it's in the "Recently Deleted" folder and in that case add some additional info on the log line that says something to the effect of "this may be expected as this note is deleted"? Even better, can notes that slip past the auto-nuke date of 30 days be detected and get special treatment (either more info on the log lines or have the severity be downgraded)?

Actually if these past-auto-nuke deleted notes can be detected, maybe they can be set apart in the export itself as well. Right now it's a bit confusing that the "Recently deleted" section has lots of notes, which doesn't reflect what we see in Apple Notes.

To make sure I understand the second issue: There is another note (which happened to start as a copy of this deleted one), which is not deleted today. It is in a folder shared to you by another user. In (I assume based on the other issue) Apple Notes on the Mac you can see this note and it has 12 PDF attachments which work fine. However, when you export the database using --mac, you're not seeing PDFs, but these Image missing warnings. Is that correct? Was the note created by you or the other user?

Mostly correct, but I just realized that that other live note is probably the original (created by the other user). And the deleted note was the copy that I created and promptly deleted. Everything else is correct. This note is live and looks perfectly fine so the export doesn't reflect the correct state of attachments.

threeplanetssoftware commented 8 months ago

Right now it's a bit confusing that the "Recently deleted" section has lots of notes, which doesn't reflect what we see in Apple Notes.

It might not reflect what is seen in Apple Notes to the eyeball, but it does reflect what is seen in the database. This project started as a forensic tool after a SANS course many years ago and I try to be faithful to represent what is available. I'm also super happy it also works as a backup tool for a lot of people (perhaps the majority of users), but recreating the view a user sees on their iPhone or Mac takes a secondary spot to showing what is available.

Makes sense. But would it be possible to detect that it's in the "Recently Deleted" folder and in that case add some additional info on the log line that says something to the effect of "this may be expected as this note is deleted"?

I can look into adding that. Notes in the "Recently Deleted" folder for the first 30 days should be recoverable; it is the 40-day window after that (and beyond) that becomes the question mark. I don't currently have a good iTunes backup that includes data in that window and it will take time (or messing with clock settings) to try to get a good set to test with.

I hope this doesn't sound like I'm punting the issue, but I've spent a lot of time running down things that amount to "It looks like this got deleted, but only partially" and I essentially treat it like undefined behavior (UB) in C at this point. I'll make a best effort to fix obvious bugs, but because I keep a strict black box approach (I don't look at any Apple source, I don't decompile, I do everything just with the database exports from my own devices) it gets very hard to troubleshoot specifics.

This note is live and looks perfectly fine so the export doesn't reflect the correct state of attachments.

I'm thinking this may have to do with which account was doing the sharing. I'll try to reproduce it with my devices.

huyz commented 8 months ago

It might not reflect what is seen in Apple Notes to the eyeball, but it does reflect what is seen in the database. This project started as a forensic tool after a SANS course many years ago and I try to be faithful to represent what is available. I'm also super happy it also works as a backup tool for a lot of people (perhaps the majority of users), but recreating the view a user sees on their iPhone or Mac takes a secondary spot to showing what is available.

Ah interesting. Yes, I perfectly understand. I guess the DB doesn't expose a timestamp at which the note was deleted (i.e. moved into the Recently Deleted folder) then? If it did, then maybe you could expose that in the exported HTML in the index under the "Recently Deleted" section so that the reader could eyeball which ones are past the 30-40 day threshold. This way your code wouldn't need to differentiate but the human could :) Regardless, this is a very minor feature request, not worth the time unless it's trivial.

it will take time (or messing with clock settings) to try to get a good set to test with.

No worries, take your time.

I keep a strict black box approach

Very interesting approach. Well, I appreciate your strictness and attention to detail. It's what gives me confidence in relying on your tool for backups. I have seen waaaay too many export/backup tools that hand-wave, mostly without warning, and skip over what they don't expect—pretty much eliminating any confidence in these tools. So your tool and your efforts are awesome.

I'm thinking this may have to do with which account was doing the sharing. I'll try to reproduce it with my devices.

If you can't repro, when I have time, I can try running the debugger.

threeplanetssoftware commented 8 months ago

If you can't repro, when I have time, I can try running the debugger.

I appreciate the kind words. If you have a chance to poke at it before I can reproduce it, my gut guess is that the Thumbnail extensions are messed up. iOS 17 changed the defaults and I haven't robustly tested what happens if someone on iOS 16 (or below) creates a note and shares it with someone on iOS 17. Do you know if whoever shared it is on the same version as you? I have an iOS 16 device and iOS 17 device, but both are on the same account.

huyz commented 8 months ago

I'm trying to debug this but I've been running into another blocker (not all the time, for some reason, but most of the time):

/usr/local/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/sqlite3-1.7.2-arm64-darwin/lib/sqlite3/database.rb:177:in `initialize': no such table: ZACCOUNT (SQLite3::SQLException)
    from /usr/local/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/sqlite3-1.7.2-arm64-darwin/lib/sqlite3/database.rb:177:in `new'
    from /usr/local/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/sqlite3-1.7.2-arm64-darwin/lib/sqlite3/database.rb:177:in `prepare'
    from /usr/local/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/sqlite3-1.7.2-arm64-darwin/lib/sqlite3/database.rb:221:in `execute'
    from /usr/local/git/apple_cloud_notes_parser/lib/AppleNoteStore.rb:398:in `rip_accounts'
    from /usr/local/git/apple_cloud_notes_parser/lib/AppleNoteStore.rb:230:in `rip_all_objects'
    from /usr/local/git/apple_cloud_notes_parser/lib/AppleBackup.rb:243:in `block in rip_notes'
    from /usr/local/git/apple_cloud_notes_parser/lib/AppleBackup.rb:240:in `each'
    from /usr/local/git/apple_cloud_notes_parser/lib/AppleBackup.rb:240:in `rip_notes'
    from notes_cloud_ripper.rb:194:in `<main>'

When I look at the DB, I see that there are only a few tables, none of them being ZACCOUNT. I do see that ZICCLOUDSYNCINGOBJECT does have ZACCOUNT as a column. You would know better, but these things and the intermittency suggest to me that the original tables are in some sort of iCloud-synced tables now? Everything looks fine when I open macOS Apple Notes.

Here are the SQL tables I have:

[Image: screenshot 2024-03-10T131328Z]

Any idea what's going on?

I started having this problem on 2024-03-03. I'm on macOS 14.4 23E214 (Sonoma) MacBookPro18,2 (Apple M1 Max, arm64)

huyz commented 8 months ago

Ah I know what's going on.

not all the time, for some reason, but most of the time

It turns out when the program incorrectly guesses "Guessed Notes Version: 8" instead of the current "Guessed Notes Version: 17", that's when I get that immediate crash.

Well I can certainly work around that for now…

huyz commented 8 months ago

Now that I'm able to get further, I noticed that the live note is now no longer completely full of {Image missing due to not having file reference point}. It's actually partial now:

[Image: screenshot 2024-03-10T205137Z]

And now that I'm searching for {Image missing due to not having file reference point}, I see that this problem is happening to lots of notes.
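A search like this can be scripted to see how widespread the placeholder is. Below is a minimal Ruby sketch under stated assumptions: the helper names are hypothetical, and it assumes the export is a directory tree of `.html` files, which may not match every output option.

```ruby
# Hypothetical helpers (not project code) for counting the placeholder text.
PLACEHOLDER = "{Image missing due to not having file reference point}"

# Returns one count per line that contains at least one placeholder.
def placeholder_counts(text)
  text.each_line.map { |line| line.scan(PLACEHOLDER).length }.select(&:positive?)
end

# Assumption: exported notes live as *.html files somewhere under export_dir.
# Returns { path => number_of_placeholders } for affected files only.
def files_with_placeholders(export_dir)
  Dir.glob(File.join(export_dir, "**", "*.html")).each_with_object({}) do |path, report|
    # scrub replaces invalid bytes so scanning does not raise on odd encodings
    count = File.read(path).scrub.scan(PLACEHOLDER).length
    report[path] = count if count.positive?
  end
end
```

Calling `placeholder_counts` on one note's exported text gives one number per attachment line, which makes it easy to compare against the page counts seen in Apple Notes.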

threeplanetssoftware commented 8 months ago

The original version of Notes (which is the one with ZACCOUNT as a table name), to my knowledge, does not support iCloud syncing. That is one reason why the database schemas are fairly different: the newer schema has to support intermixing updates from various devices nicely.

I would be interested to know more about the database that is "incorrectly" guessed as an original Notes database (the iOS Version 8 is actually usually a default meaning "I can't guess the right version, this is legacy"). What is the command you are running to start this? If you run this 5 times, how many times does it "incorrectly" guess that database? Is the database open and locked when you are running this?

I have pretty extensive copies of the NoteStore.sqlite database from versions between 12 and 17 and the version fingerprinting is something I'm fairly confident in, so I suspect something else is going on (improper file permissions, something else?) if you believe it to be an up-to-date database.

threeplanetssoftware commented 8 months ago

Also, thank you for giving me the time frame it became a problem. Would you mind checking out some older commits, such as aa3ecf4f720452aa6d51264437a3096654906c36, to see if it is still an issue? I have reviewed the commits between that and now and don't believe they would explain a regression, but want to make sure. Appreciate your help trying to reproduce this reliably!

huyz commented 8 months ago

I tried restoring a couple of versions of my Apple Notes DB (from my CCC backups) as close to when I got the Notes Version mismatches, but I can't reproduce the errors. I tried undoing the https://github.com/threeplanetssoftware/apple_cloud_notes_parser/commit/aa3ecf4f720452aa6d51264437a3096654906c36 and that didn't make a difference.

I think I have to wait until it happens again in my daily backups and then launch my debugger right away.

So stay tuned…

threeplanetssoftware commented 8 months ago

Ok, I'll leave this issue open for a bit to see if you can reproduce it. I will say that thumbnails had a fairly massive (in my opinion) change in iOS 17 so it wouldn't surprise me in the least to find out there is some race condition going on between when your device syncs the update from iCloud and updates the record.

huyz commented 8 months ago

iOS 17 changed the defaults and I haven't robustly tested what happens if someone on iOS 16 (or below) creates a note and shares it with someone on iOS 17. Do you know if whoever shared it is on the same version as you? I have an iOS 16 device and iOS 17 device, but both are on the same account.

I forgot to reply. My counterpart upgraded to 17.1.x or so. I think I upgraded a few weeks later. A lot of these notes were created before that though.

huyz commented 8 months ago

Ok, I'll leave this issue open for a bit to see if you can reproduce it

False alarm on the "Guessed Notes version 8" issue. It turns out that I shouldn't have stopped at the exception; I should have looked at the logs, which had an error saying that the SQLite DB couldn't be copied. I was running my backups from cron and I needed to give the top-level binary Full Disk Access. I had granted it previously and then accidentally removed the permission.
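A failure like this (the copy silently failing, followed by a legacy version guess) can be caught before parsing with a small pre-flight check. The following Ruby sketch is not part of the project; `sqlite_file_problem` is a hypothetical helper. It relies only on the documented fact that a SQLite 3 file begins with the 16-byte string "SQLite format 3" followed by a NUL byte.

```ruby
# Hypothetical pre-flight check (not project code).
# Returns nil when the file looks like a readable SQLite 3 database,
# otherwise a short description of the problem.
SQLITE_MAGIC = "SQLite format 3\0".b

def sqlite_file_problem(path)
  return "file does not exist" unless File.exist?(path)
  return "file is not readable" unless File.readable?(path)

  begin
    header = File.binread(path, 16)
  rescue SystemCallError => e
    # On macOS a missing Full Disk Access grant can surface here even when
    # the POSIX permission bits look fine.
    return "could not read file: #{e.message}"
  end

  return "file is empty or truncated" if header.nil? || header.bytesize < 16
  return "file does not start with the SQLite header" unless header == SQLITE_MAGIC

  nil
end

# Example use in a cron wrapper (the path is a placeholder):
#   problem = sqlite_file_problem("/path/to/NoteStore.sqlite")
#   abort("Refusing to parse: #{problem}") if problem
```

Failing loudly at this point avoids the misleading "no such table: ZACCOUNT" exception further down the line.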

Ok, so back to the current issue, which is missing thumbnails. Do you need me to provide more info for that?

threeplanetssoftware commented 8 months ago

I've been digging into the Thumbnail code to clean it up. It has always been a shakier part of the code base. I have already fixed a few potential issues and am working on some others. Once I push those changes live I'll see if they work and, if not, try to come up with some queries to run against the NoteStore to help debug it.

Thanks for your patience!

threeplanetssoftware commented 8 months ago

Brief update: I've been working through thumbnail code and I suspect the problem isn't the code itself, nor even necessarily an iOS version difference (16 v 17).

For context, MOST of my testing and use of Notes is done on an iPhone (ripped via a Mac). I test the Mac build occasionally, but until iOS 17, it looked like any features released on iOS were also released on MacOS at the same time. With the change in iOS of image extensions (.jpg became .jpeg, z_generations were added, etc), I had assumed the current Mac version would behave the same. In fact, I even see z_generation in the database and comparing the iPhone and MacOS databases for the same Cloud-synced notes everything looks like it lines up pretty well.

However, files on disk are clearly different. My MacOS files are still using iOS 16 file extensions, not using the z_generation, etc. I will dig in a bit more, but it might be the case that they have diverged. I have an idea for a solution that I was considering a while back that I considered inelegant (check ALL the possible variants of file paths used by notes), but will need a bit to get that to somewhere I am happy with it.
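The "check all the possible variants" idea could look roughly like the Ruby sketch below. This is an illustration under assumptions, not the project's implementation: the directory layout (an optional generation subfolder inside the attachment's folder), the helper names, and the keyword arguments are all hypothetical. Only the two extensions come from the discussion above.

```ruby
# Hypothetical sketch (not project code): enumerate plausible on-disk
# locations for a thumbnail and pick the first one that exists.
# Assumed layout: <dir>/<generation>/<basename>.<ext> or <dir>/<basename>.<ext>
def candidate_paths(dir, basename, generation: nil, extensions: %w[jpeg jpg])
  parents = []
  # Try the newer layout (with a generation folder) first, when one is known.
  parents << File.join(dir, generation) if generation && !generation.empty?
  parents << dir
  parents.product(extensions).map { |parent, ext| File.join(parent, "#{basename}.#{ext}") }
end

# exists: is injectable so the lookup can be exercised without touching disk.
def first_existing(paths, exists: File.method(:exist?))
  paths.find { |path| exists.call(path) }
end
```

For example, `first_existing(candidate_paths(dir, name, generation: gen))` returns `nil` when no variant is present, which leaves the caller free to fall back to some other image.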

For now, if you also have an iPhone, I would advocate doing a phone backup and pulling from that, rather than the Mac backup directly. My apologies for the annoyance here.

huyz commented 8 months ago

Ah that's odd. No need for apologies and no rush at all. I'll wait until you find a solution, no worries.

threeplanetssoftware commented 7 months ago

I've updated the title to better reflect what I am reproducing, please let me know if this is not representative of your example.

So far it seems like at least two things are not behaving consistently (at least in my data set):

I suspect the latter might have been what generated your slew of errors because until recently I was relying on the first thumbnail's location to derive the image in HTML. That is now changed to taking the first thumbnail with a proper location and, if one does not exist, then using the location of the full-size image. I probably could put a bound on the image size.

For the former issue, I am now linking to the thumbnail's location if the full-size image can't be found.
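The two fallbacks described above can be sketched as follows. This is a simplified Ruby illustration, not the project's code: `ThumbnailStub`, `preview_location`, and `link_location` are hypothetical names, and a "proper location" is assumed to mean a non-empty string.

```ruby
# Hypothetical stand-in for a thumbnail record (not a project class).
ThumbnailStub = Struct.new(:location)

# Image shown in the HTML: the first thumbnail with a usable location,
# otherwise the full-size image's location.
def preview_location(thumbnails, full_size_location)
  usable = thumbnails.find { |thumb| thumb.location && !thumb.location.empty? }
  usable ? usable.location : full_size_location
end

# Link target: the full-size image when it can be found on disk,
# otherwise the first usable thumbnail (nil if neither is available).
def link_location(full_size_location, thumbnails, exists: File.method(:exist?))
  return full_size_location if full_size_location && exists.call(full_size_location)

  preview_location(thumbnails, nil)
end
```

When `preview_location` falls back to the full-size image, the caller would still want to cap the displayed width, as mentioned above.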

This is not a solution that I love and I plan to keep trying to figure out how best to differentiate these cases, but hopefully this is usable in the meantime for you. Please let me know what other specific cases you are seeing so that I can try to reproduce.

huyz commented 7 months ago

The title fits.

That seems to handle all my cases: the change has fixed all the missing links as I no longer have any {Image missing due to not having file reference point} anywhere.

A couple of discrepancies still (not critical):

Anyway, not critical at all. A bound on the image size to resemble the other thumbnails would be sufficient for me.

Thanks a lot!

threeplanetssoftware commented 7 months ago

Thanks for pointing those out, I'll take a look. Right now I've opted to redo how versioning is tracked so that Mac versions can be handled explicitly differently. I just pushed a79d2ff7c3f3454679b1649b816ba110145441a6 that will set the width of the full-size image to the first thumbnail, even if it can't find the first thumbnail. This hopefully will solve the last issue.

I'll poke at the cropping information and see if I can find the right order. Is it something obvious like the order is just reversed? Or are they a bit more random than that?

huyz commented 7 months ago

I just pushed https://github.com/threeplanetssoftware/apple_cloud_notes_parser/commit/a79d2ff7c3f3454679b1649b816ba110145441a6 that will set the width of the full-size image to the first thumbnail, even if it can't find the first thumbnail. This hopefully will solve the last issue.

Yes it worked. Thanks!

Is it something obvious like the order is just reversed? Or are they a bit more random than that?

It looks fairly random. The one pattern that I can discern from looking at Apple Notes is that, if there are missing thumbnails within a row, it's always in the latter part of the row (as depicted in order in Apple Notes); i.e. in a row, once a thumbnail is missing for a page, then the rest of the pages on that row will have missing thumbnails. But as displayed by your app, it's fairly mixed: real thumbnails and faux thumbnails are mixed in a seemingly random order. And within each category (real or faux), it's fairly mixed as well. I can sort of see a tendency for the fake thumbnails to be in the same order as they are in Apple Notes, but not always.

threeplanetssoftware commented 7 months ago

Please forgive the delay on this one. I think I found where in the Gallery protobuf the ordering information is kept. It is working on my simpler test data, but I'd love to find out if it works for your more complex example. Can you check out 14646d82d2b747e3a728518ed18e887e71aa52e6 and tell me how the order looks? I pray it is correct for you.
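Once an ordered list of identifiers is available, applying it is straightforward. The Ruby sketch below is only an illustration: it assumes the ordering information boils down to a list of UUIDs and that each gallery item is a hash with a `:uuid` key. Neither detail is confirmed by the thread, and `order_by_uuid` is a hypothetical name.

```ruby
# Hypothetical sketch (not project code): reorder gallery items to follow
# a reference list of UUIDs. Items missing from the list go last and keep
# their original relative order.
def order_by_uuid(items, ordered_uuids)
  rank = ordered_uuids.each_with_index.to_h
  items.sort_by.with_index do |item, index|
    [rank.fetch(item[:uuid], ordered_uuids.length), index]
  end
end
```

Using the original index as a tie-breaker keeps the result stable, so unknown items do not get shuffled among themselves.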

huyz commented 7 months ago

It worked! Awesome work as usual.

threeplanetssoftware commented 7 months ago

Great! In that case I'm going to close this issue because the root issues seem to be mitigated. I'll investigate cropping the images on the fly if the thumbnail can't be found, but that could start to bog down some users with lots of galleries not using the --individual-files option since it would all have to be done in the browser as the images are displayed, so I'm not incredibly inclined to build that in completely.

I appreciate your patience as I worked through this bug!

huyz commented 7 months ago

Makes perfect sense. This is great