threeplanetssoftware / apple_cloud_notes_parser

Parser for Apple Notes data stored on the Cloud as seen on Apple handsets
MIT License

Bugs in --individual-files: URI-escaping filenames, un-merging accounts, properly referencing media #81

Closed: FilipLaz closed this issue 1 year ago

FilipLaz commented 1 year ago

Describe the bug
Links to notes that have # in their names are not encoded properly.

To Reproduce
Have some notes with # in their names, then:

  1. Export notes with the --individual-files flag turned on
  2. Open the index file
  3. Try to open some of the notes with # in the name
  4. See the error

Expected behavior
The '#' character is encoded as '%23' and the note's HTML file opens.

Actual behavior
The '#' character is not encoded and the URL does not work.

Example
Actual link to the file:
file:///Users/user/projects/apple_cloud_notes_parser/output/2023_08_11-01_23_27/html/note_store1/Hall%20of%20Fame/63318%20-%20#testTag%20i%20#testTag2
Expected link to the file:
file:///Users/user/projects/apple_cloud_notes_parser/output/2023_08_11-01_23_27/html/note_store1/Hall%20of%20Fame/63318%20-%20%23testTag%20i%20%23testTag2
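For context: in a URL an unescaped # starts the fragment, so the browser only looks up the part of the name before it. The Ruby sketch below illustrates this with the standard uri and erb libraries. It is not the parser's own code; the note name is a simplified, hypothetical one with a single # and an assumed .html suffix, and /notes/ is a placeholder directory.

```ruby
require "uri"
require "erb"

# Hypothetical, simplified note file name (one "#", assumed .html suffix).
name = "63318 - #testTag.html"

# Broken: only the spaces were escaped, so "#" still acts as the fragment delimiter.
broken = URI.parse("file:///notes/63318%20-%20#testTag.html")
puts broken.path      # the path the browser looks up stops before the "#"
puts broken.fragment  # the rest of the name is treated as a fragment

# Fixed: percent-encode the whole file name (one path segment), so "#" becomes "%23".
escaped = ERB::Util.url_encode(name)
fixed = URI.parse("file:///notes/#{escaped}")
puts fixed.path
p fixed.fragment      # no fragment any more
```

ERB::Util.url_encode leaves only letters, digits and - . _ ~ unescaped, which suits a single file name but, as the rest of this thread shows, not a path containing "/" separators.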

Command used
ruby notes_cloud_ripper.rb --individual-files -r -m ...


threeplanetssoftware commented 1 year ago

Thanks for reporting this. I took a quick peek and noticed some other issues with --individual-files. It also looks like it doesn't account for multiple accounts having the same folder name, so the index.html in a duplicated folder gets overwritten with just the notes from the last account.
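A minimal sketch of that overwrite and of one way around it: namespacing each root folder by its account. The hyphenated account prefix follows the iCloud-FolderNameFooBar directory that appears later in this thread; the helper name root_dir_name, the namespace_by_account option and the second account name are hypothetical, and this is not the parser's actual code.

```ruby
# Hypothetical helper: pick the output directory for a root folder.
# Without the account prefix, two accounts with a "Recipes" folder share one
# directory, so the second index.html written there replaces the first.
def root_dir_name(account, folder, namespace_by_account: true)
  namespace_by_account ? "#{account}-#{folder}" : folder
end

folders = [["iCloud", "Recipes"], ["On My iPhone", "Recipes"]]

flat = folders.map { |account, folder| root_dir_name(account, folder, namespace_by_account: false) }
namespaced = folders.map { |account, folder| root_dir_name(account, folder) }

puts flat.uniq.length    # both accounts collapse into a single directory
p namespaced             # one directory per account
```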

I had hoped for a quick fix, but I'll need to poke at this a bit and see if there are any other issues to roll in.

FilipLaz commented 1 year ago

👍 I noticed one more bug: attachment links in nested folders are wrong, or rather, they are offset by their parent folder. Example: level1Folder > level2Folder > exampleNote

Expected attachment link:
file:///Users/userx/projects/apple_cloud_notes_parser/output/2023_08_11-01_23_27/html/note_store1/files/Accounts/7DD9R21EC-EFD0-481A-AD7C-E2DB4203A584/Previews/37960CA8-8BAC-4E05-9DE9-95408B5C2155-1-1024x576-0.png

Actual attachment link:
file:///Users/userx/projects/apple_cloud_notes_parser/output/DOBAR%202023_08_11-01_23_27/html/note_store1/**level1Folder**/files/Accounts/7DD9R21EC-EFD0-481A-AD7C-E2DB4203A584/Previews/37960CA8-8BAC-4E05-9DE9-95408B5C2155-1-1024x576-0.png
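A sketch of why the folder depth matters, using Ruby's standard Pathname library. Paths are relative to html/note_store1, following the layout of the expected link above; ACCOUNT-ID and preview.png are placeholders. Computing the link against the parent folder is only an assumed way to get a link of the same shape as the actual one, not a claim about how the parser builds it.

```ruby
require "pathname"

# Paths relative to html/note_store1 (layout taken from the expected link above).
media    = Pathname.new("files/Accounts/ACCOUNT-ID/Previews/preview.png")
note_dir = Pathname.new("level1Folder/level2Folder") # where exampleNote's HTML is written

# Correct: compute the link relative to the directory that holds the note.
good = media.relative_path_from(note_dir)

# Off by one level: compute it relative to the parent folder instead.
bad = media.relative_path_from(note_dir.dirname)

# Resolve each link from the note's directory, as a browser would.
resolved_good = (note_dir + good).cleanpath
resolved_bad  = (note_dir + bad).cleanpath

puts good, bad
puts resolved_good  # lands in files/ under note_store1
puts resolved_bad   # lands in level1Folder/files, like the actual link
```

Pathname#relative_path_from works lexically and does not touch the filesystem, so it can be used before the export directories exist.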

Let me know if I should open a new bug. Thank you.

threeplanetssoftware commented 1 year ago

No, I'll just add it to this one and turn it into a thread for --individual-files bugs. Thanks for flagging.

threeplanetssoftware commented 1 year ago

76c0ce97a69bd63e5bdf24881117315726936d13 should fix the first two issues, please let me know if it doesn't. I'll work on properly re-basing the media paths tomorrow, hopefully.

FilipLaz commented 1 year ago

Not working: all links are now broken, not only the ones with special characters.

Actual:
file:///Users/userX/projects/apple_cloud_notes_parser/output/2023_08_12-14_31_26/html/note_store1/iCloud-FolderNameFooBar/iCloud-FolderNameFooBar%2F92615%20-%20%23unitedkingdom.html

Expected:
file:///Users/userX/projects/apple_cloud_notes_parser/output/2023_08_12-14_31_26/html/note_store1/iCloud-FolderNameFooBar/92615%20-%20%23unitedkingdom.html

Probable cause: the account namespace exists only at the root, so it shouldn't be added to URLs for nested folders.

threeplanetssoftware commented 1 year ago

The account name is prefixed to the root folder; it isn't used as a folder itself, although I suppose that could also work. My browser doesn't care that the slashes in the URL are escaped and opens the right file. Is yours saying the file cannot be found? I could escape each portion separately, if need be.

FilipLaz commented 1 year ago

There are several new issues. Links to subfolders are broken in all browsers (Chrome, Safari, Firefox). The last "/" is encoded, and it shouldn't be: it's not part of the name but part of the path to it. [screenshot]

<a href="Obsidian%2Findex.html">Obsidian</a> --> <a href="Obsidian/index.html">Obsidian</a>
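To see why the encoded slash breaks the link, Ruby's standard uri library can resolve both hrefs against the page that contains them, the way a browser does. The base URL file:///output/html/note_store1/index.html is a hypothetical location for the top-level index.

```ruby
require "uri"

# Hypothetical location of the index.html that contains the links.
base = "file:///output/html/note_store1/index.html"

encoded_slash = URI.join(base, "Obsidian%2Findex.html")
literal_slash = URI.join(base, "Obsidian/index.html")

# %2F is data, not a separator: the last path segment is the whole string.
p encoded_slash.path.split("/").last
# A literal "/" makes the browser descend into the Obsidian folder.
p literal_slash.path
```

With %2F the href points at a single file literally named "Obsidian/index.html" inside note_store1, which does not exist.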

FilipLaz commented 1 year ago

In subfolders, links contain a duplicate of the account name + folder. This is much easier to demonstrate with a screenshot: the left side is after the update and not working; the right side is before the update and working (apart from links to notes with special characters, which don't work). [screenshot]

threeplanetssoftware commented 1 year ago

OK, I just pushed a commit that escapes each portion of the path separately. Could you try that, please?
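A minimal sketch of escaping each portion of the path separately. It assumes ERB::Util.url_encode from Ruby's standard library as the escaping function (the actual commit may use something else), and escape_path is a hypothetical helper. Escaping the already-joined path reproduces the %2F seen earlier in the thread; escaping each segment and joining with a literal "/" keeps the separators intact.

```ruby
require "erb"

# Hypothetical helper: escape every folder/file name on its own, then join with
# a literal "/" so the separators stay path delimiters.
def escape_path(segments)
  segments.map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

# Escaping the joined path in one go also turns the separator into %2F.
whole = ERB::Util.url_encode("iCloud-FolderNameFooBar/92615 - #unitedkingdom.html")
per_segment = escape_path(["iCloud-FolderNameFooBar", "92615 - #unitedkingdom.html"])

puts whole
puts per_segment
```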

FilipLaz commented 1 year ago

🙌 It works, the links are working. Thank you! The only thing left is relative paths for media; other than that, I don't see any other issues.

threeplanetssoftware commented 1 year ago

Didn't quite get to the media links today and will learn a lesson from last night about not slinging code late at night after a long day. Will let you know when I push a fix for the media links! Thank you for your patience.

FilipLaz commented 1 year ago

Thank you. Please don't feel pressured by me in the slightest.

threeplanetssoftware commented 1 year ago

I clearly didn't feel pressured; I just finally pushed a fix for this, as I was toying with a few different solutions. Check out 04372b481fd44f51f82c066137cdefb5483c266e and see if it performs better. If so, I'll close.

FilipLaz commented 1 year ago

It works, to a large degree. It fails for more deeply nested folders: the attachment path has extra folders in it.

Current:
file:///Users/userX/projects/apple_cloud_notes_parser/output/2023_08_19-20_34_04/html/note_store1/files/Accounts/99B921EC-EFD0-AD7C-E2DB4203A584/Previews/778FBCEB-74AA-4636-8C75-64DF8238AEB5-1-1024x500-0.png

Expected:
file:///Users/userX/projects/apple_cloud_notes_parser/output/2023_08_19-20_34_04/files/Accounts/99B921EC-EFD0-AD7C-E2DB4203A584/Previews/778FBCEB-74AA-4636-8C75-64DF8238AEB5-1-1024x500-0.png

An additional question: doesn't it make more sense for the 'files' folder to be under html instead of at the root?

threeplanetssoftware commented 1 year ago

Right you are: I missed passing individual_files back into the recursive call for to_relative_root. It should be fixed now.
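A hypothetical model of that kind of slip, showing how a flag that is not forwarded through a recursive call shortens a relative prefix. It is loosely based on the to_relative_root method mentioned in the reply but is not the project's code: the Folder struct, the relative_root_prefix name and the directory depths (html/note_store1/<folder>/...) are all assumptions.

```ruby
# Hypothetical model of nested Notes folders (not the project's actual classes).
Folder = Struct.new(:name, :parent) do
  # "../" prefix from this folder's HTML directory back to the export root,
  # assuming folders are written to html/note_store1/<folder>/<subfolder>/...
  def relative_root_prefix(individual_files = false)
    # Hypothetical: without individual files, HTML sits directly in html/.
    return "../" unless individual_files
    # A top-level folder is three levels below the root (html/note_store1/<name>).
    return "../../../" if parent.nil?
    # One more level per ancestor; the flag is forwarded on every call.
    "../" + parent.relative_root_prefix(individual_files)
  end

  # Bug pattern: the recursive call silently falls back to the default (false).
  def buggy_relative_root_prefix(individual_files = false)
    return "../" unless individual_files
    return "../../../" if parent.nil?
    "../" + parent.relative_root_prefix
  end
end

level1 = Folder.new("level1Folder", nil)
level2 = Folder.new("level2Folder", level1)

puts level2.relative_root_prefix(true)        # climbs all the way to the export root
puts level2.buggy_relative_root_prefix(true)  # stops two levels short
```

Under these assumptions, the buggy prefix applied from html/note_store1/level1Folder/level2Folder lands in html/note_store1/files, the same shape as the "Current" path above.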

As far as 'files' is concerned, that isn't strictly for HTML. This package was originally written from a forensics perspective, and the 'files' folder is really just where files extracted from the application go. These are also referenced in the CSV files and JSON exports. It just happens that a lot of people now also use this as a tool for backing up Notes, because there aren't a ton of great alternatives, so I try to support that use case as well.

FilipLaz commented 1 year ago

Works. Thank you 👏 The issues described here are all fixed. I do have some other small quirks that make it cumbersome to import notes into Obsidian, but that requires a new ticket.

threeplanetssoftware commented 1 year ago

Great, glad to hear it! Thanks for flagging this issue.