optimumchaos / picsort

A Go-based picture sorting tool. Sorts and organizes pictures and videos into your library.

Google Takeout is Truncating Filenames #4

Open optimumchaos opened 5 months ago

optimumchaos commented 5 months ago

Example:

Picsort is finding and sorting the HEIC, presumably because it matches the metadata file 1:1.

Suddenly the reason for issue #3 is clear. Google appears to be restricting filenames to 51 characters, extension included. The original filename is "...A1DA5.", which Google truncates to "...A1.HEIC" but to "...A1D.MP4", since the shorter extension leaves room for one more character of the name. Not cool, Google.
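A minimal Go sketch of the truncation as I understand it, to make the HEIC/MP4 difference concrete. `TruncateName` and `DefaultMaxNameLen` are hypothetical names, not existing picsort code. It assumes the limit counts characters (runes) across name plus extension, which matches the observation above but is not documented by Google.

```go
package main

import "path/filepath"

// DefaultMaxNameLen is the limit observed in this issue. It is an
// assumption drawn from one export, not a documented Takeout guarantee.
const DefaultMaxNameLen = 51

// TruncateName shortens name so that stem plus extension is at most maxLen
// characters. Only the stem is cut; the extension is kept whole.
// Hypothetical helper: counting runes rather than bytes is an assumption.
func TruncateName(name string, maxLen int) string {
	ext := filepath.Ext(name) // includes the leading dot, e.g. ".HEIC"
	stem := []rune(name[:len(name)-len(ext)])
	keep := maxLen - len([]rune(ext)) // characters left over for the stem
	if keep < 0 {
		keep = 0
	}
	if len(stem) <= keep {
		return name // already short enough
	}
	return string(stem[:keep]) + ext
}
```

With a limit of 10 for readability, `TruncateName("ABCDEFGHIJ.HEIC", 10)` gives `ABCDE.HEIC` while `TruncateName("ABCDEFGHIJ.MP4", 10)` gives `ABCDEF.MP4`: the same original stem ends up one character longer next to the shorter extension.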

This also brings to light something that might be important: each metadata file has a "title" tag which apparently matches the filename by convention, but may not actually match the filename. We should be treating "title" as source of truth (even if it is un-true) and then addressing the truncation problem somehow.
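Reading the title could look roughly like the sketch below. Only the "title" key comes from the metadata files discussed here; the `Metadata` type and the `ParseMetadata` / `ReadMetadata` helpers are hypothetical names for illustration, and every other field in the JSON is ignored.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// Metadata holds the one sidecar field this sketch cares about.
type Metadata struct {
	Title string `json:"title"`
}

// ParseMetadata decodes a sidecar document and rejects one with no title,
// since the title is what would be treated as the source of truth.
func ParseMetadata(data []byte) (Metadata, error) {
	var m Metadata
	if err := json.Unmarshal(data, &m); err != nil {
		return Metadata{}, fmt.Errorf("decode metadata: %w", err)
	}
	if m.Title == "" {
		return Metadata{}, errors.New("metadata has no title")
	}
	return m, nil
}

// ReadMetadata loads and parses the sidecar file at path.
func ReadMetadata(path string) (Metadata, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return Metadata{}, err
	}
	return ParseMetadata(data)
}
```

Keeping the parsing separate from the file read makes the title handling easy to test without touching disk.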

optimumchaos commented 5 months ago

I'm finding this referenced elsewhere as well, e.g.: https://www.reddit.com/r/googlephotos/comments/lsfafv/google_takeout_of_google_photos_observations_as/

optimumchaos commented 5 months ago

I'm thinking we should:

  1. Stop looking for the ".json" files as neighbouring files with a similar name. Instead do an initial pass, scanning for all JSON files in the import data, reading their titles, and mapping the JSON metadata to <file-directory>/<truncated-title-tag>.
    1. The truncated title tag is the title tag truncated to 51 characters with the un-truncated file extension present. 51 is the default, but it should be overridable on the command line.
    2. Put the non-truncated file name (minus extension) in the metadata structure.
    3. The metadata mapping should be a list of metadata in case we got more than one match.
  2. When attempting to deduplicate with metadata:
    1. Get the filename on disk and truncate to 51 characters in the same way as mentioned above.
    2. Get the metadata list, if present. If the list has more than one entry, search for an exact filename match and use it if found; otherwise consider the file "unsupported".
    3. If the Live Photo support flag is set, try substituting each of the known Live Photo extensions and repeating this step from the top.
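A rough Go sketch of the plan above, to check that the two passes fit together. All names (`Index`, `Add`, `Lookup`, `truncateName`, the `liveExts` parameter) are hypothetical, and the truncation rule (limit counted in characters across name plus extension) is an assumption. A single-entry list is used as-is, which the steps imply but do not state. On an ambiguous list with no exact match, this version moves on to the next Live Photo candidate rather than stopping, which is one possible reading of "unsupported".

```go
package main

import (
	"path/filepath"
	"strings"
)

// Metadata is a hypothetical record built from one JSON sidecar.
type Metadata struct {
	Title string // full "title" value, e.g. "IMG_0001.HEIC"
	Stem  string // Title without its extension (step 1.2)
}

// Index maps <file-directory>/<truncated-title> to every sidecar that
// produces that key (step 1.3).
type Index map[string][]Metadata

// truncateName cuts the stem so stem+extension fits in maxLen characters.
func truncateName(name string, maxLen int) string {
	ext := filepath.Ext(name)
	stem := []rune(name[:len(name)-len(ext)])
	keep := maxLen - len([]rune(ext))
	if keep < 0 {
		keep = 0
	}
	if len(stem) <= keep {
		return name
	}
	return string(stem[:keep]) + ext
}

// Add registers a sidecar found in dir whose "title" tag is title (step 1).
func (ix Index) Add(dir, title string, maxLen int) {
	key := filepath.Join(dir, truncateName(title, maxLen))
	stem := strings.TrimSuffix(title, filepath.Ext(title))
	ix[key] = append(ix[key], Metadata{Title: title, Stem: stem})
}

// Lookup finds metadata for the media file at path (step 2). liveExts
// lists the extensions to try when Live Photo support is on; pass nil to
// disable the swap.
func (ix Index) Lookup(path string, maxLen int, liveExts []string) (Metadata, bool) {
	dir, name := filepath.Dir(path), filepath.Base(path)
	stem := strings.TrimSuffix(name, filepath.Ext(name))

	// Try the on-disk name first, then the same stem with each Live Photo
	// extension. Truncation is re-applied after the swap, so a stem that
	// fit next to ".MP4" is shortened again for ".HEIC".
	candidates := []string{name}
	for _, ext := range liveExts {
		candidates = append(candidates, stem+ext)
	}
	for _, cand := range candidates {
		list := ix[filepath.Join(dir, truncateName(cand, maxLen))]
		if len(list) == 1 {
			return list[0], true
		}
		for _, m := range list { // ambiguous: require an exact title match
			if m.Title == cand {
				return m, true
			}
		}
	}
	return Metadata{}, false // "unsupported"
}
```

With a limit of 10 and a sidecar titled `ABCDEFGHIJ.HEIC`, looking up `ABCDEF.MP4` in the same directory finds nothing on its own, but succeeds once `.HEIC` is passed in `liveExts`, because the swapped name truncates to the same `ABCDE.HEIC` key.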