optimumchaos / picsort

A Go-based picture sorting tool. Sorts and organizes pictures and videos into your library.

Possible mis-sort due to bad Google metadata related to duplicate filenames #1

Closed optimumchaos closed 3 years ago

optimumchaos commented 3 years ago

Situation:

/folder/fileA.jpg
/folder/fileA.JPG.json
/folder/fileA(1).jpg
/folder/fileA.JPG(1).json

You would expect "fileA.jpg" to match "fileA.JPG.json". I found an instance where "fileA.jpg" actually corresponds to "fileA.JPG(1).json": Google did not preserve the order of the duplicates when it exported the files to disk. If I had done a case-insensitive match, I would have picked the wrong metadata and sorted the picture incorrectly. As it happened, I did a case-sensitive lookup for the metadata file, did not find one, and fell back to the embedded metadata. The result would still be wrong on a case-insensitive filesystem, or if the change I just reverted were still in place.
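A minimal sketch of the case-sensitive lookup described above, not the actual picsort code. The names `exactSidecar` and `findSidecar` are hypothetical. It assumes the sidecar is named by appending ".json" to the media filename, and it compares against the directory listing so the match stays exact even on a case-insensitive filesystem:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// exactSidecar (hypothetical helper) looks for mediaName + ".json" in a
// directory listing, comparing byte-for-byte so that "fileA.JPG.json" is
// NOT accepted as the sidecar of "fileA.jpg".
func exactSidecar(names []string, mediaName string) (string, bool) {
	want := mediaName + ".json"
	for _, n := range names {
		if n == want {
			return n, true
		}
	}
	return "", false
}

// findSidecar (hypothetical helper) lists the media file's directory and
// applies exactSidecar. Listing the directory instead of calling os.Stat
// avoids the filesystem resolving a differently-cased name for us.
func findSidecar(mediaPath string) (string, bool, error) {
	dir, base := filepath.Split(mediaPath)
	if dir == "" {
		dir = "."
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", false, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	name, ok := exactSidecar(names, base)
	if !ok {
		return "", false, nil
	}
	return filepath.Join(dir, name), true, nil
}

func main() {
	// The listing from the situation above.
	names := []string{"fileA.jpg", "fileA.JPG.json", "fileA(1).jpg", "fileA.JPG(1).json"}
	_, ok := exactSidecar(names, "fileA.jpg")
	fmt.Println("sidecar found:", ok) // no exact match, so fall back to embedded metadata
}
```

With the listing above, no sidecar is matched for "fileA.jpg", which is the behavior that avoided the mis-sort.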

Ideas:

  1. prefer embedded dates when present
  2. only use metadata if the embedded dates and metadata agree. (This would be a problem when the file has no metadata.)
  3. only use metadata if there is no (#) situation, implying duplicate files.

optimumchaos commented 3 years ago

Mitigation: https://github.com/optimumchaos/picsort/commit/f4323eaefa87606f1d1713c0e101c1716b72d295

optimumchaos commented 3 years ago

I think the latest commit fixes this.

  1. prefer embedded dates when present: embedded dates were already preferred when present
  2. only use metadata if the embedded dates and metadata agree. (This would be a problem when the file has no metadata.): not doing this. It's not a guaranteed match, and only helps in cases where the metadata isn't that important anyway.
  3. only use metadata if there is no (#) situation, implying duplicate files: applied a check that looks for patterns common to duplication. Because the parentheses seem to come from Google, not from my filesystem (I checked by inspecting the original tar.gz), I put in an explicit check for parentheses.

I think the problem is resolved enough.