regosen / get_cover_art

Batch cover art downloader and embedder for audio files
MIT License

Improving search results #9

Closed by classicjazz 3 years ago

classicjazz commented 3 years ago

I ran get_cover_art on my entire music collection and noted the following by comparing error logs with manual searches on Apple Music's web site:

Searches are very literal (and, thus, subject to failure):

regosen commented 3 years ago

Thanks for summarizing your findings. This looks like several different issues / requests rolled into one, so I would recommend the following:

That will help me and others investigate and address these more easily.

classicjazz commented 3 years ago

Thank you. I split it into four issues.

regosen commented 3 years ago

Ok, I just pushed a new version of get_cover_art (v1.4.4) that should address all of these except Roman numerals. That one would be really risky, since it could cause unintentional matches. I did add a TODO comment in the code about revisiting Roman numerals in the future. Let me know what you think.

classicjazz commented 3 years ago

Thank you. With your changes, I re-ran get_cover_art against my iTunes collection and it found over 300 incremental matches.

Note: I believe you need to delete skip_artwork.txt to enable get_cover_art to find new artwork following your code changes.

regosen commented 3 years ago

In the meantime, I took a stab at Roman numerals vs. regular numbers in this PR (https://github.com/regosen/get_cover_art/pull/13), but it's actually causing previous matches to fail, because the word "I" changes to "1" everywhere, "mix" changes to "1009", etc. It needs to be rethought so that we can search both correctly and efficiently.
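To illustrate the failure mode, here is a minimal sketch of a blanket conversion. It is not the code from the PR: `roman_to_int`, `naive_normalize`, and the sample titles are hypothetical, and the only assumption is that any whole word made up of Roman numeral letters gets converted.

```python
import re

# Values of the individual Roman numeral letters.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Any whole word built only from Roman numeral letters, in either case.
ROMAN_WORD = re.compile(r"\b[IVXLCDM]+\b", re.IGNORECASE)


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer (no validity checks)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV = 4, IX = 9).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def naive_normalize(title: str) -> str:
    """Hypothetical blanket conversion: every numeral-looking word becomes a number."""
    return ROMAN_WORD.sub(lambda m: str(roman_to_int(m.group(0))), title)


print(naive_normalize("Led Zeppelin IV"))      # the intended case
print(naive_normalize("I Remember Clifford"))  # the pronoun "I" is converted too
print(naive_normalize("Dance Mix"))            # "Mix" parses as M + IX
```

The first title is the case the conversion was meant for; the other two show ordinary English words being rewritten, which is why titles that matched before can stop matching.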

A. Could search iTunes for each variation, but that could result in excessive searches, and I'm not sure how to correctly choose between the two result lists.

B. Could simply strip out the numerals altogether when searching iTunes, but then you could get too many unrelated matches.

Open to other suggestions.

classicjazz commented 3 years ago

Are you normalizing all artists and album names regardless of whether they match or not?

Assuming so, wouldn't at least part of the issue with Roman numerals (and likely artist names with commas) be solved by only swapping Arabic for Roman numerals or inverting the names if there wasn't a prior match?

Admittedly, all of this is tricky because (1) there is so much variability in naming to begin with; (2) there is so much bad data from music databases over the years; (3) Apple's naming isn't necessarily the correct one; and (4) users have their own preferences about music naming, regardless.

regosen commented 3 years ago

Ah, that's a good idea: matching without the conversion first, and only using Roman numeral conversion as a fallback to minimize the side effects.
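A rough sketch of that fallback order, for illustration only. This is not the get_cover_art implementation: `find_artwork`, `demo_search`, `demo_convert`, and the catalog entries are all made-up stand-ins, and the real search against iTunes is replaced by a dictionary lookup.

```python
from typing import Callable, Optional

# A search takes (artist, album) and returns an artwork id, or None on a miss.
Search = Callable[[str, str], Optional[str]]


def find_artwork(artist: str, album: str, search: Search,
                 convert: Callable[[str], str]) -> Optional[str]:
    """Try the literal names first; convert numerals only after a miss."""
    hit = search(artist, album)
    if hit is not None:
        return hit  # the literal match wins, so the conversion has no side effects
    converted = convert(album)
    if converted == album:
        return None  # nothing changed, so a second search would be wasted
    return search(artist, converted)


# Stand-in catalog and converter (hypothetical data, not real iTunes results).
CATALOG = {
    ("Some Artist", "Dance Mix"): "art-1",
    ("Some Artist", "Volume 2"): "art-2",
}


def demo_search(artist: str, album: str) -> Optional[str]:
    return CATALOG.get((artist, album))


def demo_convert(album: str) -> str:
    # Mimics a blanket numeral conversion for two words only.
    swaps = {"II": "2", "Mix": "1009"}
    return " ".join(swaps.get(word, word) for word in album.split())


print(find_artwork("Some Artist", "Dance Mix", demo_search, demo_convert))
print(find_artwork("Some Artist", "Volume II", demo_search, demo_convert))
```

"Dance Mix" is found on the first, literal search, so the risky "Mix" conversion never runs. "Volume II" misses literally and is found only after the fallback converts it. The cost is at most one extra search per miss.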

Ok, I just deployed a new version with my changes (1.4.5), please re-test at your convenience.

classicjazz commented 3 years ago

There were 13 incremental matches from 1.4.4 to 1.4.5 (after deleting skip_artwork.txt).

regosen commented 3 years ago

That's great! Do you think this issue can be closed out now?

classicjazz commented 3 years ago

Closing this issue. Thank you!