jajuk-team / jajuk

Advanced jukebox for users with large or scattered music collections

Better online cover selection #1671

Open bflorat opened 9 years ago

bflorat commented 9 years ago

Reported by bflorat on 1 Jun 2010 13:54 UTC Idea from Laurent T: we should analyze the dimensions of the covers returned by the search and sort them so that images with an aspect ratio close to that of a CD cover come first.

bflorat commented 9 years ago

Commented by mats.ahlgren@gmail.com on 2 Jun 2010 15:59 UTC Duplicate of #1628; this feature is included in the pseudocode there, along with a proposed math formula to define "similar".

Perhaps #1628 should be renamed "Better online cover selection"?

bflorat commented 9 years ago

Commented by bflorat on 9 Jun 2010 20:16 UTC #1628 is about using other cover websites. I am moving your pseudocode here:

From Mats Ahlgren :


Pseudocode for an improved image search:

search name: "Find Album Cover Art"

code:

let tags.title, tags.artist, tags.album = getDataFromTags()

// Remove 'junk' tags that will pollute search
discard any tags T where (T.lowercase() contains 'unknown'+'album', or is 'unknown', or is '')

//(the top image search for 'unknown' is a naked hairy man with a paper bag over his head and erection, and Jajuk loves to give this search result all the time for many songs)

// Do not search if it's meaningless to search
if (tags.album has been discarded)
or (album==foldername and !option:use_parent_directory_as_album_name.also_for_cover_art_searching) { // see issue #1649
    Block all changes to album cover art;
    Fail operation in a "nice" way;
    Display special cover art "Unknown Album";
}
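
The two guards above could be sketched in Python roughly as follows, assuming tags arrive as a plain dict of strings. The names `is_junk`, `usable_tags`, `should_search` and the `use_folder_for_covers` flag are hypothetical (they are not Jajuk APIs), and 'unknown'+'album' is read here as "contains both words", which is only one possible interpretation.

```python
def is_junk(value: str) -> bool:
    """Return True for tag values that would pollute an image search."""
    v = value.strip().lower()
    # Assumption: 'unknown'+'album' means the value contains both words.
    return v == "" or v == "unknown" or ("unknown" in v and "album" in v)


def usable_tags(tags: dict[str, str]) -> dict[str, str]:
    """Drop junk tags and keep the rest unchanged."""
    return {name: value for name, value in tags.items() if not is_junk(value)}


def should_search(tags: dict[str, str], folder_name: str,
                  use_folder_for_covers: bool = False) -> bool:
    """Decide whether a cover search is meaningful at all."""
    album = usable_tags(tags).get("album")
    if album is None:
        return False  # album tag is missing or was discarded as junk
    if album == folder_name and not use_folder_for_covers:
        return False  # album name was only inferred from the directory
    return True
```

When `should_search` returns `False`, the caller would leave the existing cover untouched and show the "Unknown Album" placeholder instead of querying.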

// The above modifications should fix most of the issues I'm having personally, but just to take it a step further...

// Handle - and & and 'feat.'/'feat '/'featuring' and parentheses
For each undiscarded tag, split it into a list if any of the following tokens are encountered: {' - ', ' & '(only in artist tag), 'feat.', 'feat ', 'featuring', '(', ')', '"'}
Go through the resulting fragments and ignore any that consist only of whitespace or punctuation

e.g. if tags.song=="A Song (feat. Alice & Bob)" -> tags.song=["A Song", "Alice", "Bob"]
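
A possible Python sketch of the splitting step, using `re.split` over the token list. The function name `split_tag` and the `split_ampersand` flag are assumptions made for illustration; the flag stands in for the "only in artist tag" rule. Word boundaries are added around the `feat` variants, which goes slightly beyond the pseudocode, so that a title such as "Defeat" is not cut in half.

```python
import re
import string


def split_tag(value: str, split_ampersand: bool = False) -> list[str]:
    """Split a tag on the separator tokens and drop empty fragments."""
    tokens = [r" - ", r"\bfeaturing\b", r"\bfeat\.", r"\bfeat ",
              r"\(", r"\)", r'"']
    if split_ampersand:
        # Per the pseudocode, ' & ' is meant to split artist tags only.
        tokens.append(r" & ")
    parts = re.split("|".join(tokens), value, flags=re.IGNORECASE)
    junk = string.whitespace + string.punctuation
    # Keep a fragment only if something remains after removing
    # surrounding whitespace and punctuation.
    return [p.strip() for p in parts if p.strip(junk)]
```

With `split_ampersand=True`, `split_tag("A Song (feat. Alice & Bob)", True)` gives `["A Song", "Alice", "Bob"]`; with the default it keeps `"Alice & Bob"` as a single fragment.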

// Some other music library software and old song tag formats supported a maximum of only 30 characters, so a tag that is exactly 30 characters long was probably truncated.
for all tags SOMETAG:
  if tags.SOMETAG is exactly 30 characters long:
    delete the last fragment in the list tags.SOMETAG

let megalist = {concatenate all fragments}

// How to perform search

megalist.append(["album", "cover"])
query = {wrap each fragment of megalist in quotes, and concatenate all fragments together}
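
The truncation rule and the query assembly might look like this in Python. This is a sketch under assumptions: `build_query` is a hypothetical name, the fragments are taken to be already split per tag, and the quoted fragments are joined with single spaces, which the pseudocode does not specify.

```python
def build_query(tags: dict[str, str], fragments: dict[str, list[str]]) -> str:
    """Build a quoted search query from per-tag fragment lists.

    tags      maps tag name -> original tag string
    fragments maps tag name -> fragments obtained by splitting that string
    """
    megalist: list[str] = []
    for name, parts in fragments.items():
        # A tag of exactly 30 characters was probably cut off by an old
        # tag format, so its last fragment is likely incomplete.
        if len(tags[name]) == 30:
            parts = parts[:-1]
        megalist.extend(parts)
    megalist += ["album", "cover"]
    # Quote every fragment so the search engine treats it as a phrase.
    return " ".join(f'"{fragment}"' for fragment in megalist)
```

Quoting each fragment separately lets the engine match "Alice" and "Bob" independently while still treating a multi-word fragment as one phrase.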

// Ranking: penalize images that are strongly rectangular or low-resolution; a factor of 1 means no penalty, and smaller factors (0.5, 0.2, 0.1, etc.) mean a heavier penalty.
resolutionPenalty = min(width,200)*min(height,200)/(200^2)
rectangularPenalty = {let x=min(height/width, width/height), return (x+0.05)^2 if x<0.95, else return 1}

sort images ascending by (10+index)/(resolutionPenalty*rectangularPenalty)  // divide, so that a factor below 1 pushes the image further down
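
A rough Python version of the ranking, for illustration only. It divides the rank position by the two factors and sorts ascending, on the assumption that a factor below 1 should push an image down the list. The function names and the `(width, height)` tuple representation are hypothetical.

```python
def resolution_penalty(width: int, height: int, target: int = 200) -> float:
    """1.0 at or above target x target pixels, smaller below that."""
    return min(width, target) * min(height, target) / target ** 2


def rectangular_penalty(width: int, height: int) -> float:
    """1.0 for nearly square images, smaller for elongated ones."""
    x = min(height / width, width / height)
    return (x + 0.05) ** 2 if x < 0.95 else 1.0


def rank(images: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Reorder (width, height) pairs given in search-engine order."""
    def score(item: tuple[int, tuple[int, int]]) -> float:
        index, (w, h) = item
        if w <= 0 or h <= 0:
            return float("inf")  # unusable image, send it to the end
        return (10 + index) / (resolution_penalty(w, h) * rectangular_penalty(w, h))

    return [image for _, image in sorted(enumerate(images), key=score)]
```

For the input `[(100, 100), (500, 500), (600, 200)]` the square 500x500 image moves to the front, the small 100x100 image comes second, and the 3:1 image ends up last.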

bflorat commented 9 years ago

Commented by mats.ahlgren@gmail.com on 10 Jun 2010 12:59 UTC Rather than using 200px in calculating resolutionPenalty, one might wish to use 300px.

Or perhaps even better:

1100px^2 -> -80%
900px^2 -> -5%  |
700px^2 -> -0%  | nice and flat here, so pagerank can be dominant
500px^2 -> -5%  |
300px^2 -> -80%

let area = width*height;
let resolutionPenalty = e^(-((sqrt(area)-700px)/400px)^4);
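
The formula can be checked against the table with a few lines of Python. The function and parameter names (`ideal`, `spread`) are assumptions introduced for this sketch.

```python
import math


def resolution_penalty(width: int, height: int,
                       ideal: float = 700.0, spread: float = 400.0) -> float:
    """Flat-topped bump: 1.0 at the ideal side length, falling off steeply."""
    side = math.sqrt(width * height)  # side of a square with the same area
    return math.exp(-((side - ideal) / spread) ** 4)


for side in (300, 500, 700, 900, 1100):
    print(side, round(resolution_penalty(side, side), 3))
```

With these constants the factor is 1.0 at 700px, about 0.94 at 500px and 900px, and about 0.37 at 300px and 1100px, so the -80% rows in the table are approximate; a smaller `spread` would be needed to reach them.
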

bflorat commented 9 years ago

Commented by mats.ahlgren@gmail.com on 22 Jun 2010 18:02 UTC Addendum:

One probably wants to make multiple searches: first one with the "larger than 400x300 (qsvga)" GET option set on Google image search; then, if the user scrolls past the first 10 matches, results 11-20 would come from a search without the "larger than" option.
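
One way to sketch this two-stage idea in Python is a lazy generator, so the second request is only made once the caller iterates past the first page. The `search` callable and its `(query, large_only, count)` signature are hypothetical stand-ins for whatever image search client is used; skipping duplicates between the two stages is an extra assumption.

```python
from typing import Callable, Iterator

# Hypothetical search client: returns up to `count` image URLs.
SearchFn = Callable[[str, bool, int], list[str]]


def staged_results(query: str, search: SearchFn,
                   page_size: int = 10) -> Iterator[str]:
    """Yield size-restricted matches first, then unrestricted ones."""
    seen: set[str] = set()
    for large_only in (True, False):
        # The unrestricted request is issued only when the consumer
        # has exhausted the size-restricted page.
        for url in search(query, large_only, page_size):
            if url not in seen:
                seen.add(url)
                yield url
```

Because the generator is lazy, a user who picks a cover from the first page never triggers the second search.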