krateng / maloja

Self-hosted music scrobble database to create personal listening statistics and charts
https://maloja.krateng.ch
GNU General Public License v3.0

Comma (,) not detected as artist separator #66

Closed: studio-pastimes closed this issue 3 years ago

studio-pastimes commented 3 years ago

As in the title.

Viktor Vaughn feat. Lord Sear, Ben Grimm, Rodan & Louis Logic

results in:

Viktor Vaughn
Lord Sear, Ben Grimm, Rodan
Louis Logic

as 3 separate artists.

Is this an issue with my tags, or something that could be implemented in Maloja? I would be happy to change them within reason (for artists currently billed as Artist 1, Artist 2, I would be willing to change to Artist 1 feat. Artist 2). However, I wouldn't want to go with something grammatically incorrect such as Viktor Vaughn feat. Lord Sear & Ben Grimm & Rodan & Louis Logic.

Great work on this. I'd be happy to take a look if you could point me in the right direction in terms of running this for development (I'm currently learning Python).

FoxxMD commented 3 years ago

I've also had issues with commas as a delimiter when scrobbling, but commas are also ambiguous in general (what if the artist name is supposed to have a comma in it?).

However, Maloja also supports using a forward slash as an artist delimiter. This is what I'm using now; it works much better for multi-artist detection than commas, and because it's not a common symbol, it's obvious it's not part of the artist name.

Using your title as an example, send this in your scrobble payload:

{
  "artist": "Viktor Vaughn / Lord Sear / Ben Grimm / Rodan / Louis Logic"
}

studio-pastimes commented 3 years ago

I am currently using / for some, so that's a good shout. I did try to think of examples of artists with commas in their name and couldn't; however, upon your reply I've just thought of loads. Crosby, Stills & Nash, for example.

Happy for this to be closed. However, as a further point, it could be nice to allow users to specify what acts as a delimiter in their library. The app I use to play my music (Swinsian) has this option in the settings and I've always liked it.

FoxxMD commented 3 years ago

Are you developing a client for Swinsian? For my app I found the easiest thing to do was handle (multiple) artist parsing on a client-by-client basis and just re-combine into a string with a forward-slash delimiter just before scrobbling.

As for Maloja handling delimiters, I believe it's actually doing a lot of cleverish parsing to try to break down an artist string into multiple artists as well as it can. There's just always going to be ambiguity, unfortunately. Take a look at cleanup.py, where it's checking for , ; /, variations of feat and vs, and some other phrases, as well as checking user-supplied rules.
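To make the ambiguity concrete, here is a minimal sketch of delimiter-based splitting. It is not Maloja's actual cleanup.py logic; the pattern and the function name `split_artists` are assumptions for illustration, covering only the separators named above.

```python
import re

# Hypothetical pattern, NOT taken from Maloja's cleanup.py.
# Splits on "," ";" "/" and on a few "feat"/"vs" variants.
_SPLIT = re.compile(
    r"\s*(?:[,;/]|\b(?:feat|ft|featuring|vs)\b\.?)\s*",
    flags=re.IGNORECASE,
)

def split_artists(raw: str) -> list[str]:
    """Split a raw artist string into individual artist names."""
    parts = _SPLIT.split(raw)
    # Drop empty fragments left by leading/trailing or doubled delimiters.
    return [p.strip() for p in parts if p.strip()]
```

With this sketch, "Viktor Vaughn feat. Lord Sear / Ben Grimm; Rodan" splits into four artists as intended, but "Crosby, Stills & Nash" is wrongly cut into "Crosby" and "Stills & Nash", which is exactly the kind of case no fixed delimiter list can resolve on its own.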

studio-pastimes commented 3 years ago

I am (very roughly) working on one now. I have basic scrobbling working but need to handle errors etc. That's another good suggestion in regards to dealing with delimiters in the Swinsian client rather than Maloja itself.

The reason I mentioned adding it into the Maloja core was to allow the user-supplied rules to be used, i.e. adding an exception for Crosby, Stills & Nash if you wanted to use commas for the rest of your library. My thinking is that if I created a much more rudimentary system in the Swinsian client, I wouldn't be able to benefit from the user rules.

How would you handle an artist like Crosby, Stills & Nash in the app you develop, out of interest? Or does that not cater to commas?
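A rough sketch of what "commas everywhere, with exceptions" could look like on the client side. This does not use Maloja's rule system; `split_with_exceptions`, `delimiters`, and `protected` are hypothetical names made up for the example.

```python
import re

def split_with_exceptions(raw, delimiters=(",",), protected=()):
    """Split `raw` on `delimiters`, keeping every `protected` name intact."""
    placeholders = {}
    text = raw
    # Swap each protected name for a token that contains no delimiter.
    for i, name in enumerate(protected):
        token = f"\x00{i}\x00"
        if name in text:
            text = text.replace(name, token)
            placeholders[token] = name

    pattern = "|".join(re.escape(d) for d in delimiters)
    parts = [p.strip() for p in re.split(pattern, text)]

    def restore(part):
        # Put the protected names back after splitting.
        for token, name in placeholders.items():
            part = part.replace(token, name)
        return part

    return [restore(p) for p in parts if p]
```

For example, with `protected=("Crosby, Stills & Nash",)`, the string "Crosby, Stills & Nash, Neil Young" yields two artists rather than three. The downside is the one raised above: the exception list then lives in the client instead of in Maloja's user rules.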

FoxxMD commented 3 years ago

Based on what data each source provides, I try to parse out the artist info into an array of artists, then join all strings in the array with a forward slash when building the payload for Maloja.

The gold standard is Spotify, which returns artists as an array from its API out of the box. For Crosby, Stills & Nash it's not a problem, since it is an individual object in their array (so I don't care what the actual artist string looks like).
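As a sketch of that last step, assuming the artists are already parsed into a list: the helper name `build_payload` is made up, and only the "artist" field shown earlier in this thread is included.

```python
import json

def build_payload(artists):
    """Join separately parsed artist names with ' / ' into one payload string."""
    # Skip empty entries so no stray delimiters end up in the string.
    cleaned = [a.strip() for a in artists if a and a.strip()]
    return {"artist": " / ".join(cleaned)}

payload = build_payload(
    ["Viktor Vaughn", "Lord Sear", "Ben Grimm", "Rodan", "Louis Logic"]
)
print(json.dumps(payload))
```

Because the split happens before the join, a name like Crosby, Stills & Nash passes through untouched as long as the source delivered it as a single array element.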

For the rest of the sources I naively split on commas (lol) because they don't provide any more context than a single string and I'm not a miracle worker. I'd probably just add a rule in Maloja to handle that artist specifically.

However, as I said, even using normalized comma-delimited artists in a string, Maloja was giving me inconsistent results for detecting multiple artists, so I just switched to forward slash and it pretty much fixed detection.

krateng commented 3 years ago

I made the delimiters a setting so they can be fully customized. If you do client-side parsing, it should also be possible to directly submit separated artist names, but right now this only works via query string. I'll fix this for JSON payloads as well.