listing all artists: artists found only in compilations albums

owntone / owntone-server

Linux/FreeBSD DAAP (iTunes) and MPD audio server with support for AirPlay 1 and 2 speakers (multiroom), Apple Remote (and compatibles), Chromecast, Spotify and internet radio.

https://owntone.github.io/owntone-server

GNU General Public License v2.0

2.07k stars 235 forks source link

listing all artists: artists found only in compilations albums #642

Open whatdoineed2do opened 5 years ago

whatdoineed2do commented 5 years ago

edit: restate problem

currently if we have an artist (db artist) track that exists in a compilation album (db album_artist == Various Artists for example) then this track is NOT shown for the artist (listings are based on album_artist's hash songartistid) or searchable (searches are on db album_artist). Furthermore, if the artist in the Various Artist/compilation does NOT have an album of their own, this artist is NOT shown from the db_build_query_group_artists() or searchable.

The problem is 2 fold:

the db does not treat track artists as unique entities for display/searching
compilations are not grouped, leading to album views showing the same album multiple times (1x per each track artist)

Before I dig a little deeper I want to confirm if the behaviour below is expected.

I have an compilation album that has multiple artists but in most cases I dont have any other tracks belonging to this artist: artist tag == real artist name (ie ACDC) and album artist == "Various Artists".

On the web ui's artist page (http://.../3689/#/music/artists) there is NO reference to the artist ACDC - rather this artist's track is buried under "Various Artist". I would expect all artists to be listed.

In the jsonapi, we get artists names via the dbgri->itemname:

 115) static json_object *
 116) artist_to_json(struct db_group_info *dbgri)
...
 126)   safe_json_add_string(item, "name", dbgri->itemname);
 127)   safe_json_add_string(item, "name_sort", dbgri->itemname_sort);

The offending file's metadata as pulled via ffmpeg:

;FFMETADATA1
artist=ACDC
album=The Best Driving Album
title=Back in Black
genre=Pop
track=1
album_artist=Various Artists
encoder=Lavf58.12.100

Is this expected that th artist does not appear?

chme commented 5 years ago

That is how it is currently implemented in forked-daapd. forked-daapd only lists album artists in its artist listings. During the library scan only the album artist is added to the groups db table. The artists only exists in the files db table.

If you want to look into adding support to also show the (track) artist, there are db triggers that insert/update the groups table. You would need to add a new trigger to also insert/update based on the artist. In addition, a new db field for the persistent id for the track artist would be necessary in the files table and at last the queries for listing the artists need to be changed to also take the artist into account (probably a second join from the groups to the files table using the new artist persistent id).

Might be worth adding a config option to allow users to choose whether they want only the album artist or album artist plus artist in the listings ...

ejurgensen commented 5 years ago

Yes, like @chme says this is by design. In my experience the artist view becomes very messy if not based on album_artist, but of course it would depend on the individual music library. Some libraries also have artist names that contain multiple artists (comma-separated), which is fine to display while playing, but requires special handling to work in the artist view.

whatdoineed2do commented 5 years ago

To restate the problem; currently if we have an artist (db artist) track that exists in a compilation album (db album_artist == Various Artists for example) then this track is NOT shown for the artist (listings are based on album_artist's hash songartistid) or searchable (searches are on db album_artist). Furthermore, if the artist in the Various Artist/compilation does NOT have an album of their own, this artist is NOT shown from the db_build_query_group_artists() or searchable.

This can appear that the server has lost tracks if you forget that an artist and track is only in a compilation (as I did)

I took a look at this but I think I'm struggling to get this right unless I introduce bigger changes. Adding a new hash/permanentid (songtrackartist based on artist) doesn't really help. I also experimented with the db triggers adding fake entries for artists on each upd/insert (where artist != album_artist) so the db_build_query_group_artists() could find album_artist but this seems too much of a hack.

Are there any suggestions or alternative paths you'd suggest on how this problem can be solved?

ejurgensen commented 5 years ago

Maybe I misunderstand, but if you disable the compilation_artist setting, don't you then get what you want? Then album_artist will be set to artist, and you will see them all. Well unless your files are tagged with "Various Artists" in the idv3 album_artist field, but then maybe you should remove that tag, if you don't want them listed as such?

If I misunderstand then please give a concrete example of a file and the tags it has.

I think you are right that search should also include artist, not just album_artist. Wasn't aware it didn't do that.

whatdoineed2do commented 5 years ago

Using forked daapd's compilations and compilation_artist does not seem to help the problem I describe whereby I want all artists listed.

example-lib.tar.gz

Here (and attached as a zip) is a simple library to hopefully illustrate - all files contain dummy 5sec audio and corresponding metadata (artist/title/album only); it was 4x tracks, 2x artists. 1x artist exists with its own album AS WELL as in a compilation and 1x artist ONLY exists in a compilation.

sqlite> select f.id,album_artist,artist,title,album,songalbumid,songartistid,compilation from files f;
id  album_artist          artist  title   album                 songalbumid           songartistid          compilation         
--  --------------------  ------  ------  --------------------  --------------------  --------------------  --------------------
54  foo                   foo     efgh    Greatest Hits         411092463240599912    2284445307090104385   0                   
55  foo                   foo     abcd    Greatest Hits         411092463240599912    2284445307090104385   0                   
56  Various artists       bar     MNOP    compilation           8913509730199511624   8395563705718003786   1                   
57  Various artists       foo     xyz     compilation           8913509730199511624   8395563705718003786   1

This is what comes back from the server (json is easier to show):

$ curl http://localhost:3689/api/library/artists  | jq
{
  "items": [
    {
      "id": "2284445307090104385",
      "name": "foo",
      "name_sort": "foo",
      "album_count": 1,
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:artist:2284445307090104385",
      "artwork_url": "/artwork/group/24"
    },
    {
      "id": "8395563705718003786",
      "name": "Various artists",
      "name_sort": "Various artists",
      "album_count": 1,
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:artist:8395563705718003786",
      "artwork_url": "/artwork/group/51"
    }
  ],
  "total": 2,
  "offset": 0,
  "limit": -1
}

via the db:

SELECT g.id, g.persistentid, f.album_artist, f.album_artist_sort, COUNT(f.id) as track_count, COUNT(DISTINCT f.songalbumid) as album_count, f.album_artist, f.songartistid, SUM(f.song_length) FROM files f JOIN groups g ON f.songartistid = g.persistentid WHERE f.disabled = 0 GROUP BY f.songartistid  ORDER BY f.album_artist_sort, f.album_sort, f.disc, f.track ;
id          persistentid          album_  album_  track_count           album_count           album_artist          songartistid          SUM(f.so
----------  --------------------  ------  ------  --------------------  --------------------  --------------------  --------------------  --------
24          2284445307090104385   foo     foo     2                     1                     foo                   2284445307090104385   4080    
51          8395563705718003786   Variou  Variou  2                     1                     Various artists       8395563705718003786   4080

What I'm trying to achieve:

the artists that are in ONLY compilations to have their own entry (see bar entry that differs from what actually comes from the server in the above)
any artists that exist in compilations AND their own albums should have their entries reflect track count etc in the artist entry

This would be the returned

M O C K U P

{
  "items": [
    {
      "id": "1125938474926605440",
      "name": "bar",
      "name_sort": "bar",
      "album_count": 1,
      "track_count": 1,
      "length_ms": ..,
      "uri": "library:artist:1125938474926605440",
      "artwork_url": "/artwork/group/....."
    },    
    {
      "id": "2284445307090104385",
      "name": "foo",
      "name_sort": "foo",
      "album_count": 2,
      "track_count": 3,
      "length_ms": ...,
      "uri": "library:artist:2284445307090104385",
      "artwork_url": "/artwork/group/24"
    },
    {
      "id": "8395563705718003786",
      "name": "Various artists",
      "name_sort": "Various artists",
      "album_count": 1,
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:artist:8395563705718003786",
      "artwork_url": "/artwork/group/51"
    }
  ],
  "total": 2,
  "offset": 0,
  "limit": -1
}

Taking this further I'd lke to have the artist's albums page list the compilations that they have tracks so we can see ALL albums that contain their tracks.. perhaps this is a direction you'd not be entirely comfortable with ie (the following is a mockup) - the 2nd entry shows that this artist foo's albums include albums where album_artist == "foo" AND albums where artist = "foo"

M O C K U P

$ curl http://localhost:3689/api/library/artists/2284445307090104385/albums  | jq 
{
  "items": [
    {
      "id": "411092463240599912",
      "name": "Greatest Hits",
      "name_sort": "Greatest Hits",
      "artist": "foo",
      "artist_id": "2284445307090104385",
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:album:411092463240599912",
      "artwork_url": "/artwork/group/53"
    },
    {
      "id": "8913509730199511624",
      "name": "compilation",
      "name_sort": "compilation",
      "artist": "Various artists",
      "artist_id": "8395563705718003786",
      "track_count": 1,
      "length_ms": 4080,
      "uri": "library:album:8913509730199511624",
      "artwork_url": "/artwork/group/54"
    }
  ],
  "total": 2,
  "offset": 0,

Does that help clarify? I'm happy to dig further but I think I'm hitting a mental block -

ejurgensen commented 5 years ago

I think I understand what you want to achieve, and I still think you can pretty much achieve it by disabling the compilation_artist setting. After doing that you need to do a rescan, and it may need to be a "full rescan", so that means you need to place a file ending in ".full-rescan" in your library. Maybe the latter is the reason it didn't work for you the first time around? It isn't really documented, unfortunately.

In the example library you attached the compilation tracks have no album_artist. If compilation_artist is not enabled, then forked-daapd will instead use artist as album_artist, which means you should get the result you want (except you won't have any entry for "Various artists").

whatdoineed2do commented 5 years ago

Cool, this gets me closer but I think there's still a small gap.

With compilation_artist commented out in cfg and no album artist in the files' metadata, the artists returned looks better (there's no Various Artist but I think that;s ok).

{
  "items": [
    {
      "id": "1125938474926605440",
      "name": "bar",
      "name_sort": "bar",
      "album_count": 1,
      "track_count": 1,
      "length_ms": 2040,
      "uri": "library:artist:1125938474926605440",
      "artwork_url": "/artwork/group/4"
    },
    {
      "id": "2284445307090104385",
      "name": "foo",
      "name_sort": "foo",
      "album_count": 2,
      "track_count": 3,
      "length_ms": 6120,
      "uri": "library:artist:2284445307090104385",
      "artwork_url": "/artwork/group/2"
    }
  ],
  "total": 2,
  "offset": 0,
  "limit": -1
}

and digging down on each artist from the webui does what I expect (the track in the compilation only comes back that artist's track and not others).

What we get on the albums side is the next progression and I don't know if this is fixable in cfg.

With the sample library and cfg, we get this:

$ curl http://localhost:3689/api/library/albums  | 
{
  "items": [
    {
      "id": "4785949349717473930",
      "name": "compilation",
      "name_sort": "compilation",
      "artist": "foo",
      "artist_id": "2284445307090104385",
      "track_count": 1,
      "length_ms": 2040,
      "uri": "library:album:4785949349717473930",
      "artwork_url": "/artwork/group/5"
    },
    {
      "id": "8744875493222204375",
      "name": "compilation",
      "name_sort": "compilation",
      "artist": "bar",
      "artist_id": "1125938474926605440",
      "track_count": 1,
      "length_ms": 2040,
      "uri": "library:album:8744875493222204375",
      "artwork_url": "/artwork/group/3"
    },
    {
      "id": "411092463240599912",
      "name": "Greatest Hits",
      "name_sort": "Greatest Hits",
      "artist": "foo",
      "artist_id": "2284445307090104385",
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:album:411092463240599912",
      "artwork_url": "/artwork/group/1"
    }
  ],
  "total": 3,
  "offset": 0,
  "limit": -1
}

We see the Compilation album (called compilation in example above) with 2 entries .. but I would expect this to be a single entry. Perhaps a better example of this would be a film soundtrack with a number of artists involved: on the artist side its good to see all the individual artists listed.

On the albums side though, the soundtrack would have the same album repeated multiple times (one for each artist in the album) and this is confusing - this is actually the reason for me looking at this in the first place.

Is there any suggestion to achieve this consolidated compilations as a single entity with a commented out compilation_artist and unset album artist in the music file's metedata? If not, would you'd be ok to accept PR to allow this functionality (although I guess this might take me back to my same mental block).

M O C K   U P
{
  "items": [
    {
      "id": "..."
      "name": "compilation",
      "name_sort": "compilation",
      "artist": "Various Artists",
      "artist_id": "..."
      "track_count": 2,
      "length_ms": ...,
      "uri": "library:album:...",
      "artwork_url": "/artwork/group/5"
    },
    {
      "id": "411092463240599912",
      "name": "Greatest Hits", 
      "name_sort": "Greatest Hits",
      "artist": "foo",
      "artist_id": "2284445307090104385",
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:album:411092463240599912",
      "artwork_url": "/artwork/group/1"
    }
  ],
  "total": 2,
  "offset": 0, 
  "limit": -1
}

In the db (for the example library), the 2x tracks belonging to the same compilatino album (songalbumid) now have different persistent ids because it'll be issuing persistent id/hash of "bar+compilation" and "foo+compilation" respectively to generate the ids for rows 3 and 4 whereas previously it would have used hash of "Various Artists+compilation" for both rows.

sqlite> select f.id,album_artist,artist,title,album,songalbumid,songartistid,compilation from files f;
id          album_artist  artist      title       album          songalbumid         songartistid         compilation
----------  ------------  ----------  ----------  -------------  ------------------  -------------------  -----------
1           foo           foo         efgh        Greatest Hits  411092463240599912  2284445307090104385  0          
2           foo           foo         abcd        Greatest Hits  411092463240599912  2284445307090104385  0          
3           bar           bar         MNOP        compilation    874487549322220437  1125938474926605440  1          
4           foo           foo         xyz         compilation    478594934971747393  2284445307090104385  1

ejurgensen commented 5 years ago

Identifying if two tracks belong to the same album is not a trivial problem. Building on your example, foo and bar could have tracks in two distinct albums that were just both called "Greatest Hits", or their tracks could belong to one and the same album. How would forked-daapd know the difference?

One option is to use the directory as a factor, but on the other hand forked-daapd shouldn't be a file manager. The principle of iTunes, forked-daapd etc. is that they should present organized views of the music in a library, even if the library is "disorganised" on the filesystem. So that doesn't seem great either.

Another option is to use some online service like MusicBrainz to get this info, and that is perhaps the best way to go. However, that requires some work at the keyboard.

whatdoineed2do commented 5 years ago

Another option is to use some online service like MusicBrainz to get this info, and that is perhaps the best way to go. However, that requires some work at the keyboard

I don't think it's right for forked daapd to be pulling MusicBrainz info for files but I do think that using some "user tagged album identifier" (doesnt have to be MusicBrainz's albumid - it could be some value a user randomly generates into this tag) would be the solution to the problem of knowning how to categorise albums in the use case I mention above.

Implementing this would seem to include:

populate struct metadata_map md_map_generic in filescanner_ffmpeg to look for a finite list of tags that would give us the "user tagged album identifier" as compilation_id
adding a new persistent id, compilation_id to db, default 0, generated by the songartistalbum_daap(LOWER(compilation_id), '')
update to db_build_query_group_albums()'s select/group to use compilation id as the first
```
SELECT 
 g.id, g.persistentid,
 f.album, f.album_sort,
 COUNT(f.id) as track_count, 1 as album_count,
 f.album_artist, f.songartistid,
 SUM(f.song_length)
FROM files f
JOIN groups g ON f.songalbumid = g.persistentid 
  %s
GROUP BY f.compilationid, f.songalbumid
%s %s %s;", qc->where, qc->having, qc->order, qc->index
```
This would take care of multiple albums with same names, such as "Greatest Hits", as we would be listing them as different entries. The next part would be looking up an album. Right now we're using songalbumid (generated by hashing album_artist and album) so perhaps we'd look to generate songalbumid based on non-null compilation_id or using the current mechansim.

Is an approach you'd be ok with? If so I can put this together.

ejurgensen commented 5 years ago

So you mean the user should set idv3 tags with these compilation id's (or album id's)? Which tag exactly are you thinking of?

It sounds a bit laborious for the user... why do you prefer that over getting the id's from MusicBrainz?

whatdoineed2do commented 5 years ago

user should set idv3 tags with these compilation id's ... why do you prefer that over getting the id's from MusicBrainz?

Not quite. From the server side, it would just care that there is a common identifier for all tracks in a given album. The server just has to read that field from metadata and use that (hash) as the compilation_id.

The server can then use compilation_id (if there) to identify an album instead of identifying album based on album name + artist name which currently fails for the case where:

album is compilation
album has multiple artists
each track of that album can not be grouped together because songalbumid is not consistent for all tracks in the album (because currently it uses album name + artist name)

What I was considering is for the compilation_id that the server uses to be pulled from metadata tags (this is what ffprobe reports are the added musicbrainz tags so filescanner_ffmpeg would be able to use them). I think we'd look for any of thse (in order of priority)

MusicBrainz Album Id
MusicBrainz Release Group Id
CATALOGNUMBER
BARCODE

For ppl who dont want to use musicbrainz to touch their library but wanted this ablum grouping functionality, they could just put any unique value into any of these tags mentioned above for their album and the server would be able to work with it.

whatdoineed2do commented 5 years ago

This is working in https://github.com/whatdoineed2do/forked-daapd/tree/db-compilation-id and it requires a compilation id that we get from the metadata; the insert/upd triggers use, if available, the compilation id to create the songalbumid otherwise the existing mechanism (artist + album name) is used.

sqlite> select f.id,album_artist,artist,title,album,songalbumid,songartistid,compilation,compilationid from files f;
id          album_artist  artist      title       album          songalbumid         songartistid         compilation  compilationid
----------  ------------  ----------  ----------  -------------  ------------------  -------------------  -----------  -------------
1           foo           foo         efgh        Greatest Hits  411092463240599912  2284445307090104385  0                         
2           foo           foo         abcd        Greatest Hits  411092463240599912  2284445307090104385  0                         
3           bar           bar         MNOP        compilation    874487549322220437  1125938474926605440  1                         
4           foo           foo         xyz         compilation    478594934971747393  2284445307090104385  1                         
5           Various Arti  Marvin Gay  Let's Get   The Motown Co  169463011763780894  8395563705718003786  1            389cb65e-87bc
6           bar           bar         MNOP        Motown Greate  434185646584293662  1125938474926605440  1            xx-xx-xx-xx  
7           foo           foo         xyz         Motown Greate  434185646584293662  2284445307090104385  1            xx-xx-xx-xx

This give us what we expect in the albums where "Motown Greatest Hits" (with 2 different artists) as a single entry.

$  curl http://localhost:3689/api/library/albums  | jq 
{
  "items": [
    {
      "id": "4785949349717473930",
      "name": "compilation",
      "name_sort": "compilation",
      "artist": "foo",
      "artist_id": "2284445307090104385",
      "track_count": 1,
      "length_ms": 2040,
      "uri": "library:album:4785949349717473930",
      "artwork_url": "/artwork/group/5"
    },
    {
      "id": "8744875493222204375",
      "name": "compilation",
      "name_sort": "compilation",
      "artist": "bar",
      "artist_id": "1125938474926605440",
      "track_count": 1,
      "length_ms": 2040,
      "uri": "library:album:8744875493222204375",
      "artwork_url": "/artwork/group/3"
    },
    {
      "id": "411092463240599912",
      "name": "Greatest Hits",
      "name_sort": "Greatest Hits",
      "artist": "foo",
      "artist_id": "2284445307090104385",
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:album:411092463240599912",
      "artwork_url": "/artwork/group/1"
    },
    {
      "id": "1694630117637808949",
      "name": "The Motown Collection (10 CD/1 DVD) Box set, Enhanced",
      "name_sort": "Motown Collection (00010 CD/00001 DVD) Box set, Enhanced",
      "artist": "Various Artists",
      "artist_id": "8395563705718003786",
      "track_count": 1,
      "length_ms": 293982,
      "uri": "library:album:1694630117637808949",
      "artwork_url": "/artwork/group/6"
    },
    {
      "id": "4341856465842936626",
      "name": "Motown Greatest Hits",
      "name_sort": "Motown Greatest Hits",
      "artist": "Various Artists",
      "artist_id": "2284445307090104385",
      "track_count": 2,
      "length_ms": 4080,
      "uri": "library:album:4341856465842936626",
      "artwork_url": "/artwork/group/8"
    }
  ],
  "total": 5,
  "offset": 0,
  "limit": -1
}

I'll give more of a test with a larger dataset and then submit a PR but it seems to meet the need.

FWIW, here's a sample of what ffmpeg found on a genuine file processed by MusicBrainz's picard tagger:

$ ffprobe -hide_banner -i /tmp/m/marvin-gaye.mp3 
Input #0, mp3, from '/tmp/m/marvin-gaye.mp3':
  Metadata:
    title           : Let's Get It On
    artist          : Marvin Gaye
    track           : 2/15
    album           : The Motown Collection (10 CD/1 DVD) Box set, Enhanced
    disc            : 7/11
    genre           : Pop
    compilation     : 1
    TMED            : CD
    TDOR            : 2008
    date            : 2008
    artist-sort     : Gaye, Marvin
    TSRC            : GBCBR0300006
    SCRIPT          : Latn
    album_artist    : Various Artists
    TSO2            : Various Artists
    TSST            : Volume 4, Disc 1
    ASIN            : B001BK4IUQ
    originalyear    : 2008
    publisher       : Time-Life Music/UMe
    Artists         : Marvin Gaye
    MusicBrainz Album Status: official
    MusicBrainz Album Type: album/compilation
    MusicBrainz Album Id: 389cb65e-87bc-4897-9c56-26a482c4e1d3
    MusicBrainz Artist Id: afdb7919-059d-43c1-b668-ba1d265e7e42
    MusicBrainz Album Artist Id: 89ad4ac3-39f7-470e-963a-56509c546377
    MusicBrainz Release Group Id: 6b2d3944-4bfa-46f2-ae44-18ffce2b32b2
    MusicBrainz Release Track Id: 935f9071-0b57-42a7-b3a2-50fa49e84039
  Duration: 00:04:53.98, start: 0.025057, bitrate: 252 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 251 kb/s
    Metadata:
      encoder         : LAME3.98r
    Side data:
      replaygain: track gain - -2.400000, track peak - unknown, album gain - unknown, album peak - unknown,

Interesting, it doesn't tag BARCODE or CATALOGUE even though it's mentioned in their docs https://picard.musicbrainz.org/docs/tags

ejurgensen commented 5 years ago

Using those tags sounds like a good approach, and I agree that forked-daapd can leave the actual MusicBrainz integration to other tagging programs.

I will take a closer look at the implementation when you submit the pr.

whatdoineed2do commented 5 years ago

I've put a PR (https://github.com/ejurgensen/forked-daapd/pull/670) in for this so hopefully @ejurgensen will have time to look at this.

Addition to this, I also submitted this PR (https://github.com/ejurgensen/forked-daapd/pull/672) to fix a track ordering bug in album view that I saw whilst working through my testing. This should be very much uncontroversial.

whatdoineed2do commented 5 years ago

There is also a sister PR (https://github.com/ejurgensen/forked-daapd/pull/671) that allows the db to distinguish track artist from album artist for listing and searching.

It would be great if @ejurgensen / @chme can have a look at both https://github.com/ejurgensen/forked-daapd/pull/671 and https://github.com/ejurgensen/forked-daapd/pull/670 and let me know your thoughts (these can go in independent of each other). These have been on my main system for a few weeks and no issues.

ejurgensen commented 5 years ago

It will take a while yet before I have time to review #671, I am working on some stuff that I would like to finish first