navidrome / navidrome

🎧☁️ Modern Music Server and Streamer compatible with Subsonic/Airsonic
https://www.navidrome.org
GNU General Public License v3.0
10.36k stars 797 forks source link

Support using MusicBrainz IDs instead of other metadata #1363

Open vs49688 opened 2 years ago

vs49688 commented 2 years ago

Converts the library to use MusicBrainz IDs for albums, artists, albumartists, and tracks. All playlists, stars, etc. are kept.

This is an irreversible transformation; the user is given a warning and may cancel within 10 seconds by pressing ^C.

The mediafiles table is used as the sole source-of-truth to build an entirely new hierarchy, based off the MusicBrainz IDs.

Existing entity metadata is applied to the new entities, except in the case where one entity may need to be split (e.g. different releases of the same album). These split entities are left blank.

References are then updated for both play queues and playlists.

The existing entities are then deleted, and a full-rescan is triggered to fill in any missing information.

Closes https://github.com/navidrome/navidrome/issues/489.

certuna commented 2 years ago

I think it's a good idea in principle, just some remarks:

vs49688 commented 2 years ago

I think it's a good idea in principle, just some remarks:

* this will split albums where some tracks have different or no `mbz_album_id`, which may catch more than a few users by surprise, in particular those who group "singletons" together into one pseudo-compilation album (e.g. "Loose Tracks"). The alternative is using Grouping or Release Type for grouping these together (and excluding them from the album list), but unfortunately those features aren't implemented yet so you'd leave those users with no effective way to manage singletons

In this case, I'm of the mind to tell the user to strip the MusicBrainz tags from those files, retaining the current behaviour. By moving them into a "Loose Tracks" album, you're effectively creating a "new" album. If this "new" album isn't on MusicBrainz, it shouldn't have MusicBrainz tags. Also, isn't this what playlists are for? Of course, this might be a unpopular opinion, but I think it's the more "correct" one.

Another option would be to add a "navidrome_mbignore" tag, which if set would tell Navidrome to ignore any MusicBrainz tags it finds when scanning.

* a good thing is this will allow splitting artists with the same name

Indeed, I never considered this may need to be done with artists until after I'd pushed the commit.

vs49688 commented 2 years ago

Pushed new version that should handle track/album/artist/albumartist IDs.

certuna commented 2 years ago

One issue will be: what does Navidrome do with artists that have the same mbid but different tags in the Artist field, such as "DJ Tiesto" and "Tiesto"?

It's not so easy, you might get some strange behaviour, consider this case:

Current behaviour: album 1 and 3 grouped under DJ Tiesto, album 2 and 4 grouped under Tiesto What will the new behaviour be?

vs49688 commented 2 years ago

Regardless of my changes, albums 3 & 4 should be under different artists (or the artists fixed). As for 1 & 2, perhaps I should change it to only use the mbid if it exists?

vs49688 commented 2 years ago

Actually, thinking about it some more... What do we want the behaviour to be? Do we want Navidrome to merge artists/albums/whatever based on mbid? That'd be a pretty drastic change in behaviour, but not an unreasonable one.

All I intended was to make it not merge different releases of the same album. If the tags in the files are wrong (thus causing Navidrome to split the artists/albums), then they should be fixed.

I'm fine with either option (it doesn't affect me as I run everything through Picard and beets), @deluan what do you think?

certuna commented 2 years ago

The more I think of it, it's indeed a drastic change.

Navidrome currently does all organization based on official tag fields: Artist, Album Artist, Album Title. It's not perfect (no disambiguation possible between artists or albums with the same name), but at least it's pretty transparent why stuff happens.

Moving to organization based on custom Musicbrainz ID tags is good if you consistently use those, but that puts the onus on users to have not only an immaculately tagged collection with 'regular' tags, but also consistently have the mbz tags correct, because you'll create splits between stuff with and without mbz id's. And this decreases transparency to the user why this happens, and we'll end up like Plex where the scanner feels like a black box where "anything can happen".

For example:

vs49688 commented 2 years ago

The more I think of it, it's indeed a drastic change.

Navidrome currently does all organization based on official tag fields: Artist, Album Artist, Album Title. It's not perfect (no disambiguation possible between artists or albums with the same name), but at least it's pretty transparent why stuff happens.

Moving to organization based on custom Musicbrainz ID tags is good if you consistently use those, but that puts the onus on users to have not only an immaculately tagged collection with 'regular' tags, but also consistently have the mbz tags correct, because you'll create splits between stuff with and without mbz id's. And this decreases transparency to the user why this happens, and we'll end up like Plex where the scanner feels like a black box where "anything can happen".

As long as the scanning behaviour is explicitly-and-well-documented, I don't really see the problem.

For example:

* currently a user can move an album to another artist just by changing the `Artist` tag, that's very logical and simple. But the album still retains the old `artist_mbid` tag, what will Navidrome do?

* currently a user can move a track to another album just by changing the `Album Title` tag. Logical, simple, and that's what pretty much every music player (foobar, Apple Music, MusicBee, etc) allows you to do. Grouping by `release_mbid` breaks that.

How about this?

Add a ScanUseMbzID option in the configuration file, unset by default. When unset, the behaviour is unchanged, keeping it as it is now.

When set, if an album/artist has a mbid, Navidrome will use it (and only it). This provides a solution to #489, assuming everything's tagged correctly. If the mbid is NOT set, it'll fall back and create a new/duplicate artist/album. It is up to the user to tag the files correctly and trigger a rescan.

Forgive me for being pushy on this, but I need a fix for the #489 - it's affecting large part of my library.

deluan commented 2 years ago

Hey, sorry being late for the party :)

After reading the thread, these are my two cents (mostly agreeing with latest @vs49688 comment): 1) The new behaviour (using mbids) has to be configurable, and turned off by default 2) When the feature is enabled, if no mbid is found, fallback to the current behaviour. 3) I think it is useful only for albums, albumartists and artists, but not for tracks.

On top of that, a migration plan is a MUST, or else users that enable this will break their playlists, stars, playcounts, etc... The migration should go over all tables with an album_id or artist_id and recalculates a new id. Foreign keys must be observed. To make things even trickier, this will have to happen only when the config option is first enabled, and cannot be disabled afterwards (or it would mess with the DB), making the implementation tricky (similar to what I had to do for PasswordEncryptionKey)

vs49688 commented 2 years ago

How'd you prefer the migrations to be done?

  1. Duplicate entries, except with the mbid as the id, fix up references, delete old, vacuum
  2. Disable foreign keys, update in-place
deluan commented 2 years ago

How'd you prefer the migrations to be done?

  1. Duplicate entries, except with the mbid as the id, fix up references, delete old, vacuum
  2. Disable foreign keys, update in-place

Whichever is faster. I'd assume it is the later, but it is worth to test it out if you can.

vs49688 commented 2 years ago

It's mostly finished, all that needs to be done now is some cleanups, and to trigger a full rescan.

Also, where's the best place for this? I've just put it in the top-level cmd package to start with, but it should probably go elsewhere. I'd like to put it in db/migrations (specifically so I can use forceFullRescan), but can't because of circular imports.

vs49688 commented 2 years ago

Ping on this?

Specific issues I'd like feedback on:

The current behaviour:

deluan commented 2 years ago

Sorry for the delay. Will take a look tonight

deluan commented 2 years ago
  • It's in the root cmd package for now, where should it actually go?

What do you think of creating a new command (i.e. navidrome use_mbid), that executes the migration and sets the UsingMbzIDs in the DB. That way, the user would have to be more intentional about this (we could even show a message like "stop now and do a backup if you haven't done it yet) and we wouldn't need the UseMbzIDs config option. The user would see the feedback of the process on his screen as he would need to run it from the command line.

  • Do the DeleteMany functions I've added to the {artist,album}Repository interfaces have security implications I'm not aware of - i.e. does it automatically create a REST route for it (or something, I haven't really looked at the code for this).

No worries, it is not created automatically :)

  • A full rescan is required to finish the process - this must be triggered manually.

Is it really required?

I'll do a more fine grained review of the PR later, but wanted to give you feedback on the approach ASAP

vs49688 commented 2 years ago
  • It's in the root cmd package for now, where should it actually go?

What do you think of creating a new command (i.e. navidrome use_mbid), that executes the migration and sets the UsingMbzIDs in the DB. That way, the user would have to be more intentional about this (we could even show a message like "stop now and do a backup if you haven't done it yet) and we wouldn't need the UseMbzIDs config option. The user would see the feedback of the process on his screen as he would need to run it from the command line.

That'd work, I'm okay with it.

  • Do the DeleteMany functions I've added to the {artist,album}Repository interfaces have security implications I'm not aware of - i.e. does it automatically create a REST route for it (or something, I haven't really looked at the code for this).

No worries, it is not created automatically :)

Nice!

  • A full rescan is required to finish the process - this must be triggered manually.

Is it really required?

Yes, it gets messy otherwise. Think of the case where a track's album mbid is different to that of the album's. If there's no corresponding album in Navidrome's database, I'd have to create it, but which metadata do I use? Do I duplicate the existing one and simply change the ID, or do I try to infer it from the track's db entry? Do I invoke the scanner on the specific file?

All this PR effectively does is change the album/artist/albumartist IDs to their corresponding mbid's. No albums/tracks/artists are added or deleted; the disk isn't touched - it's purely a database-only operation.

From here, a rescan is the final pass to fix everything up, creating any extra albums, unmerging them, etc.

deluan commented 2 years ago

If a rescan is needed to do things like creating any extra album, are you sure annotations are kept after the migration?

vs49688 commented 2 years ago

Yep, see the MoveAnnotation() calls in migrateAlbums() and migrateArtists().

EDIT: At least, they were kept for me. There might be some edge case I'm missing...

vs49688 commented 2 years ago

Apologies for the delay! I've changed it to be a new command as discussed, and cleaned things up a bit.

Give it a go, and let me know how it goes.

deluan commented 2 years ago

Thanks, I've been focused on implementing the Smart Playlist feature (#1417), but will try to spend some time testing this (and other pending PRs) this week

vs49688 commented 2 years ago

Ping on this?

deluan commented 2 years ago

Sorry for the delay, I'll take a look and validate the PR this week/weekend. And no worries, I'll fix the conflicts.

vs49688 commented 2 years ago

Ping, is this PR still wanted?

deluan commented 2 years ago

Yes it is! Sorry, but I was not able to test it throughly for the release. Actually I had a couple of PRs (including this one) that were holding the release as I couldn't find time to properly test them. Now that the release is out there, I can merge it this weekend and have it up and running in my private instance for a whole month before next release.

vs49688 commented 2 years ago

I can rebase if you'd like. Keeping the commit history linear is always nice.

deluan commented 2 years ago

I can rebase if you'd like. Keeping the commit history linear is always nice.

Agree, that would be nice. The only conflict is that we don't need the Update, Save and Delete methods anymore in the album and artist repositories. So you only need to leave the DeleteMany method you created.

vs49688 commented 2 years ago

Done

vs49688 commented 2 years ago

@deluan I had a last-minute thought on this - it can't handle a beet mbsync.

beet mbsync may move files around if their metadata has changed. Because Navidrome uses the file path for the track IDs (even with this PR), this will break.

Thoughts on changing this to use the mbz track ID (which should never change) instead?

func (s mediaFileMapper) trackID(md metadata.Tags) string {
    return fmt.Sprintf("%x", md5.Sum([]byte(md.FilePath())))
}
deluan commented 2 years ago

Sorry for the looong delay, but yeah I think if we are going to enable MBIDs to artist/album, we might as well enable them for tracks.

vs49688 commented 1 year ago

Switching this to a draft, found a few issues.

vs49688 commented 1 year ago

Am now running this on my two instances and everything seems fine. Will leave for a bit longer and report back.

Not sure about the golanglint-ci issue, I can't reproduce it on my machine.

EDIT: of course the moment I say this, I find another issue

arg274 commented 1 year ago

Built bb63c4c36f9deaec7075ff8616f2c15309819de4 and there seems to be a bug of tracks having the same title under the same release being split off. One but all other copies of the tracks get split off to a different release and they are put under an artist entity in the DB with the legacy ID used prior to the migration. The other release that has the single copy of the track (along with the other tracks in the release) is put under an artist entity with its MBID.

vs49688 commented 1 year ago

I think I also hit that with https://musicbrainz.org/release/8e9a1ac2-22f5-4d9d-b794-fb81428d44a1, but I couldn't reproduce it with a test case. I'll give it another go.

EDIT: Reproduced it. I'm using musicbrainz_trackid when I should be using musicbrains_releasetrackid.

vs49688 commented 1 year ago

@deluan am going to need your input on this.

With Tags.MbzTrackID(), did you intend for it to return the global recording ID, or the release track ID?

func (t Tags) MbzTrackID() string { return t.getMbzID("musicbrainz_trackid", "musicbrainz track id") }

I assumed the latter, but according to here, when in tags musicbrainz_trackid is the recording ID, and musicbrainz_releasetrackid is the track ID. Confusingly, musicbrainz_trackid IS the track ID when working with Picard, and it remaps it internally.

I can change it to use musicbrainz_releasetrackid, and this shouldn't affect existing libraries, as they're not using tags for organisation. Mine, however, will need some manual migrations, as I've been running with this for a while now :angry:

vs49688 commented 1 year ago

@arg274 Could you please try running https://github.com/navidrome/navidrome/commit/cc9da224e390e0a47dea5081fb821f0e427fe7eb?

arg274 commented 1 year ago

@vs49688 Completely forgot to check this thread but I tested bdd53e7e8927a0b18607032014644c13981fbb69 and it seems to have fixed the issue.

vs49688 commented 1 year ago

Ping on this? There's actually very little code change if you exclude the last commit. Most of the size is in the migration.

github-actions[bot] commented 1 year ago

Download the artifacts for this pull request:

phw commented 11 months ago

With Tags.MbzTrackID(), did you intend for it to return the global recording ID, or the release track ID?

@vs49688 Your assumption of MbzTrackId being the recording ID is correct.

To avoid this confusion it got renamed to MbzRecordingId in #2279 .

The source of the confusion is that the tag name in files is musicbrainz_track_id, which has historic reasons.

vs49688 commented 11 months ago

Yep, caught on to that pretty early - It caused issues with this PR. The correct tag is being captured since https://github.com/navidrome/navidrome/pull/1827, and this PR has been subsequently updated.

github-actions[bot] commented 6 months ago

Download the artifacts for this pull request:

github-actions[bot] commented 6 months ago

Download the artifacts for this pull request:

arg274 commented 4 months ago

Still using the builds from this PR. If I'm not mistaken, if there's an artist with the same ID but multiple aliases, the name of the merged entity is set to whichever alias is scanned first (?) I think a minor improvement would be to take the most recent one (based on the most recent release) or the mode of all the aliases.

vs49688 commented 4 months ago

Still using the builds from this PR.

Good to know, I thought I was the only one.

If I'm not mistaken, if there's an artist with the same ID but multiple aliases, the name of the merged entity is set to whichever alias is scanned first (?) I think a minor improvement would be to take the most recent one (based on the most recent release) or the mode of all the aliases.

The approach this PR takes has been rejected (apparently not everyone is as Musicbrainz-only as I am :man_shrugging:), and as such is in maintenance mode.

I'll keep updating it as necessary, but I'm only keeping it around until the equivalent functionality is implemented in Navidrome proper and hits a release.

Then I'll probably write a migration tool, and this PR can finally be laid to rest.

github-actions[bot] commented 4 months ago

Download the artifacts for this pull request:

github-actions[bot] commented 4 months ago

Download the artifacts for this pull request:

github-actions[bot] commented 3 months ago

Download the artifacts for this pull request:

danni-storm commented 2 months ago

I'll keep updating it as necessary, but I'm only keeping it around until the equivalent functionality is implemented in Navidrome proper and hits a release.

Is there a chance of that happening sometime soon? Is the team planning on merging any of this code or implementing their own version? I see a lot of commits on your part to keep the PR in sync with the main releases but don't see any activity from the Navidrome team here for the last year or two. I know its foss and they have their hands full - more curious than anything as I think this would be an awesome feature since I have many albums with multiple versions.

github-actions[bot] commented 2 months ago

Download the artifacts for this pull request:

deluan commented 1 month ago

implementing their own version?

Yes, this has been discused in https://github.com/navidrome/navidrome/issues/1976#issuecomment-1383278483 and will be implemented in #2709

For now I want to thank @vs49688 for keeping this up-to-date, as it obviously has value for a lot of other users.