music-assistant / hass-music-assistant

Turn your Home Assistant instance into a jukebox, hassle free streaming of your favorite media to Home Assistant media players.
Apache License 2.0

Some albums are missing tracks due to being wrongly detected as duplicates #2425

Closed madbrain76 closed 3 weeks ago

madbrain76 commented 3 weeks ago

What version of Music Assistant has the issue?

2.0.4

What version of the Home Assistant Integration have you got installed?

2024.5.1

Have you tried everything in the Troubleshooting FAQ and reviewed the Open and Closed Issues and Discussions to resolve this yourself?

The problem

An 18-track album is missing tracks 8 and 16

How to reproduce

  1. Set up the file system (remote share) music provider.
  2. Let it import the whole library, about 70k files. I cannot import just that specific folder, since it has a space in its name and there is a bug in the music provider (I will file a separate bug for that issue).
  3. Open the album in the "Albums" tab. It is Robert Edward Smith, Harpsichord in the Grand Manner.
  4. Select a player.
  5. Click play / play now.
  6. Notice there are tracks numbered 1 through 18, but tracks 8 and 16 are missing.

Music Providers

File system (remote share)

Player Providers

This is independent of player

Full log output

log.txt

Additional information

image

What version of Home Assistant Core are you running

2024.6.1

What type of installation are you running?

Home Assistant OS

On what type of hardware are you running?

Windows

OzGav commented 3 weeks ago

Did you see all the warnings in the log about tagging problems? That will probably be why some are missing. You are advised to retag with Picard.

madbrain76 commented 3 weeks ago

Did you see all the warnings in the log about tagging problems? That will probably be why some are missing. You are advised to retag with Picard.

Not yet. I just discovered the existence of another log - not the one under Settings / Add-ons / MA / Logs. This is because I never clicked or hovered over what I now know is the "Core" icon. Also, I was doing this very late at night.

I looked briefly into Picard. I'm not sure if this is the solution. I have a 70k-track library which is already tagged, mainly through JRiver Media Center, which does a good job with all the fields. I'd like to continue to use that method if possible.

Regarding this album, if the problem is that the two missing tracks haven't been tagged, shouldn't they still be visible in the "Tracks" view, just without the tags ? And if so, what type of search could I use to find them ? I would think, knowing the names of the files, that there should be a way to search by filename, but that doesn't seem to be the case.

I was hoping to isolate this specific problem to just the one album. So, I disabled all my music providers, and created one that points just to the folder containing this album (without any spaces in the network share/folder name).

I ran into another issue, though. Even with the other providers disabled, all the tracks & albums from these providers still show up in the list, just greyed out - they cannot be selected. Since I have 70k tracks (x 2, one for the remote share, one for Plex provider), the 18 tracks are buried in a sea of other files, and I cannot really tell what's going on with this one album.

I suppose one option would be to delete those other music providers altogether, rather than disable them. That would presumably remove all the greyed out track/album entries from MA. But I don't think I should have to do that.

For one thing, I don't see the purpose of listing all the greyed out albums/tracks that one cannot click on in this situation. For another, it is not consistent with the rest of the UI. When you disable a player provider, the associated players no longer show in the players selection list. It should be the same when disabling a music provider - associated albums/tracks shouldn't be listed. Anyway, that is a separate UI issue.

The only thing I can think of doing in this case is to back up MA, delete all the other music providers, try with just the one pointing to one folder, and restore MA afterwards. I'm going to do that now, but it's more work than it should be to trace a file system and/or tagging problem.

madbrain76 commented 3 weeks ago

Even after deleting all the music providers (except the built-in MA one) - even the one pointing to this 18-track album - 2051 greyed out tracks still remain, as well as 3 greyed out albums. Here is the log for that attempt.

log.txt

OzGav commented 3 weeks ago

There is an issue with a fix waiting to come out in the next beta to fix the cleanup of orphaned items https://github.com/music-assistant/hass-music-assistant/issues/2167

If you look at the first log you will see the album artist is not set and you have the setting to default to VARIOUS ARTISTS therefore the MA library isn't going to look good as all of those tracks/albums will be listed under there. (Unless this is what you want of course)

madbrain76 commented 3 weeks ago

So, I deleted MA, and reinstalled it. At that point, it correctly showed no entries under "Tracks", or "Albums". I then added the music provider to point to the single-album network share/folder. I now see 16 tracks in the "Tracks" view. See screenshot.

image

However, if I click on "Albums" and select that one album, I get something different : it shows all 18 tracks, numbered. Tracks 8 and 16 are not missing.

image

Here is the relevant log :

log.txt

madbrain76 commented 3 weeks ago

There is an issue with a fix waiting to come out in the next beta to fix the cleanup of orphaned items #2167

If you look at the first log you will see the album artist is not set and you have the setting to default to VARIOUS ARTISTS therefore the MA library isn't going to look good as all of those tracks/albums will be listed under there. (Unless this is what you want of course)

Thanks. Good to know that it's a known problem. The bug you pointed to only talks about artists remaining. In my case, I had albums, artists and tracks remaining. I will add my 2 cents in there to refer to my comment in that thread.

OzGav commented 3 weeks ago

It's possible that, due to the tag information, two of the tracks are getting collapsed together. From what I can see there are six identically named tracks in two cases. You need to make sure these are all tagged comprehensively so they can be differentiated.

madbrain76 commented 3 weeks ago

You are referring to BWV 992. That is a single work made of 6 movements. The file names start with the track number in the album. It is true that the "Title" tag is the same for all 6 files.

image

It still doesn't explain why "Tracks" only shows 5 of these movements, and one is missing. If the identical title field was the problem, I would expect just 1 out of 6 to show under "tracks", not 5 out of 6.

Even if MA is not able to associate all 6, shouldn't the tracks view always show all the files ? The filenames are unique by definition, at least within one music provider.

I'm wondering if it has to do with the duration. Track 4 (2nd movement) is 2:29 in length. Track 8 (6th movement) is 2:28 in length. Maybe MA is considering them to be the same because of the close length ?

The same is true for tracks 14 and 16 - same title, and length of 1:42 and 1:43 respectively.
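The hypothesis above can be reproduced with a small sketch. This is not Music Assistant's code: the `Track` class, the `looks_like_same_track` rule and the 2-second tolerance are all assumptions made for illustration, and only the durations of tracks 4 and 8 come from this thread (the title and the other durations are placeholders).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    number: int
    title: str
    album: str
    artist: str
    duration: int  # seconds


def looks_like_same_track(a: Track, b: Track, tolerance: int = 2) -> bool:
    """Hypothetical matcher: identical text tags plus a near-identical length."""
    return (
        a.title.casefold() == b.title.casefold()
        and a.album.casefold() == b.album.casefold()
        and a.artist.casefold() == b.artist.casefold()
        and abs(a.duration - b.duration) <= tolerance
    )


def deduplicate(tracks: list[Track], tolerance: int = 2) -> list[Track]:
    """Keep the first track of every group the matcher considers identical."""
    kept: list[Track] = []
    for track in tracks:
        if not any(looks_like_same_track(track, k, tolerance) for k in kept):
            kept.append(track)
    return kept


ALBUM = "Harpsichord in the Grand Manner"
ARTIST = "Robert Edward Smith"
# Only the lengths of tracks 4 and 8 are taken from the thread;
# the shared title and the remaining lengths are placeholders.
movements = [
    Track(3, "BWV 992", ALBUM, ARTIST, 95),
    Track(4, "BWV 992", ALBUM, ARTIST, 149),  # 2:29
    Track(5, "BWV 992", ALBUM, ARTIST, 200),
    Track(6, "BWV 992", ALBUM, ARTIST, 60),
    Track(7, "BWV 992", ALBUM, ARTIST, 40),
    Track(8, "BWV 992", ALBUM, ARTIST, 148),  # 2:28
]

surviving = [t.number for t in deduplicate(movements)]
print(surviving)
```

Under these assumptions, five of the six movements survive and track 8 is folded into track 4, which matches the symptom: identical titles alone do not collapse the tracks, identical titles plus a near-identical length do.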

OzGav commented 3 weeks ago

Yes that is likely the case. Again you need to tag these comprehensively. I think this is the CD from the MB database so Picard will be able to tag this properly: https://musicbrainz.org/release/2a70f1f7-86b3-4d15-997f-774d3615f34b

madbrain76 commented 3 weeks ago

I deleted tracks 4 and 14, and synchronized the music provider. Suddenly, tracks 8 and 16 appeared in the tracks view ! So, it does indeed seem to have to do with incorrect duplicate detection. There should be plenty of metadata usable to differentiate them, such as the bit rate and the size of the file. Worst case, there would need to be some analysis of the audio data and a corresponding hash, but that would be computationally intensive and not necessarily reliable.
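As a rough illustration of the file-size and hash idea, here is a minimal sketch using only the Python standard library. The function names are made up for this example, and it hashes the whole file (tags included), so it can only prove that two files are byte-identical copies; it says nothing about two different rips of the same recording.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) for a file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large audio files are never loaded in full.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return path.stat().st_size, digest.hexdigest()


def are_byte_identical(a: Path, b: Path) -> bool:
    """Cheap size check first; hash both files only when the sizes agree."""
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_fingerprint(a) == file_fingerprint(b)
```

The size comparison is nearly free, so the expensive hashing only runs for files that already have the same length in bytes. Two movements that differ by a second of audio would almost certainly fail the size check.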

OzGav commented 3 weeks ago

Just tag them properly as described in the docs

madbrain76 commented 3 weeks ago

Yes that is likely the case. Again you need to tag these comprehensively

That is simply not going to scale for a very large collection. Other servers don't have this problem with the same album. This is incorrect duplicate detection and clearly a bug, IMO. Perhaps this could be something tunable at either the provider level or MA level, depending on where it's currently implemented. But two files varying by one second IMO should not be assumed to be duplicates without some further confirmation.

OzGav commented 3 weeks ago

You haven't shown the rest of the tags to compare but it appears to be the same album, same track name, same artist, same album artist, same genre and within 1 second of the same track length? Yet MA is supposed to work out these are different? It isn't valid to say "other servers don't have this problem" when you aren't comparing apples and apples. Other products aren't bringing together multiple local and streaming providers and matching those together to provide a consolidated interface. You are welcome to open a feature request and see how many votes it gets or you could just do what everyone else is doing and tag your files as recommended.

madbrain76 commented 3 weeks ago

Respectfully

1) The doc says this under the filesystem provider :

Local tracks and albums will be linked to the same tracks or albums on other streaming providers. Note that same is not simply same name. The tags are reviewed to ascertain whether it is indeed the exact same track. Without tag information MA will attempt to identify identical tracks based on the other information it has such as artist name, album, and track length. However, poor tag information may lead to poor matches.

Track length is different in this case. The fact that tracks of mismatched lengths, even by one second, were considered duplicates, was a surprise. But I will grant you that this is a narrow case, and there could well have been 2 tracks of the same length with identical tags, and perhaps there are even some in my collection, but I'm not sure how to find out.

2) Consider the user experience.

a. I bought the CD for this album a very long time ago

b. I ripped the CD, also a very long time ago

c. I imported it to my software of choice at the time, JRiver Media Center

d. I tagged it, most likely using CD-DB or some other database, or by hand

e. I created a network share provider in MA to load this album

f. I played the album

g. after playing track 7, it skipped to track 9. This was immediately obvious to me because I know BWV 992, and I have played the missing 6th movement myself

h. I was sad

i. I filed an issue on github

j. I spent a lot of time investigating the problem

k. the developer also spent his time following up

l. I was told MA is working as designed, and to fix the file tagging

m. I wonder how many other albums in my 70k collection might have this problem in MA

I submit that anything after step g is not optimal, and there are several possible improvements. I will resist the temptation to suggest how, but I'm sure you can find a way if you agree that this could be better.

I will offer the following questions :

a. how could I have known there was a problem with the album before playing it ?

b. how could I find out if the issue affects other albums in my collection, so I don't run into this again ?

3) This false duplicate issue may not be unusual with classical music.

Many works are made of multiple movements. On the disc in question, there were two works affected. It is not rare for two movements to be of similar length, and some could in fact be an exact match in length. Volunteers who created the CD-DB entries sometimes do not enter all the individual movement numbers or names, as in this case. Sometimes, two movements even have the same name, but different numbers. The tags that are present are arguably incomplete, and not truly incorrect.
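One way to approach question (b) is to scan the library's tags for groups of tracks that share album, artist and title and whose lengths are close. The sketch below assumes the tags have already been read into plain dictionaries (reading them from files would need a tagging library and is not shown); the function name, the dictionary keys and the 2-second tolerance are all hypothetical. Only the lengths of tracks 14 and 16 come from this thread.

```python
from collections import defaultdict
from itertools import combinations


def find_suspect_pairs(tracks, tolerance=2):
    """Return pairs of file paths whose tags match and whose lengths are close.

    `tracks` is an iterable of dicts with the keys
    'path', 'album', 'artist', 'title' and 'duration' (seconds).
    """
    groups = defaultdict(list)
    for track in tracks:
        key = (
            track["album"].casefold(),
            track["artist"].casefold(),
            track["title"].casefold(),
        )
        groups[key].append(track)

    suspects = []
    for members in groups.values():
        # Compare every pair inside a group of identically tagged tracks.
        for a, b in combinations(members, 2):
            if abs(a["duration"] - b["duration"]) <= tolerance:
                suspects.append((a["path"], b["path"]))
    return suspects


# Placeholder tags; only the two durations marked below are from the thread.
library = [
    {"path": "13.flac", "album": "A", "artist": "X", "title": "Suite", "duration": 240},
    {"path": "14.flac", "album": "A", "artist": "X", "title": "Suite", "duration": 102},  # 1:42
    {"path": "15.flac", "album": "A", "artist": "X", "title": "Suite", "duration": 75},
    {"path": "16.flac", "album": "A", "artist": "X", "title": "Suite", "duration": 103},  # 1:43
]

for pair in find_suspect_pairs(library):
    print("possible false duplicate:", pair)
```

A report like this, run before importing, would list the pairs worth retagging rather than leaving them to be discovered during playback.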

OzGav commented 3 weeks ago

In answer to your question (a) there are so many warnings in your logs that are the result of tagging problems that will point to likely problems for you all over the place; and (b) you need to tag the albums properly so the warnings go away.

I understand that re-tagging the albums takes some time, I had to do it myself. However, now I have a seamless experience across my local tracks (when I have some identical tracks on multiple albums) as well as linking across multiple streaming providers.

It is not possible to reduce the track length matching to a zero difference because that will result in not matching with streaming providers when their track lengths vary by many seconds.

Sounds like CD-DB is not the ideal database to use. The Musicbrainz database is striving hard to have unique identifiers for each item (artist, recording etc) and then to link them across so there are no duplicates and things such as artist names are standardised (e.g. No "B52s" vs "The B52s" vs "The B52's"). I showed a link to MB for your album above which is in the database so you can certainly fix that one and maybe many more. You can just point Picard to the top of your music collection directory and see how well it auto matches. Then you can just do a check and press save and you are done. Time is taken up when it doesn't auto match and you have to find the album you want or if it isn't in the database at all and then you can decide if you want to contribute it to the db for others.

If you don't want to retag your albums then maybe MA isn't for you as it needs that metadata to work properly for the reasons mentioned previously.

OzGav commented 3 weeks ago

Additionally there is something wrong with this album:

2024-06-09 02:02:55.217 ERROR (MainThread) [music_assistant.providers.filesystem_smb] Error processing Beethoven/Murray Perahia, Academy of St. Martin in the Fields & Murray Pe Beethoven_ String Quartet, Op.127. Piano Sonata, Op.101/Beethoven - Beethoven Piano Sonata No.28 in a .... etwas lebhaft, und mit der innigsten emp.m4a - Unable to retrieve info for /tmp/filesystem_smb--SUuTKbob/Beethoven/Murray Perahia, Academy of St. Martin in the Fields & Murray Pe Beethoven_ String Quartet, Op.127. Piano Sonata, Op.101/Beethoven - Beethoven Piano Sonata No.28 in a .... etwas lebhaft, und mit der innigsten emp.m4a: Invalid data found when processing input

madbrain76 commented 3 weeks ago

1) I looked at this particular album with Picard. It appears the files in that album all have a unique "track number" field in the metadata.

image

Isn't this something that MA could already take into account in the duplicate detection process, in addition to the title / artist / album fields ? Even Windows shows this track number field. Is it not part of the ID3 spec ?

2) It may be that no duplicate detection algorithm is ever going to make everybody happy. Perhaps it should be possible to simply turn off the dupe detection, especially if one is only using a local library music provider, and not cloud.

3) I can certainly appreciate that there are tagging problems in my library. And to be clear, I'm not saying I don't want to do any work to fix them. I have not reviewed the logs yet. But I can safely say that process is not one that's going to be easy, especially if I'm trying to play an album from my smartphone. IMO, something needs to happen before it gets to that, if only to become aware of the issue, and then be given the opportunity to fix it.

My preferred "source of truth" is the information I have recorded over the years using JRiver Media Center - mostly automated fetches from CD-DB, but some albums have been manually entered. I find that software very flexible for my needs, and would like to continue using it to edit my library. It has extensive support for tagging. But tagging is not its only or even main purpose, unlike Picard.

I'll just throw some random ideas - some may be simple, some may be a lot of work, some may conflict with each other, but perhaps some will be of interest, and it will not have been a waste of time typing them.

For detection :

a. Something could be done at the time the music provider is added, or each time it is synchronized. If there are errors (dupes or others), then present the relevant portion of the log in a window/tab

b. or show it in a global log window, if the conflicts are between tracks across providers, rather than within one provider, as in this case

c. maybe use some sort of tree view to display them, with clickable provider / folder / album / track nodes

d. possibly, make use of HA notifications to draw attention to the problem when the log contains errors such as dupes

e. or maybe not. The notifications may be annoying if there is no easy way to resolve them. But they could be dismissed as long as they don't recur

Resolution :

f. for each separate issue, tell the user to fix the specific library/tagging issue externally, and provide a button to resynchronize relevant files afterwards, and then confirm that the issue has been resolved

g. there could be some wizard-like interactive process to go through the log entries. For dupes, it could be a question like "are these 2 tracks the same" ? Perhaps with the opportunity to play each one and check it out yourself by ear.

h. when running through the wizard, for local file systems, some partial analysis of the suspected dupes could be run, maybe a portion at the start or end in particular in case there is just a difference in pauses/silences

i. or, provide a button in the settings to do a "deep-dive" fully-automated analysis and do a non-interactive dupe detection based on audio content, which you obviously wouldn't want to run by default due to computational cost

j. if Musicbrainz ever implements a web version of the Picard tool, integrate it so the process of fixing the file system library can be more seamless, by allowing the metadata to be edited for those problematic files. And yes, even another library manager/tag editor like my beloved JRiver will pick up external changes made by Picard or other programs. Actually, I found that there is a docker-picard project that makes Picard accessible via web browser. Maybe that is something to consider in the future..

madbrain76 commented 3 weeks ago

Unfortunately, even Picard could not fully resolve the issue with my album, at least not at first.

I then reopened Picard and selected those 5 tracks, and did another lookup. This time, it managed to fix tracks 3 and 8. Very strange why a second pass would yield different results than the first.

For the remaining tracks 4, 5 and 7, I used the "Generate Acoustic IDs fingerprints" and then "Scan", and that took care of them at last.

My Amazon records show that I bought this CD on September 17, 2002. There was a typo in the word "harpsichord" ("harsichord") in the listing, so I didn't find it at first. I didn't start using Windows at home until 2007, and JRiver Media Center in 2013. It's hard to tell with which OS / software I ripped the disc and when, but the files were all converted a few times over the years between WAV, M4A lossless, WMA lossless, and now FLAC.

madbrain76 commented 3 weeks ago

More happened after I fixed the tagging. log.txt

1) After I fixed all the files, I tried to synchronize / disable / reload / synchronize the file system (remote share) provider. During one of those, I believe the reload, I got an SMB error telling me the share was not available. But I checked from multiple Windows hosts, and it was definitely still being shared. I then went to the "configure" menu option for the music provider, and simply clicked "save". That solved it. That probably deserves its own bug. I will attach the log now, before I forget about it.

2) after the sync/reload went through, the old album name remained, but showing empty. That could be related to issue #2167.

3) under the new album name, Track 8 showed twice, one of them greyed out

4) I then deleted the Qobuz provider, which was already disabled. The greyed out track disappeared. I did have this particular track favorited in Qobuz. The metadata information now looks identical between Qobuz and my network share. I am surprised that this track, present in 2 providers, didn't get matched/merged as duplicates in this case. Perhaps it is because one provider was disabled.

5) I re-enabled Qobuz. The 2 versions of track 8 appeared - the local one, and that from Qobuz

image

6) a moment later, those two tracks got merged when I refreshed the UI.

image

It no longer shows it being favorited. Clicking "Show info" on that track shows that it belongs to the filesystem (remote share) provider.

7) I then refreshed the UI one more time - and now the 2 versions of track 8 are showing up again, just like in step 5 ! This is pretty weird.

image

The UI then showed both tracks for the next few minutes even as I refreshed the page.

My take on this :

a) de-duplication is not getting run against disabled music providers. That explains problem 3) above

b) problem 3) would not exist if the albums/tracks from disabled providers were hidden rather than greyed out

c) when both providers are enabled, the de-duplication seems to get confused - alternately showing 2 tracks or 1 track. This could be caused by the track being a favorite in one music provider (Qobuz), but not in the other (remote share). I'm unsure what the expected behavior should be in this case, but alternating between showing 1 and 2 tracks is probably not what's intended. This case may not be simple to re-create, though.

madbrain76 commented 3 weeks ago

Found more issues related to duplicates - can't help but report them.

  1. I opened the album view
  2. selected the Qobuz (favorited) version of track 8, and played it
  3. I then went to the player view
  4. it showed the following screen :

image

Oddities :

a. it shows queue (10) and played (11), when the album only has 18 tracks (19 with the dupe #8 track)

b. some of the tracks show the album artwork, while others do not

  5. I selected one track with artwork and clicked the "menu / show info" option in order to determine which music provider the track belongs to.

I got this error :

image

  6. I re-opened the player view, selected one track without artwork, and clicked the menu / show info option. Again, I got an error 54.

  7. I repeated steps 1-4. I got different numbers for "queue" and "played" than the previous time, and their sum is also different. I'm attaching one more log.

log.txt

I know this issue will need to get split into several and I apologize for it all being in this one, but I'm recording the problems as I'm finding them. Opening separate defects and recreating the whole sequence of events from scratch would significantly slow me down, and be very difficult. At some point I need to sleep, and the problems won't be fresh in my memory the next day (not to mention screenshots and logs).

OzGav commented 3 weeks ago

Yes you are correct this is too much and is not starting from a clean base so will likely be impossible to recreate. There are open issues that need to be resolved that would be impacting here as well. My suggestion is you tag your files as far as you are prepared to do so and wait until any related issues are closed then start again from an empty database. Bear in mind that problems you encounter from poor tagging will likely result in the same answers I gave above. Having said that actual bugs are welcome to be found and reported as with the complexity of MA there will still be some no doubt.

madbrain76 commented 3 weeks ago

I believe many of the problems I ran into are not intended behavior, ie., bugs, and will be reproducible, even with a simple, properly tagged, single-album library. Some of the problems mentioned in the later part of this conversation occur with an additional cloud provider containing the same album. There are a few intermittent problems, one Picard issue, and a number of RFEs.

I have put a ton of time into this, in case it wasn't obvious, so I hope the problems are given serious consideration. I am only reporting them because I want Music Assistant to improve. I have worked in software development for about 3 decades, and I know that all software has bugs, and in larger projects with many components, the problems tend to be more complex.

OzGav commented 3 weeks ago

Issues that can be replicated (e.g. https://github.com/music-assistant/hass-music-assistant/issues/2435) will be fixed. Feature requests (i.e. changes in current functionality which includes the track matching algorithm) are prioritised according to popularity.

sfnis commented 2 weeks ago

@OzGav would you consider working to change the default behavior from “missing album artist tag” and defaulting to Various Artist, to instead trying the Artist Tag, or even doing an artist look up based on the Album name?

OzGav commented 2 weeks ago

@sfnis that is already available in the settings.

sfnis commented 2 weeks ago

Nice. I don't see it in the settings. Can you tell me where, or link me to the documentation? Thanks.

OzGav commented 2 weeks ago

In the MA settings for the file system provider you will find this:

image

madbrain76 commented 2 weeks ago

Issues that can be replicated (e.g. #2435) will be fixed. Feature requests (i.e. changes in current functionality which includes the track matching algorithm) are prioritised according to popularity.

I think this is somewhat problematic. In the case of this issue for example, it wasn't immediately clear that the behavior was as designed, because this particular aspect of the track matching algorithm isn't documented. And I'm not expecting that it should have been - it's impossible to describe everything in such detail. I'm however disappointed with the current process - close the issue, and ask the reporter to file it again in a different forum. That is a very sub-optimal process.

Over my long career in software development, as both reporter and developer, I can say that it has not been rare for issues to be recategorized from bugs to feature requests. And sometimes after a deeper dive, the feature request may revert to a bug, also. The way github is being used by MA, HA and other products really leaves much to be desired in this area. It should be as simple and painless as changing an issue property from bug to feature request, or back. And voting can still apply to those feature requests.

For an example of a better tracking system/method, check out bugzilla. It is much more comprehensive than anything github currently offers for issue tracking. Even if the project sticks with the current issue system, there could still be a "triage" process that would cause the issue to be migrated to the feature request pile, after evaluation by one of the devs.

If this only had to be done once or twice, it wouldn't be so bad. But I'm afraid it may not be. I'm left wondering where some of the issues should be filed.

1) while I can try to file "obvious" bugs such as #2435, what's an obvious bug to me still might not be to you.

2) can the track matching algorithm only have feature requests, and never bugs, because it isn't described in enough detail ? Where is the line drawn ?

3) for instance, is it intentional that the duplicate matching algorithm doesn't use the track number metadata ? Or is it an oversight, or a bug ? I truly don't know the answer to that question. But the answer affects what forum it belongs to. There is a 50/50 chance that I will file it in the "wrong" forum.

4) another example - how should the front-end display tracks that are duplicates across providers, but differ by favorite flag ? Should they be coalesced or not ? And if yes, how does the favorite status get managed ? Again, I don't know what the expected behavior is. I just know the current one looks odd.

Some guidance on these would be welcome in order to avoid filing things in the wrong place.

OzGav commented 2 weeks ago

I see in other discussions you have read the file system provider docs. For this reply I will highlight a couple of key points.

It is very important that all of your audio files contain proper ID3 tag information. The more comprehensive the tagging the better the results will be when using MA.

Without tag information MA will attempt to identify identical tracks based on the other information it has such as artist name, album, and track length. However, poor tag information may lead to poor matches.

To minimise the chance of problems with MA you should follow the Kodi guidelines here: https://kodi.wiki/view/Music_tagging. Just about all the tips, tricks and suggestions on that page are applicable to MA and if you follow it all to the letter you will have a much better experience.

Point 1. You can always ask first in Discord if something is a bug or not if you aren’t sure.

Point 2. The line is drawn when a user does all the docs recommend to avoid problems, that is, comprehensively tagging all local files but then a problem occurs.

Point 3. Yes I would say it is deliberate. If the tracks are properly tagged the track number does not need to be checked. Furthermore, introducing this may have other undesirable effects when the same track appears on different albums with different track numbers or even worse on a different release of the same album where the tracks are in a different order. Then there is the issue that some streaming providers don’t supply track numbers.
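The trade-off in Point 3 can be shown with a short sketch. Everything here is hypothetical (the `Tags` type, the `same_recording` rule, the 2-second tolerance and the sample data) and is not taken from MA's matching code; it only illustrates why requiring equal track numbers would break matches across albums and with providers that supply no number.

```python
from typing import NamedTuple, Optional


class Tags(NamedTuple):
    title: str
    artist: str
    album: str
    track_number: Optional[int]  # None when a provider supplies no number
    duration: int  # seconds


def same_recording(a: Tags, b: Tags, use_track_number: bool, tolerance: int = 2) -> bool:
    """Hypothetical rule: title, artist and near-equal length, optionally track number."""
    if a.title != b.title or a.artist != b.artist:
        return False
    if abs(a.duration - b.duration) > tolerance:
        return False
    if use_track_number:
        # A missing number can never confirm a match under the strict rule.
        return a.track_number is not None and a.track_number == b.track_number
    return True


# Invented example data: one recording as seen in three places.
original = Tags("Aria", "Example Artist", "Original Album", 1, 180)
compilation = Tags("Aria", "Example Artist", "Best Of", 7, 181)
streaming = Tags("Aria", "Example Artist", "Original Album", None, 182)

for other in (compilation, streaming):
    loose = same_recording(original, other, use_track_number=False)
    strict = same_recording(original, other, use_track_number=True)
    print(other.album, other.track_number, "loose:", loose, "strict:", strict)
```

With the strict rule, the compilation copy and the streaming copy both stop matching the original, which is the side effect described above. The loose rule keeps those matches but cannot tell apart two identically titled movements of similar length.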

Point 4 has been answered in your other report.

madbrain76 commented 2 weeks ago

@OzGav , thanks for responding.

  1. While I have tried it, I'm afraid the Discord UI is very confusing to me - I vastly prefer something less flashy like IRC for live discussion. My waking hours are also very odd and variable, and might not overlap with others very well.
  2. OK, fair enough. I read the MA FS doc. I will admit I did not read the Kodi tagging guidelines. Then I would say I'm asking for feature requests, and I filed several. That said, for this particular issue, a strict reading of the doc says "Local tracks and albums will be linked to the same tracks or albums on other streaming providers." This was a case of 2 tracks within the same file system provider, which is technically not covered by that statement. Since you have indicated the behavior is intentional, the doc should be clarified. The same goes for this statement : "On playback, when tracks are linked across providers the highest quality version is used automatically." It does not cover what happens if one is using only the FS provider, but has multiple versions of the same work in different quality. I have many such cases, for example, many versions of the 1981 Goldberg Variations by Glenn Gould. But the album name is different in my collection, and some other metadata probably is, too. Not to mention track length is not an exact match, either. But still the same recording. The same is true for Qobuz - different album names for the one in 24 bits vs 16 bits. I have a DSD version on top of those that Qobuz doesn't have, and a PCM conversion to 192k/24 bits also.
  3. I see your point. I actually do have several albums that have duplicate works on them, on different tracks, within my file system collection. Sometimes, a compilation where just one track from another album appears. And I want to be able to play each album separately with all the tracks. Will that happen if there is a duplicate track (with a different track number) between the two albums ? I have not tried it yet, as it will take me a little bit of time to find such an example.

madbrain76 commented 2 weeks ago

I read (glanced at) the Kodi tagging guidelines. I did not see a clear statement that the track names should be unique within an album, or that two tracks of similar length would be matched. After I read that, I installed Kodi 21.0 for Windows, and I added a folder containing the album mentioned, with the original metadata, before disambiguation by Picard. Kodi did not treat any of the tracks as duplicates. So, there are differences in the duplicate matching algorithm between Kodi and MA. I'm going to experiment with editing tags and see at which point Kodi will match things. I'll probably use another album, one with two tracks of the same exact length, rather than off by 1 second.

It's true that Kodi doesn't have to contend with cloud music providers, but this issue was reported within just one provider.

madbrain76 commented 2 weeks ago

Results of my testing with Kodi, with this album, so far :

1) making an exact copy of the album in 2 folders causes a full match of the album & all tracks.

2) Track name / title: setting all tracks to the same value causes no duplicates.

3) Track number: deleting track numbers causes no duplicates. Files appear to be sorted by filename, and the filenames start with 2-digit track numbers, so the order remains the same.

4) No metadata: I used metaflac --remove-all to delete all metadata from all the files. There was no match in Kodi.

5) I deleted the contents of the album folder, copied the original track 1 with metadata back into it, and made a copy of the file under a different name in the same folder. Kodi did not show them as duplicates. It showed them both as track 1.

6) I moved the copied file to a separate folder. Kodi still didn't match the identical files from different folders, probably because they had different filenames.

So, it looks like Kodi is quite strict about duplicate matching. The only way I could get it to match dupes was case 1: completely identical albums, in terms of number of files, filenames, and all metadata, in different folders. It seems like a very conservative dupe detection algorithm, with no possibility of incorrect matches, but it misses several cases of dupes.

OzGav commented 2 weeks ago

I have made some clarifying changes to the docs. Thanks.

I think this will also be useful information for you

Highest quality is based on sample rate, bit depth and codec and local is preferred over cloud.
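
As a rough illustration of that ordering, the sketch below ranks candidate versions of one track. It is not MA's actual code: the `TrackVersion` class, the `CODEC_RANK` table, and the exact order of the comparisons (codec class first, then sample rate, then bit depth, with local used as the tie-breaker) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical codec ranking: lossless above lossy. Not taken from MA.
CODEC_RANK = {"flac": 2, "alac": 2, "wav": 2, "aac": 1, "mp3": 1, "ogg": 1}


@dataclass(frozen=True)
class TrackVersion:
    provider: str      # e.g. "filesystem" or "qobuz"
    is_local: bool     # True for file system providers
    sample_rate: int   # in Hz
    bit_depth: int     # in bits
    codec: str


def quality_key(version: TrackVersion) -> tuple:
    """Sort key: codec class, sample rate, bit depth; local wins ties."""
    codec_rank = CODEC_RANK.get(version.codec.lower(), 0)
    return (codec_rank, version.sample_rate, version.bit_depth, version.is_local)


def pick_best(versions: list[TrackVersion]) -> TrackVersion:
    """Return the version that would be preferred for playback."""
    return max(versions, key=quality_key)
```

Under these assumptions, a local 192 kHz / 24-bit FLAC is chosen over a 44.1 kHz / 16-bit FLAC from a streaming provider, and a local file is chosen over a cloud version of identical quality.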

You ask:

I want to be able to play each album separately with all the tracks

The whole album will play, but if you happen to have a higher quality version of a track in another folder, that one should be selected for playback. This will be transparent to you (unless you notice a difference in the codec label in the UI).

Regarding your comments on the Kodi tagging guidelines: the intention in directing people to them is that they tag their files comprehensively and put them in a sensible folder structure. It is not meant to imply that what Kodi subsequently does with that information will be the same as what MA does. MA is unique in its approach, as far as I know, in its attempt to identify identical tracks and collapse them together. Using the MBIDs is the best way to do the matching as they are unique at every level (track, album, artist), and thus it avoids the situation you found where there were identically named tracks on the album. Sometimes I think I have found an error in the MB database, but on close inspection the tracks turn out to be different for some reason (a shorter radio edit, for example). However, I have found one or two things in my time where I submitted changes to the MB database to fix something.

I did not see a clear statement that the track names should be unique within an album, or that two tracks of similar length would be matched.

And you won't, as that is MA's own behaviour: it tries to identify identical tracks when there is no other tag information (and the docs are clear that poor matches may occur in this case). Once you add those MB tags to your tracks all of this will be fixed.
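
To make the difference concrete, here is a minimal sketch of the two matching modes described in this thread: exact comparison when both tracks carry an MBID, and a fuzzy fallback on title plus approximate duration otherwise. This is not MA's real matching logic. The `Track` class, the `same_track` function, and the 2-second tolerance are hypothetical and only illustrate why two identically named tracks of nearly equal length can be collapsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Track:
    title: str
    duration: int                # in seconds
    mbid: Optional[str] = None   # MusicBrainz ID, if the file is tagged


def same_track(a: Track, b: Track, tolerance: int = 2) -> bool:
    """Decide whether two tracks would be treated as the same recording."""
    # With an MBID on both sides the comparison is exact.
    if a.mbid and b.mbid:
        return a.mbid == b.mbid
    # Hypothetical fallback: same normalized title and similar duration.
    same_title = a.title.strip().casefold() == b.title.strip().casefold()
    return same_title and abs(a.duration - b.duration) <= tolerance
```

With made-up data, two untagged tracks both titled "Allegro" and lasting 201 and 202 seconds are considered the same track by the fallback rule, while the same two tracks with distinct MBIDs are kept apart.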

madbrain76 commented 2 weeks ago

@OzGav Thanks for the explanations.

I totally get that MA is trying to do something unique. Sometimes there are some pitfalls with new features, edge cases that need to be considered more carefully, etc. Something ground-breaking is not necessarily for everyone all the time :) But I already filed my RFE to discuss the possible enhancements/solutions/workarounds.

When it comes to using MBIDs, it might be the best way for you, but it has some disadvantages too :

  1. Those MBIDs are not standardized metadata elements. They are specific to the MB database. As far as I know, they don't apply to cloud music providers either. And a lot of software doesn't support them. That means the tagging will show up only in MA, but not in other apps.
  2. Since MA doesn't provide its own tagging facility, the file system providers, both local and network, are expected to have their content tagging performed externally. One should be able to continue using the program of one's choice, even if it is not tied to MB and doesn't write MBIDs. The software I chose to manage my library was based on numerous factors, and support for MB wasn't one of them. Do I wish it was supported ? Yes, it would be helpful. But its absence isn't going to sway me to another program, unless it is clearly superior overall.
  3. I have only tried running Picard on the one album mentioned in this thread, and had rather odd results, that required 3 manually triggered passes to get all the track names correct. That is probably not a process I would want to repeat for my entire collection. I would consider using Picard for the false duplicate cases such as this one. Unfortunately, as it stands, I don't have any visibility on where they are, and therefore, I cannot even try to fix them. This is part of my RFE as well - to find these issues upfront.
  4. Picard may be very good at its specific purpose, but it's not a replacement for a more comprehensive program like JRiver Media Center, which is my preference. Even for tagging changes, its UI really is quite a bit more advanced than Picard, despite not supporting MB.
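
One way to get that kind of visibility outside MA would be a small audit script that flags tracks on the same album sharing a title and a near-identical duration. The sketch below assumes the tags have already been read into plain tuples by whatever tool is at hand. The `suspect_pairs` function, its input layout, and the 2-second tolerance are hypothetical and do not reflect how MA itself decides.

```python
from collections import defaultdict
from itertools import combinations


def suspect_pairs(tracks, tolerance=2):
    """Find track pairs on one album that a fuzzy matcher might merge.

    tracks: iterable of (album, track_number, title, duration_seconds).
    Returns a list of (album, track_number_a, track_number_b).
    """
    by_key = defaultdict(list)
    for album, number, title, duration in tracks:
        # Group by album and normalized title.
        by_key[(album, title.strip().casefold())].append((number, duration))

    suspects = []
    for (album, _title), entries in by_key.items():
        for (n1, d1), (n2, d2) in combinations(sorted(entries), 2):
            if abs(d1 - d2) <= tolerance:
                suspects.append((album, n1, n2))
    return suspects
```

Running it over a library export would give a list of albums and track numbers to inspect or retag by hand before importing.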
OzGav commented 2 weeks ago

On point 1, that is true, but Picard also adds ISRC and BARCODE fields, which are often used by streaming providers and are used to match tracks. At the end of the day MA can't worry about what other apps require; this is what MA requires. If people want to use MA to its potential (and avoid unusual outcomes) then tagging with this information is required.

  2. You can use any tagging program you like. Indeed we also mention MP3Tag in the docs.
  3. I think part of your issue is familiarity with Picard. I am not going to say it is simple to get all the music tagged, but it is possible. When I have completely untagged music, I start with MP3Tag to get the basics in: album, artist and track. Then I pull it into Picard, right-click and manually select the version I want via OTHER VERSIONS. I refer to the MB database to identify the correct selection if the list is large.
  4. No argument there. As above, it doesn't matter which program you use as long as the tags MA requires to function best end up in the files.