ignore duplicate titles/files

owntone / owntone-server

Linux/FreeBSD DAAP (iTunes) and MPD audio server with support for AirPlay 1 and 2 speakers (multiroom), Apple Remote (and compatibles), Chromecast, Spotify and internet radio.

https://owntone.github.io/owntone-server

GNU General Public License v2.0


ignore duplicate titles/files #85

Open snizzleorg opened 9 years ago

snizzleorg commented 9 years ago

I have the music directories of several people indexed by forked-daapd, and of course there are duplicates between those libraries.

It would be nice if forked-daapd could suppress those duplicates.

ejurgensen commented 9 years ago

Yes, I agree. I'll have a look at this.

ejurgensen commented 9 years ago

Closing this; it should no longer add duplicates.

snizzleorg commented 9 years ago

Does the database need to be rebuilt?

ejurgensen commented 9 years ago

Yes. I didn't want to check for duplicates during the start-up, because it would make the start-up longer. So it only checks when a file is added or modified. If it is only a few files you want to remove, it might be easier to trigger a modification with "touch".

EDIT: Use .full-rescan to rebuild, so you don't lose pairing info.
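For the few-files case mentioned above, triggering a modification amounts to bumping each file's modification time. A minimal sketch, assuming the scanner reacts to a changed mtime; the helper name `touch_files` is made up for illustration:

```python
import os
from pathlib import Path


def touch_files(paths):
    """Set the modification time of each existing file to now, like `touch`."""
    touched = []
    for p in map(Path, paths):
        if p.is_file():
            os.utime(p, None)  # None means "use the current time"
            touched.append(p)
    return touched
```

Paths that do not exist are skipped rather than created, which differs from plain `touch`.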

chme commented 9 years ago

I have not tested this change, but I think it can lead to missing songs in playlists, and to missing songs when a library path is currently unavailable (files are disabled in the db).

A playlist could contain a path to a song that was not added to the files db table because a duplicate already existed. And if a song is on a drive that is currently not mounted, it gets disabled, while the same file on a local drive would not be added to the files table.

Wouldn't it be better to filter the duplicates when selecting the songs for a list?

(Additionally, the proposed mpd protocol feature allows browsing the file system; it would be odd if files were missing there.)

ejurgensen commented 9 years ago

Yes, you're quite right about these problems. Filtering would probably work better, but it seemed like that would require extensive modifications of the queries, which was a bit daunting. Can you think of a good way?

snizzleorg commented 9 years ago

I also second the filtering approach.

chme commented 9 years ago

The more I think about this, the harder I find it to define a duplicate. Title + album + albumartist might not be enough. In the library they can still be distinguished by song number, e.g. if a song with the same title appears multiple times on an album in different versions. I have no experience with the shared library in iTunes, but I think there are other fields that may differ: song length, file format, ...

If you want to filter, I can think of two possible ways: 1) add or extend a group by clause with title, album, albumartist. According to the SQLite documentation, the columns not in the group by clause will come from an arbitrary row of the aggregated rows. 2) add the filter logic in the loop over the query result. Only add a row to the response if it is not equal to the previous row.

With both approaches, which one of the duplicates appears in the client is arbitrary.
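The two filtering ideas can be sketched against a toy table. This is only an illustration: the `files` table name comes from the discussion, but the column names, the sample rows and the helper `filter_adjacent_duplicates` are assumptions, not the project's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, "
    "title TEXT, album TEXT, album_artist TEXT)"
)
conn.executemany(
    "INSERT INTO files (path, title, album, album_artist) VALUES (?, ?, ?, ?)",
    [
        ("/music/alice/song.mp3", "Song", "Album", "Artist"),
        ("/music/bob/song.flac", "Song", "Album", "Artist"),
        ("/music/alice/other.mp3", "Other", "Album", "Artist"),
    ],
)

# Approach 1: collapse duplicates in SQL. `path` is a bare column, so
# SQLite takes it from an arbitrary row of each group.
grouped = conn.execute(
    "SELECT title, album, album_artist, path FROM files "
    "GROUP BY title, album, album_artist ORDER BY title"
).fetchall()


# Approach 2: sort so duplicates are adjacent, then skip any row whose
# key equals the previous row's key.
def filter_adjacent_duplicates(rows, key_len=3):
    kept, previous = [], None
    for row in rows:
        key = row[:key_len]
        if key != previous:
            kept.append(row)
        previous = key
    return kept


ordered = conn.execute(
    "SELECT title, album, album_artist, path FROM files "
    "ORDER BY title, album, album_artist"
).fetchall()
looped = filter_adjacent_duplicates(ordered)
```

Both results contain one row for "Other" and one for "Song"; which of the two "Song" paths survives is not defined by either query.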

I would prefer to keep the duplicates and leave it to the user to explicitly specify which files/folders should not be added to the library. The new config option is a good way to do this. A more flexible approach would be something like supporting ignore-files similar to the .gitignore file or the .mpdignore files.
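An ignore-file mechanism of this kind could look roughly like the following. It is a simplified sketch, not an existing feature: the file name `.libraryignore` is hypothetical, patterns are plain shell-style globs, and each ignore file only affects entries directly inside its own directory (no negation or nested-path rules as in .gitignore).

```python
import fnmatch
import os

IGNORE_FILE = ".libraryignore"  # hypothetical name, chosen for this sketch


def load_patterns(directory):
    """Read glob patterns from the directory's ignore file, if there is one."""
    try:
        with open(os.path.join(directory, IGNORE_FILE), encoding="utf-8") as fh:
            lines = [line.strip() for line in fh]
    except FileNotFoundError:
        return []
    return [line for line in lines if line and not line.startswith("#")]


def scan(root):
    """Yield file paths under root, skipping ignored files and directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        patterns = load_patterns(dirpath)

        def ignored(name):
            return any(fnmatch.fnmatch(name, pat) for pat in patterns)

        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if not ignored(d)]
        for name in filenames:
            if name != IGNORE_FILE and not ignored(name):
                yield os.path.join(dirpath, name)
```

With an ignore file containing `bob` and `*.flac` in the library root, the directory `bob` is never walked and FLAC files in the root are skipped.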

ejurgensen commented 9 years ago

Yes, title/album/artist is not enough, because some albums have multiple tracks with the same title. That's why I did title/album/artist/fname, which I think should be ok.
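As a rough illustration of that key (assuming fname means the file name without its directory, and using made-up dictionary fields rather than the real db columns):

```python
import os


def duplicate_key(track):
    """Build a title/album/artist/fname key; fname is the base file name."""
    return (
        track["title"],
        track["album"],
        track["artist"],
        os.path.basename(track["path"]),
    )


# Same file in two people's libraries: the keys match.
alice = {"title": "Intro", "album": "A", "artist": "X", "path": "/alice/01 Intro.mp3"}
bob = {"title": "Intro", "album": "A", "artist": "X", "path": "/bob/01 Intro.mp3"}
# Second track with the same title on the same album: the key differs.
reprise = {"title": "Intro", "album": "A", "artist": "X", "path": "/alice/09 Intro.mp3"}
```

A copy that was renamed or re-encoded to another format would get a different key and so would not be detected.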

I've reverted the change, so it can instead be done in a better way. Maybe having the filter in the loop might actually be ok, in the sense that it wouldn't require that much change. One thing to remember, however, is that some queries include a count, so that would also need adjustment.
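The count issue can be seen in a small sketch: a plain count still includes the duplicates, so a filtered listing needs a matching filtered count. The table layout and both function names below are assumptions for illustration only.

```python
import sqlite3


def make_db():
    """Create a toy files table with one duplicated track."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, fname TEXT, "
        "title TEXT, album TEXT, album_artist TEXT)"
    )
    conn.executemany(
        "INSERT INTO files (path, fname, title, album, album_artist) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("/alice/song.mp3", "song.mp3", "Song", "Album", "Artist"),
            ("/bob/song.mp3", "song.mp3", "Song", "Album", "Artist"),
            ("/alice/other.mp3", "other.mp3", "Other", "Album", "Artist"),
        ],
    )
    return conn


def raw_count(conn):
    """Count every row, duplicates included."""
    return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]


def filtered_count(conn):
    """Count one row per title/album/album_artist/fname group."""
    return conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT 1 FROM files GROUP BY title, album, album_artist, fname)"
    ).fetchone()[0]
```

If the duplicates are dropped in the result loop instead of in SQL, the count has to be derived from the filtered rows, because the query's own count would still be the raw one.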

whatdoineed2do commented 1 year ago

ffmpeg has a feature to generate an audio hash that I've used in a different project. This hash represents the audio data/frames only, so it is unaffected by metadata.

ffmpeg -i foo.mp3 -c:a copy -bsf:a null -f hash -

This audio hash should be a good way to solve identification: read the audio frames of the file as it is being scanned. It will read ALL the frames, though, so skip it for remote or pipe audio. Supporting this needs an extra hash column in the DB (the string or a hashed value of the ffmpeg audio hash), yacc/lex changes to expose the column in the search grammar, and an extension of library/filescanner_ffmpeg.c:scan_metadata_ffmpeg() to populate it.

For the UI, it could be as simple as having the track dialog indicate that there are N other instances of the same track; clicking on this would take the user to a listing of those tracks based on the audio hash value in the db. To suppress duplicates within the same album/playlist view, the UI could just apply a unique filter on the audio hash before display.
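As a sketch of how a scanner-side prototype might obtain that value, the command above can be run as a subprocess and its output parsed. The function names are invented for illustration, and the `ALGO=hexdigest` output shape is an assumption about the hash muxer's output, so check it against your ffmpeg build.

```python
import subprocess


def audio_hash_command(path):
    """Return the ffmpeg invocation from the comment above for one file."""
    return ["ffmpeg", "-i", path, "-c:a", "copy", "-bsf:a", "null", "-f", "hash", "-"]


def parse_hash_output(output):
    """Extract (algorithm, digest) from a line shaped like 'ALGO=hexdigest'."""
    for line in output.splitlines():
        algo, sep, digest = line.strip().partition("=")
        if sep and digest:
            return algo, digest
    raise ValueError("no hash line found in ffmpeg output")


def audio_hash(path):
    """Run ffmpeg on a local file and return its audio hash (needs ffmpeg on PATH)."""
    result = subprocess.run(
        audio_hash_command(path), capture_output=True, text=True, check=True
    )
    return parse_hash_output(result.stdout)
```

Only stdout is parsed, since ffmpeg writes its log messages to stderr.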
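Both UI ideas reduce to simple lookups once the hash is stored. A minimal sketch, assuming a hypothetical `audio_hash` column on the `files` table that is NULL for remote or pipe sources; the function names are illustrative:

```python
import sqlite3


def other_instances(conn, track_id):
    """Return paths of other tracks that share this track's audio hash."""
    row = conn.execute(
        "SELECT audio_hash FROM files WHERE id = ?", (track_id,)
    ).fetchone()
    if row is None or row[0] is None:
        return []  # unknown track, or no hash (e.g. remote or pipe audio)
    rows = conn.execute(
        "SELECT path FROM files WHERE audio_hash = ? AND id != ? ORDER BY path",
        (row[0], track_id),
    )
    return [r[0] for r in rows]


def unique_by_hash(tracks):
    """Keep the first track per audio hash; tracks without a hash are all kept."""
    seen, kept = set(), []
    for track in tracks:
        digest = track.get("audio_hash")
        if digest is not None and digest in seen:
            continue
        if digest is not None:
            seen.add(digest)
        kept.append(track)
    return kept
```

The length of the list returned by `other_instances` is the N for the track dialog, and the list itself is the content of the linked track listing.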