Open GoogleCodeExporter opened 9 years ago
Original comment by bjorn.ol...@gmail.com
on 20 Mar 2008 at 5:21
The safest way should be to checksum the audio part of the audio file and
compare that to the removed file's checksum. This way the tags do not matter.
However, checksumming the audio file requires the tagger to read the whole
file, and that may be very slow.
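
A minimal sketch of checksumming only the audio part, assuming MP3 files whose tags are an ID3v2 block at the start and/or an ID3v1 block at the end (other tag formats, such as APEv2, are not handled here). The names `audio_bounds` and `audio_crc32` are made up for illustration and are not part of the tagger.

```python
import io
import zlib

ID3V2_HEADER_LEN = 10   # "ID3" + 2 version bytes + 1 flags byte + 4 size bytes
ID3V1_LEN = 128         # fixed-size trailer that starts with "TAG"


def audio_bounds(f):
    """Return (start, end) offsets of the bytes between the ID3 tags.

    `f` is any seekable binary file object.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    start, end = 0, size

    f.seek(0)
    header = f.read(ID3V2_HEADER_LEN)
    if len(header) == ID3V2_HEADER_LEN and header[:3] == b"ID3":
        # The tag size is a "syncsafe" integer: 4 bytes, 7 usable bits each.
        tag_size = (header[6] << 21) | (header[7] << 14) | (header[8] << 7) | header[9]
        start = ID3V2_HEADER_LEN + tag_size
        if header[5] & 0x10:  # ID3v2.4 footer flag: 10 more bytes after the tag
            start += 10

    if size - start >= ID3V1_LEN:
        f.seek(size - ID3V1_LEN)
        if f.read(3) == b"TAG":
            end = size - ID3V1_LEN

    return min(start, end), end


def audio_crc32(f, chunk_size=64 * 1024):
    """CRC32 of the audio bytes only, read in chunks to bound memory use."""
    start, end = audio_bounds(f)
    f.seek(start)
    crc = 0
    remaining = end - start
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)
        remaining -= len(chunk)
    return crc
```

With this approach, editing or stripping the tags leaves the checksum unchanged, but every audio byte is still read, so the speed concern above still applies.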
Original comment by onne...@gmail.com
on 27 Mar 2008 at 12:21
Calculating a checksum is slow but would probably be the best way to find
duplicates/moved files. Perhaps the checksum should be stored as a tag in the
file as well as in the DB.
If no checksum is found, using MusicBrainz IDs or something similar could be
useful.
Original comment by bjorn.ol...@gmail.com
on 28 Mar 2008 at 6:59
I guess I could start by comparing the safe stuff:
* duration
* bitrate
* sampleRate
* channels
Second, the slightly less safe stuff:
* title
* artist
* album
The first part must match exactly, and the second part should have the title
and either the artist or the album right. If the above is true (and assuming
the matching file has been removed), the file should be considered moved.
If there are multiple matches, I need to start comparing more tags from a
prioritized list until there is only one left.
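
The rule above could be sketched roughly as follows. The `Track` record, its field names and units, and the default tie-breaker order are assumptions made for illustration; they are not taken from the actual indexer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    path: str
    duration: int       # seconds (unit is an assumption)
    bitrate: int
    sample_rate: int
    channels: int
    title: str = ""
    artist: str = ""
    album: str = ""


def is_probable_move(removed, found):
    """True if `found` (a new file) looks like `removed` (a vanished DB entry)."""
    # "Safe" properties must match exactly.
    same_audio = (
        (removed.duration, removed.bitrate, removed.sample_rate, removed.channels)
        == (found.duration, found.bitrate, found.sample_rate, found.channels)
    )
    if not same_audio:
        return False
    # "Less safe" tags: title plus either artist or album.
    if removed.title != found.title:
        return False
    return removed.artist == found.artist or removed.album == found.album


def find_original(found, removed_tracks, tie_breakers=("artist", "album")):
    """Return the single removed track that `found` matches, or None."""
    matches = [r for r in removed_tracks if is_probable_move(r, found)]
    # Narrow multiple matches by comparing more tags in priority order.
    for field in tie_breakers:
        if len(matches) <= 1:
            break
        narrowed = [r for r in matches if getattr(r, field) == getattr(found, field)]
        if narrowed:
            matches = narrowed
    return matches[0] if len(matches) == 1 else None
```

If more than one candidate survives every tie-breaker, this sketch returns `None` rather than guessing, so the file would be treated as new.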
Original comment by onne...@gmail.com
on 28 Mar 2008 at 11:26
Original comment by onne...@gmail.com
on 21 Apr 2008 at 6:51
I am no coder (as you know), but recognizing "old" files would be nice for not
losing your song statistics. Why not compute a checksum when adding a file to
the DB (or changing its tags) and store it there? If that file is moved, mC2
would add the newly found files, compute their checksums, compare them with
the old ones, and then delete or adjust the missing/moved files.
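
A rough sketch of that flow, using an in-memory-friendly SQLite table. The `tracks` schema, the `play_count` column standing in for "song statistics", and the function names are hypothetical; the real mC2 database layout is not shown in this thread.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracks ("
        " path TEXT PRIMARY KEY,"
        " checksum INTEGER NOT NULL,"
        " play_count INTEGER NOT NULL DEFAULT 0)"
    )


def reconcile(conn, scanned):
    """Sync the DB with `scanned`, a dict mapping on-disk path -> checksum."""
    known = dict(conn.execute("SELECT path, checksum FROM tracks"))

    # DB entries whose file is no longer on disk, grouped by checksum.
    gone = {}
    for path, checksum in known.items():
        if path not in scanned:
            gone.setdefault(checksum, []).append(path)

    for path, checksum in scanned.items():
        if path in known:
            continue
        old_paths = gone.get(checksum)
        if old_paths:
            # Same checksum as a vanished file: treat as moved, keep its stats.
            conn.execute(
                "UPDATE tracks SET path = ? WHERE path = ?", (path, old_paths.pop())
            )
        else:
            conn.execute(
                "INSERT INTO tracks (path, checksum) VALUES (?, ?)", (path, checksum)
            )

    # Whatever is still listed as gone was really deleted.
    for old_paths in gone.values():
        for old_path in old_paths:
            conn.execute("DELETE FROM tracks WHERE path = ?", (old_path,))
    conn.commit()
```

Because a moved file is handled with an `UPDATE` of its path, its row (and therefore its statistics) survives the move.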
Original comment by HomiSite
on 22 Apr 2008 at 4:14
Hey All,
DocTriv pointed me at this issue. I recently started a thread about this on
the mC forums requesting this feature.
I'm sure you guys realize this, but adding a checksum to the DB (and possibly
a tag) would be a much larger benefit than just allowing the indexer to
recognize moved files. It would also allow for the merging of databases, which
would allow all sorts of flexibility, namely collecting metadata from several
separate instances of mC into a single repository. There would be a slight
performance hit, but my very unscientific testing shows that with CRC32
checksums about 67 mp3s/sec can be processed on a relatively fast box.
Keep up the good work guys.
Thx,
-Mid
Original comment by midnigh...@gmail.com
on 22 May 2008 at 6:42
Checksumming 67 mp3s/sec sounds quick. My average mp3 file size is 5 MB, which
would mean an I/O speed of 335 MB/sec. This leads me to believe that this must
be a partial checksum of some sort.
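
The estimate is just files per second times average file size, assuming every byte of every file is read and that the size is in megabytes. The helper name below is made up for illustration.

```python
def required_throughput_mb_per_s(files_per_second, average_size_mb):
    """Read rate needed to fully checksum files at the given pace."""
    return files_per_second * average_size_mb


print(required_throughput_mb_per_s(67, 5))  # 335
```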
Using a partial checksum is not a bad idea though. Maybe do a checksum on the
first 1 KB of the audio part of the file.
I think many people put their files on a network share of some sort, so I/O is
a critical issue.
Original comment by onne...@gmail.com
on 22 May 2008 at 7:51
Good catch. I was using a duplicate file checking program (doublekiller) to
generate all the CRC32 checksums for comparison. I ran it again, watched I/O
performance, and paid closer attention to what it was doing. It must do exactly
as you mentioned and just checksum a very small portion of each song, as there
was almost no I/O for the first pass. It then appears to go back and do full
checksums on the collisions, which pegged disk I/O. The full checksums still
appeared to zip by quickly, but I have no way to judge performance with such a
crude test.
Sorry for misleading everyone. Doing partial checksums of a specific segment
or two of each song would increase performance dramatically. I'm not sure how
big the segment would need to be to reduce collisions to an acceptable rate
(hopefully near 0).
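
The two-pass behaviour described above could look something like this sketch. It is a guess at the general technique, not doublekiller's actual implementation; the function names and the 1 KB segment size are assumptions.

```python
import zlib
from collections import defaultdict


def partial_crc32(path, offset=0, length=1024):
    """CRC32 of a small segment: cheap, but different files may collide."""
    with open(path, "rb") as f:
        f.seek(offset)
        return zlib.crc32(f.read(length))


def full_crc32(path, chunk_size=1 << 20):
    """CRC32 of the whole file, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def group_duplicates(paths):
    """Return lists of paths whose full checksums are equal."""
    # Pass 1: bucket every file by its cheap partial checksum.
    by_partial = defaultdict(list)
    for path in paths:
        by_partial[partial_crc32(path)].append(path)

    # Pass 2: only files that collided in pass 1 get a full checksum.
    groups = []
    for candidates in by_partial.values():
        if len(candidates) < 2:
            continue
        by_full = defaultdict(list)
        for path in candidates:
            by_full[full_crc32(path)].append(path)
        groups.extend(group for group in by_full.values() if len(group) > 1)
    return groups
```

Files that are unique after the first pass are never read in full, which would explain the near-zero I/O observed in that pass.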
-Mid
Original comment by midnigh...@gmail.com
on 22 May 2008 at 8:26
I think the partial checksum is a good idea. Although, thinking about it, maybe
don't take the checksum at the very beginning. Many songs tend to be quiet at
the beginning, so the checksum could (though it is unlikely) be the same.
To summarize:
Checking whether a file has been moved (or anything else) should consider the
following:
* duration
* bitrate
* sampleRate
* channels
* partial checksum
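
One way to combine the properties in this list into a single comparable value is sketched below. The `Fingerprint` type, the choice of the midpoint of the audio data, and the 1 KB length are assumptions for illustration; `audio_start` and `audio_end` are expected to come from whatever locates the audio part of the file.

```python
import io
import zlib
from typing import NamedTuple


class Fingerprint(NamedTuple):
    duration: int
    bitrate: int
    sample_rate: int
    channels: int
    partial_checksum: int


def offset_checksum(f, audio_start, audio_end, fraction=0.5, length=1024):
    """CRC32 of `length` bytes taken `fraction` of the way into the audio.

    Sampling away from the start avoids the quiet intros many songs share.
    """
    span = max(audio_end - audio_start, 0)
    offset = audio_start + int(span * fraction)
    # Keep the sampled window inside the audio data for short files.
    offset = min(offset, max(audio_end - length, audio_start))
    f.seek(offset)
    return zlib.crc32(f.read(max(0, min(length, audio_end - offset))))


def make_fingerprint(f, audio_start, audio_end, duration, bitrate, sample_rate, channels):
    """Bundle the audio properties with the partial checksum."""
    checksum = offset_checksum(f, audio_start, audio_end)
    return Fingerprint(duration, bitrate, sample_rate, channels, checksum)
```

Two fingerprints can then be compared with `==`. Because the offset is measured relative to the audio data, a change in tag size does not shift the sampled window.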
Original comment by onne...@gmail.com
on 22 May 2008 at 8:41
This is probably obvious, but I just thought of it so I thought I'd put it out
there.
To cut down on collisions caused by using a small segment checksum (if there
are any), you could also use other static information about the mp3. I.e., if
there is a checksum collision but the song length is different, it must be a
different song.
It wouldn't be perfect, but it'd be a quick, painless way to double check.
-Mid
Original comment by midnigh...@gmail.com
on 22 May 2008 at 8:45
I think CRC32 does do a full checksum and the reason it is quick is because it
doesn't analyse the whole file. It picks out the blocks that it needs.
Checking a file against its CRC hash is the only reliable way to check whether
a file has moved. MP3 tags can be changed outside of MusikCube and this would
result in a different hash.
This would be good for merging databases :)
Original comment by gatekil...@gmail.com
on 23 May 2008 at 10:36
Original issue reported on code.google.com by
onne...@gmail.com
on 18 Mar 2008 at 8:51