Rubyripper should use absolute offsets for the end of a rip

GoogleCodeExporter commented 8 years ago

CDParanoia does a bad job identifying the length of final tracks on discs which 
have a data track at the very end of the disc (Enhanced CDs).  In particular, 
it will try to read well into the lead-out and lead-in of the next session, as 
it only relies on TOC data, and it does not attempt to calculate the lead out 
of individual sessions.  On certain devices/OSs, this may cause CDParanoia to 
throw errors while reading and thus cause rubyripper to crash.  It is also 
possible (but I am not sure) that too much audio data will be read at the end 
of the disc in such cases.

As far as I can tell, this issue exists in the latest version of CDParanoia 
(10.2), and may apply to earlier versions as well.

Fixing this will require two key changes:

1. cdparanoia will need to be given the END of a range to rip (particularly for 
final tracks and disc images)  Remember that cdparanoia durations are 
INCLUSIVE, so you must specify the lead-out time MINUS one frame to correctly 
rip.  I find that specifying the exact lead-out time for a final track on an 
Enhanced CD will give bit-wise identical results from cdparanoia and EAC.

2. The correct start of the lead-out must be calculated (in the same way it is 
calculated for MusicBrainz DiscIDs).  This can be done for almost all current 
discs using current scanner data, but will require reading full TOCs (not even 
available as Linux ioctls) to do correctly 100% of the time.  Multi-session 
TOCs, which are available as Linux ioctls can get us most of the way there, but 
may still fail in very unusual circumstances.  In the long run, external 
libraries (libdiscid or my in-development libcueify) may be able to provide 
this information without too much work on our part.

CAVEAT: Fixes to Issue 454 and other potential issues with reading into the 
leadout with cdparanoia (https://bugzilla.redhat.com/show_bug.cgi?id=635925) 
will impact how this particular bugfix works.  Thus, any fix which explicitly 
specifies the lead-out explicitly must be careful to avoid triggering 
cdparanoia bugs.

Original issue reported on code.google.com by comradec...@gmail.com on 1 Nov 2011 at 5:40

GoogleCodeExporter commented 8 years ago

Ah, after playing around with a track-wise rip (rather than a single-image rip) 
I see that ending offsets are already specified.  Please consider this a 
request for explicit ends for single-image rips as well as correcting the 
explicit end of the last track on Enhanced CDs, which are really two different 
issues.

Original comment by comradec...@gmail.com on 1 Nov 2011 at 6:17

GoogleCodeExporter commented 8 years ago

Okay, after tracing back, it appears that this originates from Issue 310, which 
I suspect was causing issues due to playing poorly with the problem raised in 
Issue 454 with negative offsets (Paranoia would subtract one from the TOC 
offset and then change the sample into a positive offset, which I suspect would 
cause reading from the lead-out and which might not be supported on the drive 
in Issue 310)

Thus, I believe that setting the correct offset on the last track would work 
correctly IF care was taken to ensure that the sample-offset would not cause 
cdparanoia to read into the lead-out on drives which do not support it (see 
Issue 491).  Negative offsets would be fixed by simply applying a fix like that 
in Issue 454, as cdparanoia would convert the sample offset into a positive 
offset to make up for the negative block offset.  (It might also explain why 
the wrong filesize was reported with a negative offset drive in that Issue)

Original comment by comradec...@gmail.com on 2 Nov 2011 at 4:07

GoogleCodeExporter commented 8 years ago

Here are my notes after playing around with offsets and absolute starts/lengths 
[It should be noted that these are based on testing in FreeBSD, and behavior 
may be different on Linux]:

1. As per Issue 454, all sample offsets are normalized to a value between 0 and 
+587.  That is to say, a sample offset of -587 will behave the same as a sample 
offset of +1 (with the same absolute start/end values), and so on.
2. cdparanoia cannot be forced to read into the (sector-based) lead-in of a 
drive ("[.-1]-[.1]" is an invalid timespan, according to cdparanoia), and will 
not read into what it BELIEVES to be the lead-out of a drive (cdparanoia keeps 
track of the MSF time of its rip, and with a negative offset, will believe that 
reading into the last (absolute) sector is the same as reading into the 
(MSF-based) lead-out, since it always rounds the TOC offset down.)
3. cdparanoia MAY read into the lead-out of the drive given a positive offset, 
but since my drive neither supports reading into the offset (according to EAC), 
nor do I have any discs which (ostensibly) have data out to the very last 
sample, I can't test this.  In any case, because of point 1, this could only 
read (at most) 587 samples into the lead-out (If it even does.  See 
https://bugzilla.redhat.com/show_bug.cgi?id=635925).
4. That said, in FreeBSD at least, specifying an offset of 588 samples or more 
WILL cause errors on my drive while reading the last sector of the disc (and, 
in fact, causes read errors when reading anywhere CLOSE to the last sector of 
the disc).  This might be due to how cdparanoia defeats buffers.  If this is 
indeed the reason why, I can't say how cdparanoia 9.8 behaves (as the algorithm 
for defeating buffers changed between 9.8 and 10.2).  This also may be a 
FreeBSD-specific issue, and may not arise in Linux anyway.

Thus, assuming that the above Bugzilla bug is correct, this means that 
cdparanoia can, at best, only access data within what the drive believes to be 
absolute sector 0 and the absolute sector of the lead-out (although the latter 
may be accessible with positive offsets on drives which support it.  This is 
unclear since I can't test it.)

As a result, I offer the following algorithms for reading tracks and images in 
rubyripper:

RIPPING IMAGES:

Unless we can conclusively prove that reading with a positive offset will 
actually read correctly into the lead-out, we may as well assume that we cannot 
read into the lead-out and rip the entire image as cdparanoia sees it 
("cdparanoia [.0]- output.wav"), and clip the appropriate number of samples at 
the beginning (if offset is positive) or at the end (if offset is negative) and 
(if specified via a preference), pad the opposite end with the same number of 
zeroed samples.

RIPPING TRACKS:

If it's the FIRST track, and we have a NEGATIVE offset, simply rip the 
appropriate frames assuming NO offset, and trim the appropriate number of 
samples from the END of that track (and if specified, pad the beginning with 
the same number of zeroed samples).

If it's the LAST track, and we have a POSITIVE offset, then, (again assuming 
that we actually can't read into the lead-out), simply rip the appropriate 
frames assuming NO offset, and trim the appropriate number of samples from the 
START of that track (and if specified, pad the ending with the same number of 
zeroed samples).

For any other track (including first tracks with positive offset and last 
tracks with negative offset), behave as we currently do, and offset the start 
by (offset / 588) (rounded down) frames without touching the length.

Original comment by comradec...@gmail.com on 7 Nov 2011 at 3:11

GoogleCodeExporter commented 8 years ago

I've attached three patches which tackle this issue:

0001-Add-WaveFile-class-for-Wave-file-manipulations.patch introduces a new 
class, WaveFile and an accompanying rspec test, which encapsulates wave files 
for access and modification, as well as reading according to a sample offset 
like a CD drive.  It will automatically pad the wave file internally to an 
integral number of sectors using zeroes if the number of samples is not evenly 
divisible by 588 (the number of samples in a CD-sector).  If given an offset, 
the save! method may be called to save the wave file according to the adjusted 
offset (which, if padMissingSamples is true, will also write out the padding of 
0 bytes to round to an even number of samples).  Thus, WaveFile can be used to 
correct the offset of a file with an otherwise incorrect offset simply by 
reading the file, setting the offset to the CD-drive's offset, and then calling 
save! (which will reset the offset to 0).  WaveFile also provides two 
additional methods, read(sector) and splice(sector, data) which may be used as 
part of solutions to Issue 240 and Issue 220 respectively.

0002-Add-improved-offset-handling-using-WaveFile.patch integrates the new 
WaveFile class into SecureRip according to the rough algorithm above.  For the 
first track (or a full-disc image) with a negative offset, it will read the 
first track as if it had no offset, and then use WaveFile to manually correct 
the offset without cdparanoia.  Similarly, it will use WaveFile to 
automatically correct the offset with the last track with a positive offset (or 
a full-disc image).  All other tracks behave as rubyripper currently does, 
adjusting the absolute addresses by (offset / 588) and then using the "-O" flag 
to set the offset.  It also adds a preference (defaulting to enabled) to enable 
or disable padding of the missing samples which could not be read due to the 
offset causing them to fall in the lead-in or lead-out.

0003-Prefer-cdcontrol-cdinfo-for-Enhanced-CD-track-length.patch adds two 
methods, getLengthSector(track) and getFileSize(track) to Disc (instead of 
proxying to ScanDiscCdparanoia).  They correct for the fact that cdparanoia 
performs badly with Enhanced CD tracks as discussed in the initial report.  
Currently, they use the same tocScannerForFreedbString method in preference to 
cdparanoia so as to determine the data tracks on the CD.

If the rip is an image rip and the track following the last audio track is a 
data track, then it will correct the length of the image by taking the offset 
of the start of the data track, and subtracting 11400 samples from it.  It then 
subtracts the start sector of the first track (in case it starts at a sample 
which isn't at address 0.  It should probably also do something similar for 
CD-XA discs with a first-track which is a data track, but currently does not)

If the rip is a track-rip, it only adjusts the length down by 11400 samples if 
the subsequent track is a data track.

Some testing of these fixes on my drive with a positive offset (+667 samples) 
allowed me to produce results identical to an EAC rip of a disc ripped on 
another drive (with a +102 sample offset), both on a track-wise and an 
image-wise basis.  This leads me to believe that these changes will allow 
rubyripper to produce identical results to EAC with good CDs (assuming that EAC 
has reading into the lead-in and lead-out disabled, and the new padding 
preference set identically to "Fill missing offset samples with silence" 
setting in EAC), and thus avoids any issues with missing samples like in Issue 
310.

NOTE: These patches also address Issue 491.

Original comment by comradec...@gmail.com on 8 Nov 2011 at 4:19

Attachments:

GoogleCodeExporter commented 8 years ago

I should add that, like in my patch for the MusicBrainz support, the new 
preference has not had its translatable strings collected for use in the .po 
files, and is missing GTK2 bindings, which will be needed to be added/collected 
separately.

Original comment by comradec...@gmail.com on 8 Nov 2011 at 4:24

GoogleCodeExporter commented 8 years ago

Ian, is it an idea to give you access rights to the central repository? It 
would allow faster integration with each others code I think.

Original comment by boukewou...@gmail.com on 9 Nov 2011 at 9:41

GoogleCodeExporter commented 8 years ago

For your info, I've added you as a committer to this project. So you should be 
able to push into the repository. You might have to configure it correctly 
first though. Let me know if you experience any trouble.

Original comment by boukewou...@gmail.com on 9 Nov 2011 at 9:54

GoogleCodeExporter commented 8 years ago

Will do.  I will give a shot at integrating these changes in the next few days.

Original comment by comradec...@gmail.com on 10 Nov 2011 at 6:21

GoogleCodeExporter commented 8 years ago

Okay, I landed it (with Cucumber tests), but it seems that the two git repos 
seem to disagree on the git commit IDs for practically every commit, which has 
ended up making a bit of a hash of the visual history 
(http://code.google.com/p/rubyripper/source/list), but the actual history seems 
to be fine (The fork is at rev c60563d7e4ec, which, due to being only three 
commits back for me (but about a dozen back for you) did some weird alignment 
things).

I'm going to go ahead and close this, as my tests still show this as working 
correctly.

Original comment by comradec...@gmail.com on 14 Nov 2011 at 10:52

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

I love it: zero failures on the cucumber + spec tests :D I think the graphs 
will be cleaner if I don't apply patches with git am anymore.

I noticed there is a possibility to attach comments to a commit. I am planning 
to review your commits soon.

Thanks a lot for your contributions so far :). Meanwhile I am working a bit on 
getting the gtk2 interface up and running.

Original comment by boukewou...@gmail.com on 14 Nov 2011 at 11:01

rlishtaba / rubyripper

Rubyripper should use absolute offsets for the end of a rip #490