Feature request - rip only mismatching chunks for retries

GoogleCodeExporter commented 9 years ago

Brought this up a long time ago, but as long as I'm around might as well
officially add it.

After the two initial rips and comparisons rr knows the location (both time
and sector?) of the mismatches.  Currently, rr rips the whole track again
and compares the whole thing.  This works fine but it could be sped up
greatly if rr tells cdparanoia to rip just the mismatching section, then
compares.

Nothing would change for the first and second rips.  However, if there are
mismatching chunks after the second rip and comparison, rr should md5sum the
mismatched chunk(s) or sector(s).  On the third (and later) rip attempts rr
takes the location of the mismatch and uses the appropriate cdparanoia
command to rip only that piece of the track.  From the cdparanoia man page:

extract from track 1, time 0:30.12 to 1:10.00:
       cdparanoia "1[:30.12]-1[1:10]"

The ripped portion would be md5summed and checked against the sums of the
mismatching chunk until the specified number of matches is reached.
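
A minimal Ruby sketch of the comparison step, not rr's actual code.  It
assumes both rips are already in memory as headerless PCM and that a "chunk"
is one 2352-byte CD sector; the method names are made up for illustration.
The offset format follows the man page example above.  Whether cdparanoia
treats the end offset as inclusive should be checked before relying on it.

```ruby
require 'digest/md5'

BYTES_PER_SECTOR   = 2352 # one CD audio sector of 16-bit stereo PCM
SECTORS_PER_SECOND = 75

# Indexes of the sectors whose MD5 sums differ between two rips.
# Both arguments are binary strings of headerless PCM data.
def mismatching_sectors(rip_a, rip_b)
  count = [rip_a.bytesize, rip_b.bytesize].max.fdiv(BYTES_PER_SECTOR).ceil
  (0...count).reject do |i|
    a = rip_a.byteslice(i * BYTES_PER_SECTOR, BYTES_PER_SECTOR).to_s
    b = rip_b.byteslice(i * BYTES_PER_SECTOR, BYTES_PER_SECTOR).to_s
    Digest::MD5.digest(a) == Digest::MD5.digest(b)
  end
end

# Format a sector offset within a track as mm:ss.ff (75 sectors per second).
def cdparanoia_offset(sector)
  seconds, frames = sector.divmod(SECTORS_PER_SECOND)
  minutes, seconds = seconds.divmod(60)
  format('%d:%02d.%02d', minutes, seconds, frames)
end

# Build a span argument such as "1[0:30.12]-1[1:10.00]".
def cdparanoia_span(track, first_sector, last_sector)
  "#{track}[#{cdparanoia_offset(first_sector)}]-" \
    "#{track}[#{cdparanoia_offset(last_sector)}]"
end
```

The resulting string would be handed to cdparanoia in place of the plain
track number, e.g. `system('cdparanoia', cdparanoia_span(1, 2262, 5250))`.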

Make sense?

Caveat #1 - would be wise to md5 and rip not only the mismatching portion
but the chunk/sector adjacent (on either side) just out of caution.  It
wouldn't slow things down much anyway.

Caveat #2 - on tracks that are really fubar this wouldn't help much and
might actually slow it down if the drive is seeking & spinning up and down
over the whole track.  Maybe if the errors exceed a certain threshold just
rip the whole track again.
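
Both caveats can be sketched in a few lines of Ruby.  This is only an
illustration: the method names, the one-sector padding and the 10% threshold
are assumptions, not values rr uses.

```ruby
# Turn a list of bad sector indexes into padded, merged sector ranges.
# Padding adds the neighbouring sector(s) on either side (caveat #1) and
# is clamped so a range never leaves the track.
def padded_ranges(bad_sectors, total_sectors, padding: 1)
  bad_sectors.sort.each_with_object([]) do |sector, ranges|
    first = [sector - padding, 0].max
    last  = [sector + padding, total_sectors - 1].min
    if ranges.any? && first <= ranges.last.end + 1
      # Overlaps or touches the previous range: extend it instead.
      ranges[-1] = (ranges.last.begin..[ranges.last.end, last].max)
    else
      ranges << (first..last)
    end
  end
end

# Decide between partial re-rips and a full re-rip (caveat #2).
# max_bad_fraction is a hypothetical threshold and would need tuning.
def rerip_plan(bad_sectors, total_sectors, max_bad_fraction: 0.1)
  return :whole_track if bad_sectors.size > total_sectors * max_bad_fraction

  padded_ranges(bad_sectors, total_sectors)
end
```

For example, bad sectors 5, 6 and 20 in a 100-sector track collapse into the
two ranges 4..7 and 19..21, so only two seeks are needed.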

Original issue reported on code.google.com by mordbr...@gmail.com on 5 Aug 2008 at 1:40

GoogleCodeExporter commented 9 years ago

Request accepted. This especially makes sense now that we allow image ripping.
But implementing this is quite complicated IMHO. Let me add three other things
that touch on this...

Caveat #3 - Rubyripper needs to add the ability to create and strip wav
containers. Each wav file that is output by cdparanoia has a little wav
container overhead. When comparing sectors this has to be stripped, of course.

Caveat #4 - The ripped chunks need to be large enough to circumvent the drive
cache problem. When rereading we do want the drive to read the audio disc, not
its cache. Does anyone know how big the cache of a cdrom drive can get?

Caveat #5 - Not only do we need to be able to rip a certain number of chunks
starting from the mismatch, but we also need to handle the case where the
mismatch is at the end of the audio disc. The same applies in the other
direction at the start of the disc, of course.
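
Caveats #4 and #5 interact: a span has to be widened past the cache size
without running off either end of the track.  A rough Ruby sketch follows.
The method names are hypothetical, the cache size is a parameter because it
differs per drive, and reading more than the cache holds is only a heuristic
that may not defeat caching on every drive.

```ruby
BYTES_PER_SECTOR = 2352 # one CD audio sector of 16-bit stereo PCM

# Smallest number of sectors whose data no longer fits in the drive cache.
def min_sectors_for_cache(cache_bytes)
  cache_bytes.fdiv(BYTES_PER_SECTOR).ceil
end

# Widen a sector range to at least min_sectors while staying inside the
# track. Near the start or end the window is shifted rather than cut off,
# so the original range is always still covered.
def widen(range, total_sectors, min_sectors)
  min_sectors = [min_sectors, total_sectors].min
  missing = min_sectors - range.size
  return range if missing <= 0

  first = range.begin - missing / 2
  first = [[first, 0].max, total_sectors - min_sectors].min
  first..(first + min_sectors - 1)
end
```

With an assumed 2 MB cache, `min_sectors_for_cache(2 * 1024 * 1024)` gives
892 sectors, or just under 12 seconds of audio per re-read.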

Original comment by rubyripp...@gmail.com on 5 Aug 2008 at 7:49

Changed state: Accepted
Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

#3.  Forgot to mention the intricacies with splicing wav files (and dealing with
headers).  One option is to have cdparanoia output in a different format:

  -p --output-raw                 : output raw 16-bit PCM in host byte order
  -r --output-raw-little-endian   : output raw 16-bit little-endian PCM
  -R --output-raw-big-endian      : output raw 16-bit big-endian PCM
  -w --output-wav                 : output as WAV file (default)
  -f --output-aiff                : output as AIFF file
  -a --output-aifc                : output as AIFF-C file

I'm not sure if any of the encoders can handle raw PCM, so it might be
pointless if they have to be turned into wav anyway.

#4.  Drive cache is a terrible thing for ripping!  Don't some of them have 2 MB
buffers?

#5.  cdparanoia, if not rr, will know where the tracks start and end.
Shouldn't be too difficult to code (I hope)... but caveat #4 makes this harder
than it needs to be.  What do other rippers do about drive cache?  Surely EAC
works with / around it, right?

Original comment by mordbr...@gmail.com on 5 Aug 2008 at 9:05

GoogleCodeExporter commented 9 years ago

cdparanoia can output to raw PCM:
  -p --output-raw                 : output raw 16-bit PCM in host byte order
  -r --output-raw-little-endian   : output raw 16-bit little-endian PCM
  -R --output-raw-big-endian      : output raw 16-bit big-endian PCM

FLAC can handle raw input with the --force-raw-format option.  

oggenc:
 -r, --raw            Raw mode. Input files are read directly as PCM data
 -B, --raw-bits=n     Set bits/sample for raw input. Default is 16
 -C, --raw-chan=n     Set number of channels for raw input. Default is 2
 -R, --raw-rate=n     Set samples/sec for raw input. Default is 44100

lame: 
 Input options for RAW PCM:
    -r              input is raw pcm
    -x              force byte-swapping of input
    -s sfreq        sampling frequency of input file (kHz) - default 44.1 kHz
    --bitwidth w    input bit width is w (default 16)
    --signed        input is signed (default)
    --unsigned      input is unsigned
    --little-endian input is little-endian (default)
    --big-endian    input is big-endian

So it seems there is no problem working with raw PCM.  I imagine that ripping
only errant chunks could be an option that, when checked, also makes rr work
with PCM instead of wav.
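
As a sketch of how the options above might be put together in Ruby for
16-bit little-endian stereo PCM at 44.1 kHz (what cdparanoia -r produces).
The method name is made up.  The extra flac flags (--endian, --sign,
--channels, --bps, --sample-rate) are not quoted above; as far as I know
flac needs them alongside --force-raw-format, so check `flac --help` first.

```ruby
# Build an encoder argument list for a headerless PCM input file.
# Assumes 16-bit signed little-endian stereo at 44100 Hz.
def raw_encode_command(encoder, input, output)
  case encoder
  when :flac
    ['flac', '--force-raw-format', '--endian=little', '--sign=signed',
     '--channels=2', '--bps=16', '--sample-rate=44100', '-o', output, input]
  when :oggenc
    ['oggenc', '-r', '-B', '16', '-C', '2', '-R', '44100', '-o', output, input]
  when :lame
    # lame takes the sampling frequency in kHz and the output file last.
    ['lame', '-r', '-s', '44.1', '--bitwidth', '16', '--little-endian',
     input, output]
  else
    raise ArgumentError, "unknown encoder: #{encoder}"
  end
end
```

Returning an array rather than a string lets the caller run it with
`system(*raw_encode_command(:flac, 'track01.raw', 'track01.flac'))` without
any shell quoting issues.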

Original comment by mordbr...@gmail.com on 5 Aug 2008 at 10:36

GoogleCodeExporter commented 9 years ago

The problem of the wav header might not be that big. We just have to skip
reading and writing the first 44 bytes of each wav file. The creation of wav
headers is only needed when we're not reading at least one track completely
first. But looking again at your request, this is not what you're asking.
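
Skipping the header can be sketched in Ruby as follows.  It assumes the
plain 44-byte header mentioned above with no extra chunks before the audio
data; wav files from other tools can have longer headers.  The method name
is hypothetical.

```ruby
WAV_HEADER_BYTES = 44 # size of a canonical PCM wav header

# Read only the audio data of a wav file, skipping the fixed-size header.
def wav_payload(path)
  File.open(path, 'rb') do |file|
    file.seek(WAV_HEADER_BYTES)
    file.read
  end
end
```

The returned string is raw PCM, so two rips can be compared sector by sector
without the headers getting in the way.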

Original comment by rubyripp...@gmail.com on 6 Aug 2008 at 9:45

GoogleCodeExporter commented 9 years ago

44 bytes isn't bad to strip.  Just pointing out that it is possible to work
entirely with PCM.  Since you're the one coding, the choice is entirely yours,
naturally.

Original comment by mordbr...@gmail.com on 7 Aug 2008 at 4:39

hollunder / rubyripper

Feature request - rip only mismatching chunks for retries #220