tidalcycles / Dirt-Samples

Set of samples used in Dirt

WAV-within-WAV duplicate header glitch #10

Closed: claudeha closed this issue 5 years ago

claudeha commented 5 years ago

These samples (I've only tried a couple of bd and sn so far, so it might not be all of them) seem to have a double WAV header. This leads to a noisy click of about 22 samples (for mono 16-bit WAV) at the start of the sound, because the duplicated header does not make sense as audio.

$ hd sn/ST0T0S0.wav | head -n 6
00000000  52 49 46 46 9e 3d 00 00  57 41 56 45 66 6d 74 20  |RIFF.=..WAVEfmt |
00000010  10 00 00 00 01 00 01 00  44 ac 00 00 88 58 01 00  |........D....X..|
00000020  02 00 10 00 64 61 74 61  7a 3d 00 00 52 49 46 46  |....dataz=..RIFF|
00000030  72 3d 00 00 57 41 56 45  66 6d 74 20 10 00 00 00  |r=..WAVEfmt ....|
00000040  01 00 01 00 44 ac 00 00  88 58 01 00 02 00 10 00  |....D....X......|
00000050  64 61 74 61 4e 3d 00 00  52 49 d4 46 49 40 bb 18  |dataN=..RI.FI@..|
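The dump shows the canonical 44-byte layout twice: the fmt fields at bytes 20-35, "data" at byte 36, and the second "RIFF" at byte 44 (0x2c). The ~22-sample figure follows from the block-align field at byte 32 (02 00, i.e. 2 bytes per mono 16-bit frame): 44 / 2 = 22. A minimal bash sketch that decodes those fields, assuming a canonical PCM header; read_le and wav_summary are hypothetical helper names, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helpers; offsets assume a canonical 44-byte PCM WAV header
# (fmt chunk at byte 12, data chunk header at byte 36).

# Print the unsigned little-endian integer of LEN bytes at byte OFFSET of FILE.
read_le() {
    local file=$1 off=$2 len=$3 val=0 bits=0 b
    for b in $(od -An -tu1 -j "$off" -N "$len" "$file"); do
        val=$(( val + (b << bits) ))
        bits=$(( bits + 8 ))
    done
    echo "$val"
}

# Summarise the header and how many sample frames a stray 44-byte header
# would occupy if it were played back as audio.
wav_summary() {
    local file=$1 ch rate align bits
    ch=$(read_le "$file" 22 2)
    rate=$(read_le "$file" 24 4)
    align=$(read_le "$file" 32 2)   # bytes per frame = channels * bits / 8
    bits=$(read_le "$file" 34 2)
    echo "channels=$ch rate=$rate bits=$bits click_frames=$(( 44 / align ))"
}
```

With the header bytes shown above, wav_summary would print channels=1 rate=44100 bits=16 click_frames=22; a stereo 16-bit file (block align 4) would give 11 frames instead.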

I hackily fixed my local copy by:

tail -c +45 sample.wav > tmp.wav && mv -f tmp.wav sample.wav

but this assumes that the outer WAV header is exactly 44 bytes and that everything after it is a complete WAV file with a valid header of its own.
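Those two assumptions can be checked before anything is cut. A minimal bash sketch, assuming a canonical header where the outer "data" tag sits at byte 36 and the embedded "RIFF" at byte 44; bytes_at and strip_outer_header are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical guarded version of the tail one-liner: strip the first 44
# bytes only if the outer header really is a canonical 44-byte header
# ("data" tag at byte 36) followed by an embedded "RIFF" tag at byte 44.

# Print COUNT bytes of FILE from byte OFFSET. NUL bytes are dropped so bash
# command substitution does not warn; the tags compared here contain none.
bytes_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000'
}

strip_outer_header() {
    local file=$1 tmp status
    if [ "$(bytes_at "$file" 0 4)" != RIFF ] ||
       [ "$(bytes_at "$file" 36 4)" != data ] ||
       [ "$(bytes_at "$file" 44 4)" != RIFF ]; then
        echo "skipping $file: not a 44-byte header followed by RIFF" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Copy back with cat so the original file keeps its permissions.
    tail -c +45 "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Files that do not match the expected layout are reported and left untouched, rather than silently losing their first 44 bytes.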

yaxu commented 5 years ago

Ah, well spotted! I ran a script:

for i in */*wav
do
    # count hexdump rows in the first 100 bytes that contain "RIFF"
    n=$(hd -n 100 "$i" | grep -c RIFF)
    if [[ $n -gt 1 ]]; then
        echo "$i : $n"
    fi
done
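The script above counts hexdump rows whose ASCII column contains "RIFF", so a tag split across two 16-byte rows would be missed, and two tags on one row would count once. A minimal bash sketch that counts actual occurrences instead, assuming GNU or BSD grep (-a treats binary input as text, -o prints each match on its own line); count_riff is a hypothetical name:

```bash
#!/usr/bin/env bash
# Count "RIFF" tags in the first 100 bytes of a file, independent of how
# hexdump would lay the bytes out in rows.
count_riff() {
    head -c 100 "$1" | grep -ao RIFF | wc -l | tr -d ' '
}

for i in */*wav
do
    [ -e "$i" ] || continue   # glob matched nothing
    n=$(count_riff "$i")
    if [ "$n" -gt 1 ]; then
        echo "$i : $n"
    fi
done
```

For these samples the inner tag sits at byte 44, entirely inside one row, so both approaches should agree; the byte-level count just removes the dependency on row alignment.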

which flags these files as having the problem:

bd/BT0A0A7.wav : 2
bd/BT0A0D0.wav : 2
bd/BT0A0D3.wav : 2
co/CLOP1.wav : 2
co/CLOP2.wav : 2
co/CLOP3.wav : 2
co/CLOP4.wav : 2
ht/HT0D0.wav : 2
ht/HT0D3.wav : 2
ht/HT0D7.wav : 2
ht/HT0DA.wav : 2
ht/HT3D0.wav : 2
ht/HT3D3.wav : 2
ht/HT3D7.wav : 2
ht/HT3DA.wav : 2
ht/HT7D0.wav : 2
ht/HT7D3.wav : 2
ht/HT7D7.wav : 2
ht/HT7DA.wav : 2
ht/HTAD0.wav : 2
ht/HTAD3.wav : 2
ht/HTAD7.wav : 2
ht/HTADA.wav : 2
lt/LT0D0.wav : 2
lt/LT0D3.wav : 2
lt/LT0D7.wav : 2
lt/LT0DA.wav : 2
lt/LT3D0.wav : 2
lt/LT3D3.wav : 2
lt/LT3D7.wav : 2
lt/LT3DA.wav : 2
lt/LT7D0.wav : 2
lt/LT7D3.wav : 2
lt/LT7D7.wav : 2
lt/LT7DA.wav : 2
lt/LTAD0.wav : 2
lt/LTAD3.wav : 2
lt/LTAD7.wav : 2
lt/LTADA.wav : 2
mt/MT0D0.wav : 2
mt/MT0D3.wav : 2
mt/MT0D7.wav : 2
mt/MT0DA.wav : 2
mt/MT3D0.wav : 2
mt/MT3D3.wav : 2
mt/MT3D7.wav : 2
mt/MT3DA.wav : 2
mt/MT7D0.wav : 2
mt/MT7D3.wav : 2
mt/MT7D7.wav : 2
mt/MT7DA.wav : 2
mt/MTAD0.wav : 2
mt/MTAD3.wav : 2
mt/MTAD7.wav : 2
mt/MTADA.wav : 2
oc/OPCL1.wav : 2
oc/OPCL2.wav : 2
oc/OPCL3.wav : 2
oc/OPCL4.wav : 2
rm/RIM0.wav : 2
rm/RIMA.wav : 2
sn/ST0T0S0.wav : 2
sn/ST0T0S3.wav : 2
sn/ST0T0S7.wav : 2
sn/ST0T0SA.wav : 2
sn/ST0T3S3.wav : 2
sn/ST0T3S7.wav : 2
sn/ST0T3SA.wav : 3
sn/ST0T7S3.wav : 2
sn/ST0T7S7.wav : 2
sn/ST0T7SA.wav : 2
sn/ST0TAS3.wav : 2
sn/ST0TAS7.wav : 2
sn/ST0TASA.wav : 2
sn/ST3T0S0.wav : 2
sn/ST3T0S3.wav : 2
sn/ST3T0S7.wav : 2
sn/ST3T0SA.wav : 2
sn/ST3T3S3.wav : 2
sn/ST3T3S7.wav : 2
sn/ST3T3SA.wav : 2
sn/ST3T7S3.wav : 2
sn/ST3T7S7.wav : 2
sn/ST3T7SA.wav : 2
sn/ST3TAS3.wav : 2
sn/ST3TAS7.wav : 2
sn/ST3TASA.wav : 2
sn/ST7T0S0.wav : 2
sn/ST7T0S3.wav : 2
sn/ST7T0S7.wav : 2
sn/ST7T0SA.wav : 2
sn/ST7T3S3.wav : 2
sn/ST7T3S7.wav : 2
sn/ST7T3SA.wav : 2
sn/ST7T7S3.wav : 2
sn/ST7T7S7.wav : 2
sn/ST7T7SA.wav : 2
sn/ST7TAS3.wav : 2
sn/ST7TAS7.wav : 2
sn/ST7TASA.wav : 2
sn/STAT0S0.wav : 2
sn/STAT0S3.wav : 2
sn/STAT0S7.wav : 2
sn/STAT0SA.wav : 2
sn/STAT3S3.wav : 2
sn/STAT3S7.wav : 2
sn/STAT3SA.wav : 2
sn/STAT7S3.wav : 2
sn/STAT7S7.wav : 2
sn/STAT7SA.wav : 2
sn/STATAS3.wav : 2
sn/STATAS7.wav : 2
sn/STATASA.wav : 2
world/bd.wav : 2
world/sn.wav : 2

yaxu commented 5 years ago

Adding egrep -abo "RIFF" $i | sed -n 2p to this shows that the inner headers all start at byte 44, so it should be safe to run your script on all of them. Will try that.
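With -o, grep's -b option prints the 0-based byte offset of each match as "OFFSET:RIFF", so the second line gives where the embedded header starts. tail -c +K counts from 1, which is why a 44-byte header is stripped with +45. A minimal bash sketch that ties the two together, assuming GNU grep; second_riff_offset and strip_before_second_riff are hypothetical names:

```bash
#!/usr/bin/env bash
# Print the 0-based byte offset of the second "RIFF" tag in FILE (empty if
# there is none).
second_riff_offset() {
    grep -abo RIFF "$1" | sed -n 2p | cut -d: -f1
}

# Strip everything before the embedded header, but only when it sits at
# byte 44 as observed for these samples. The 0-based offset 44 becomes the
# 1-based tail argument +45.
strip_before_second_riff() {
    local file=$1 off
    off=$(second_riff_offset "$file")
    [ "$off" = 44 ] || return 1
    tail -c +"$(( off + 1 ))" "$file" > "$file.tmp" && mv -f "$file.tmp" "$file"
}
```

Refusing any offset other than 44 keeps the sketch to the case the thread actually verified.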

yaxu commented 5 years ago

BTW the fork here is more up to date than this one (I'll do a PR to sync them): https://github.com/musikinformatik/Dirt-Samples

yaxu commented 5 years ago

OK, fixed here: https://github.com/tidalcycles/Dirt-Samples/commit/a336aac3d15b3e6e12a42cf2461313e9a866ac07

I also needed to fix the file length in the header; the final script is below. It didn't work on rm/RIM0.wav for some reason, so I've left that one as-is.
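In a canonical 44-byte PCM header, the two length fields are the RIFF chunk size at byte 4 (file size minus 8) and the data chunk size at byte 40 (file size minus 44). As an alternative to the sox step, a minimal bash sketch that patches them in place, assuming "data" is the last chunk and there are no extra chunks (LIST etc.); write_le32 and fix_wav_sizes are hypothetical names:

```bash
#!/usr/bin/env bash
# Write the 32-bit little-endian VALUE at byte OFFSET of FILE without
# truncating it. The inner printf builds "\0NNN" octal escapes, which the
# outer printf '%b' turns into raw bytes (NULs included).
write_le32() {
    local file=$1 off=$2 val=$3
    printf '%b' "$(printf '\\0%03o\\0%03o\\0%03o\\0%03o' \
        $(( val & 255 )) $(( (val >> 8) & 255 )) \
        $(( (val >> 16) & 255 )) $(( (val >> 24) & 255 )))" |
        dd of="$file" bs=1 seek="$off" conv=notrunc 2>/dev/null
}

# Set both size fields from the actual file size.
fix_wav_sizes() {
    local file=$1 size
    size=$(wc -c < "$file" | tr -d ' ')
    write_le32 "$file" 4 $(( size - 8 ))    # RIFF chunk size
    write_le32 "$file" 40 $(( size - 44 ))  # data chunk size
}
```

The outer header in the dump fits this rule: 0x3d9e (15774) and 0x3d7a (15738) differ by 36, as size - 8 and size - 44 must. Unlike sox, this touches only those eight bytes and does not re-encode the audio.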

It's going to be very strange having non-clicky bd samples.

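for i in */*wav
do
    n=$(hd -n 100 "$i" | grep -c RIFF)
    if [[ $n -gt 1 ]]; then
        echo "$i : $n"
        # show the byte offset of the second RIFF tag (expected: 44)
        grep -abo "RIFF" "$i" | sed -n 2p
        # drop the 44-byte outer header
        tail -c +45 "$i" > tmp.wav && mv -f tmp.wav "$i"
        # rewrite the header so its length fields match the audio data
        sox --ignore-length "$i" tmp.wav && mv -f tmp.wav "$i"
        # listen to the result
        play "$i"
    fi
done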
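The final script strips one header per file, but the check reported 3 tags for sn/ST0T3SA.wav, which may mean a header nested twice. A minimal bash sketch that keeps unwrapping while another "RIFF" sits at byte 44, assuming every nested header is a canonical 44-byte header (not verified in this thread); bytes_at and unwrap_wav are hypothetical names:

```bash
#!/usr/bin/env bash
# Print COUNT bytes of FILE from byte OFFSET. NUL bytes are dropped so bash
# command substitution does not warn; "RIFF" contains none.
bytes_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000'
}

# Strip 44-byte headers while the file starts with "RIFF" and another
# "RIFF" follows at byte 44; print how many headers were removed.
unwrap_wav() {
    local file=$1 stripped=0
    while [ "$(bytes_at "$file" 0 4)" = RIFF ] &&
          [ "$(bytes_at "$file" 44 4)" = RIFF ]; do
        tail -c +45 "$file" > "$file.tmp" && mv -f "$file.tmp" "$file" || return 1
        stripped=$(( stripped + 1 ))
    done
    echo "$stripped"
}
```

In the loop, unwrap_wav could stand in for the tail line, with the sox --ignore-length step run only when it prints a non-zero count.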