njh / twolame

MPEG Audio Layer 2 (MP2) encoder
http://www.twolame.org/
GNU Lesser General Public License v2.1

Silence encoding #58

Closed eblanca closed 7 years ago

eblanca commented 7 years ago

When encoding floating point input WAV files, twolame creates a bitstream that contains only silence. The current libsndfile setup does not enable the scaling that such files need. As a result, every sample read from such a file is imported as an integer value in [-1, +1] instead of being scaled to the full integer range. Lame has the scaling enabled, even though the current behavior for libsndfile is auto-scaling floating point input sources, by default. This means changing the input (which is not always welcome) but, on the other hand, it lets the encoding engine create useful files. I will submit a patch for this ASAP.
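To illustrate the effect described above, here is a minimal sketch in plain Python (not twolame or libsndfile code). It assumes normalized float samples in [-1.0, 1.0] and a 16-bit integer target; the helper names and the rounding behavior are hypothetical stand-ins for whatever the library does internally. For reference, libsndfile documents an `SFC_SET_SCALE_FLOAT_INT_READ` command for `sf_command` that enables scaling when float data is read with the integer read functions.

```python
# Illustration only: mimics reading normalized float samples as integers,
# with and without scaling. Helper names are hypothetical and are not part
# of libsndfile or twolame.

def read_unscaled(samples):
    """Convert floats to integers by rounding, with no scaling applied."""
    return [int(round(s)) for s in samples]

def read_scaled(samples, full_scale=32767):
    """Scale floats in [-1.0, 1.0] to the 16-bit range, then round and clamp."""
    out = []
    for s in samples:
        v = int(round(s * full_scale))
        out.append(max(-32768, min(32767, v)))
    return out

float_samples = [0.0, 0.25, -0.6, 0.9, -0.98]

unscaled = read_unscaled(float_samples)
scaled = read_scaled(float_samples)

print("unscaled:", unscaled)  # every value stays within [-1, +1]
print("scaled:  ", scaled)    # values span the 16-bit range
```

Without scaling, every sample ends up as -1, 0 or +1 on a 16-bit scale, which is effectively silence once encoded.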

eblanca commented 7 years ago

Created #59

njh commented 7 years ago

Can you suggest how to test this?

Is it possible to create a suitable test WAV file using sox?

eblanca commented 7 years ago

Of course you can with sox. Make sure you pass -e float -b 32 for the output file format and it should do the job. Afterwards, you can check the file format with soxi. Also, mplayer decodes many formats into floating point WAV by default (Vorbis and M4A among others, IIRC).

njh commented 7 years ago

Confirmed the bug through manual testing; the decoded output is almost silent:

~/Projects/twolame(master) $ cd tests
~/Projects/twolame/tests(master) $ sox testcase-22050.wav -e float -b 32 testcase-float32.wav
~/Projects/twolame/tests(master) $ ../frontend/twolame testcase-float32.wav testcase-float32.mp2
---------------------------------------------------------
Input Filename: testcase-float32.wav
Output Filename: testcase-float32.mp2
Input Format: WAV (Microsoft), 32 bit float
Input Duration: 0min 0.0sec
Input Library: libsndfile-1.0.28
---------------------------------------------------------
LibTwoLame 0.4.0 (http://www.twolame.org/)
Input : 22050 Hz, 2 channels
Output: 22050 Hz, Stereo
96 kbps CBR MPEG-2 LSF Layer II psycho model=3 
[De-emph:Off     Copyright:No     Original:Yes]
[Padding:Off     CRC:Off          Energy:Off  ]
---------------------------------------------------------
Encoding frame: 11/11 (100%)
Encoding Finished.
Total bytes written: 6.72 KB.
~/Projects/twolame/tests(master) $ sox testcase-float32.mp2 -n stat
Samples read:             23020
Length (seconds):      0.521995
Scaled by:         2147483647.0
Maximum amplitude:     0.000030
Minimum amplitude:    -0.000036
Midline amplitude:    -0.000003
Mean    norm:          0.000000
Mean    amplitude:    -0.000000
RMS     amplitude:     0.000001
Maximum delta:         0.000039
Minimum delta:         0.000000
Mean    delta:         0.000000
RMS     delta:         0.000001
Rough   frequency:         4700
Volume adjustment:    27622.500

njh commented 7 years ago

Confirmed fixed with manual testing:

~/Projects/twolame(master) $ cd tests
~/Projects/twolame/tests(master) $ sox testcase-22050.wav -e float -b 32 testcase-float32.wav
~/Projects/twolame/tests(master) $ ../frontend/twolame testcase-float32.wav testcase-float32.mp2
---------------------------------------------------------
Input Filename: testcase-float32.wav
Output Filename: testcase-float32.mp2
Input Format: WAV (Microsoft), 32 bit float
Input Duration: 0min 0.0sec
Input Library: libsndfile-1.0.28
---------------------------------------------------------
LibTwoLame 0.4.0 (http://www.twolame.org/)
Input : 22050 Hz, 2 channels
Output: 22050 Hz, Stereo
96 kbps CBR MPEG-2 LSF Layer II psycho model=3 
[De-emph:Off     Copyright:No     Original:Yes]
[Padding:Off     CRC:Off          Energy:Off  ]
---------------------------------------------------------
Encoding frame: 11/11 (100%)
Encoding Finished.
Total bytes written: 6.72 KB.
~/Projects/twolame/tests(master) $ sox testcase-float32.mp2 -n stat
Samples read:             23020
Length (seconds):      0.521995
Scaled by:         2147483647.0
Maximum amplitude:     0.916997
Minimum amplitude:    -0.978987
Midline amplitude:    -0.030995
Mean    norm:          0.142265
Mean    amplitude:    -0.005834
RMS     amplitude:     0.188628
Maximum delta:         0.905128
Minimum delta:         0.000000
Mean    delta:         0.134944
RMS     delta:         0.175963
Rough   frequency:         3273
Volume adjustment:        1.021

njh commented 7 years ago

Ugh, I think I broke the build by adding a test for this (477393c9aad60b2b0db282ce7a65d56061f2e655): https://travis-ci.org/njh/twolame/builds/274356625

I suspect this is due to differing floating point conversions or perhaps dithering.

It would be good to have a test for this, but it may have to use something other than an MD5 to check the result.
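One possible alternative to an MD5 check, sketched here under stated assumptions, is to compare a statistic of the decoded output against an expected value with a tolerance. The sketch parses the text printed by `sox ... -n stat` as shown earlier in this thread. The function names and the 5% tolerance are hypothetical choices for illustration, not part of the twolame test suite.

```python
import math

def parse_sox_stat(text):
    """Parse 'label: value' lines from `sox ... -n stat` output into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        if not sep:
            continue  # no colon on this line, so it is not a statistic
        try:
            # Collapse runs of spaces so "RMS     amplitude" becomes "RMS amplitude".
            stats[" ".join(label.split())] = float(value)
        except ValueError:
            continue  # value is not numeric, skip it
    return stats

def rms_within(text, expected_rms, rel_tol=0.05):
    """Return True if the reported RMS amplitude is within rel_tol of expected_rms."""
    stats = parse_sox_stat(text)
    return math.isclose(stats["RMS amplitude"], expected_rms, rel_tol=rel_tol)

# Excerpts of the two `sox ... -n stat` outputs quoted in this thread.
FIXED = """\
Maximum amplitude:     0.916997
RMS     amplitude:     0.188628
"""
BROKEN = """\
Maximum amplitude:     0.000030
RMS     amplitude:     0.000001
"""

print(rms_within(FIXED, 0.188628))
print(rms_within(BROKEN, 0.188628))
```

A check like this tolerates small differences in floating point conversion or dithering between platforms, while still failing clearly on the near-silent output.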