jniemann66 / ReSampler

High quality command-line audio sample rate converter
GNU Lesser General Public License v2.1
160 stars 25 forks source link
audio-applications audio-conversion bit-depth conversion fir-filter frequency-response lowpass-filter resample sample-rate sample-rate-conversion

Synopsis

ReSampler is a high-performance command-line audio sample rate conversion tool which can convert audio file formats with a variety of different bit-depths and audio channel configurations.

ReSampler compiles and runs on Windows, Linux and macOS

ReSampler is intended to produce outstanding quality sound files, keeping aliasing and other unwanted artifacts to a minimum, as the following actual measurement graphs show:

FIR Filter response - downsample 96k->44k
Typical frequency response when downsampling from 96kHz to 44.1kHz with standard (default) LPF

spectrogram: 0-48khz sweep
Spectrogram of 0-48kHz Sine Sweep @96kHz sample rate, after having been downsampled to 44kHz sample rate

Motivation

Design Philosophy

Usage

from the command line, the main options are as follows:

resampler.exe -i inputfile [-o outputfile] -r samplerate [-b bitformat] [-n [<normalization factor>]]

samplerate : target sample rate in Hz.

bitformat : bit representation (sub format) of the data in the output file. If this option is omitted, resampler will try to deduce the intended bit format automatically. Not all bit formats are valid for a given output file type. For more details, refer to the libsndfile documentation. Here is a list of all subformats:

8           8-bit (signed or unsigned - automatic, based on file type)
s8          Signed 8 bit data
16          Signed 16 bit data
24          Signed 24 bit data
32          Signed 32 bit data
u8          Unsigned 8 bit data
32f         32 bit float data
64f         64 bit float data
ulaw        mu-Law encoded
alaw        A-Law encoded
ima-adpcm   IMA ADPCM
ms-adpcm    Microsoft ADPCM
gsm610      GSM 6.10 encoding
vox-adpcm   OKI Dialogix ADPCM
g721-32     32kbs G721 ADPCM encoding
g723-24     24kbs G723 ADPCM encoding
g723-40     40kbs G723 ADPCM encoding
dwvw12      12 bit Delta Width Variable Word encoding
dwvw16      16 bit Delta Width Variable Word encoding
dwvw24      24 bit Delta Width Variable Word encoding
dwvwn       N bit Delta Width Variable Word encoding
dpcm8       8 bit differential PCM (XI only)
dpcm16      16 bit differential PCM (XI only)
vorbis      Xiph Vorbis encoding
alac16      Apple Lossless Audio Codec (16 bit)
alac20      Apple Lossless Audio Codec (20 bit)
alac24      Apple Lossless Audio Codec (24 bit)
alac32      Apple Lossless Audio Codec (32 bit)

Note: the --listsubformats option will cause the program to display the valid formats for a given file-type

Normalization factor : value between 0.0 and 1.0, with 1.0 (equivalent to 100 percent) producing the largest possible output level without clipping. Note: ReSampler will accept normalization values over 1.0, but this will certainly result in clipping, and is therefore only for experimental and testing purposes. Just using -n with no parameter is equivalent to -n 1.0

Additional options:

Note: as of version 2.0, command-line options are more "forgiving" - they are now case insensitive, and allow hyphens within the text of the option to be omitted. (However, the hyphens preceding the option are still required). This allows for variations such as the following to be possible: --steep-lpf --steeplpf --steep-LPF etc

--help : show usage and list additional commandline options

--version : display the version number of the program

--compiler : display compiler used to build the app

--sndfile-version : display the version of libsndfile library

--listsubformats <filetype> : list all valid subformats for a given filetype

--showDitherProfiles : show a list of all available dither profiles.

--gain <amount> : adjust the gain (amplification factor). 1.0 = unity gain (no amplification), -1.0 = invert signal, 0 = silence. Note: if clipping protection is enabled, gain will be automatically re-adjusted after the first pass if clipping occurs.

Note: Setting the gain differs from applying normalization in that normalization is a type of automatic gain control, which sets the gain to whatever it needs to be to achieve the requested output level.

--doubleprecision : force ReSampler to use double-precision (64-bit floating point) arithmetic for its internal calculations.

--dither [<amount>] : generate +/-amount bits of dither. Dithering deliberately adds a small amount of a particular type of noise (triangular pdf with noise-shaping) prior to quantization to the output file. The goal of dithering is to reduce distortion, and allow extremely quiet passages to be preserved when they would otherwise be below the threshold of the target bit depth. Usually, it only makes sense to add dither when you are converting to a lower bit depth, for example:

The effect of dithering is most noticeable during extremely quiet passages (typically, in fade-outs) of the audio. If you can hear modulation effects, or "tearing" in the quietest passages of your output file, then a greater amount of dither may need to be applied. (note: in many cases, these passages are so quiet, you will need to normalize them just to hear them).

The amount parameter represents the number of bits of dither noise to be generated (prior to the noise-shaping process). The actual level of dither noise is equal to +/- 2^(amount-1) steps. The default for amount is 1.0, and it doesn't need to be an integer. Values in the range 1-6 are sensible for most situations. The noise-shaping curve may further increase the amplitude of the dither, depending on the "intensity" (overall gain) of the chosen noise-shaping curve.

--autoblank : when specified in conjunction with --dither , causes dithering to switch-off after 30,000 consecutive input samples of silence (< -193dB is considered silence). Dithering is re-enabled immediately upon a non-zero input sample being detected.

--ns <n> : select dither profile from 0-13. Generally speaking, as the dither profile number increases, the noise-shaping curve gets progressively more "intense" (higher amplitude), with the exception of profle 13 (blue noise) which is actually the most gentle profile. Dither profile 0 is completely flat (no noise shaping), and is equivalent to --flat-tpdf. The default dither profile (if no profile is specified) is #6 (standard), which has a moderate noise-shaping curve.

--flat-tpdf : when specified in conjunction with --dither , causes the dithering to use flat tpdf noise with no noise-shaping.

--seed <n> : when specified in conjunction with --dither , causes the pseudo-random number generator used to generate dither noise to generate a specific (repeatable) sequence of noise associated with the number n. Using the same value of n on subsequent conversions should reproduce precisely the same result. n is a signed integer in the range -2,147,483,648 through 2,147,483,647.

--quantize-bits <number of bits> : when used in conjunction with --dither, quantize the output to a specified number of bits. (eg. quantize to 8 bits when output is really 16 bits)

--noDelayTrim : deactivate correction of delay due to the linear-phase FIR anti-aliasing filter. By default, delay correction is active to ensure that the timing of the converted output file matches the timing of the input file precisely.

--minphase : use a minimum-phase FIR filter, instead of Linear-Phase

--flacCompression <compressionlevel> : set the compression level for flac output files (between 0 and 8)

--vorbisQuality <quality> : set the quality level for ogg vorbis output files (between -1 and 10)

--noClippingProtection : disable clipping protection (clipping protection is normally active by default)

--relaxedLPF : cause the lowpass filter to use a "late" cutoff frequency with regular (hence "relaxed") steepness. (cutoff = 95.45% of Nyquist, transition width = 9.09% of Nyquist). This will (theoretically) allow a small amount of aliasing, but at the same time, keep ringing to a minimum and maintain a good frequency response.

--steepLPF : cause the lowpass filter to use a steeper cutoff (cutoff = 95.45% of Nyquist, transition width = 4.55% of Nyquist). It avoids aliasing, but may result in more ringing.

--lpf-cutoff <percentage> : set a custom cutoff frequency for the lowpass filter, expressed as a percentage of Nyquist Frequency. Allowable range: 1.0 - 99.9. Overrides --relaxedLPF and --steepLPF

--lpf-transition <percentage> : when used in conjunction with --lpf-cutoff, set the transition width of the lowpass filter, expressed as a percentage of Nyquist frequency.

--mt : Multi-Threading - process each channel in a separate thread. On a multi-core system, this makes better use of available CPU resources and results in a significant speed improvement.

--rf64 : force output .wav file to be in rf64 format. Has no effect if output file is not a .wav file.

Note: If your output file has an .rf64 extension, it will automatically be in rf64 format

--noPeakChunk : suppress output of PEAK chunk in floating-point formats

--noMetadata : prevent copying of metadata from input file to output file.

By default, ReSampler will attempt to copy native metadata from the input file to the output file, provided the input and output file types support metadata (ie: wav, aiff, caf, flac, oga, rf64)

--singleStage : use single-stage conversion engine (significantly less efficient and therefore slower, but "simpler" conversion)

--multiStage : use multi-stage conversion engine

--showStages : show details about the parameters used for each conversion stage.

--showTempFile : (Windows Only) show the path and filename of the temp file

--tempDir <path> : (Windows Only) specify temp directory for the temp file, instead of the default (%temp%). Directory must already exist.

--noTempFile : disable the creation of a temporary file during conversion. (By default, a temp file is created, containing intermediate conversion results in floating-point format. The temp file is used to facilitate fast gain adjustment when clipping is detected, and is deleted after the output file has been written. If the creation of a temp file is disabled, the entire conversion will need to be performed again if clipping is detected. Note: versions of Resampler prior to 2.0.3 did not use a temporary file)

--raw-input <samplerate> <bit-format> [number of channels] : read raw input data (ie with no header). Since there is no header, you must specify the sample rate, bit format, and number of channels of the input file using the syntax above. If the number of channels is omitted, single-channel (mono) input is assumed. Accepted bit formats for raw input are: 8, s8, u8, 16, 24, 32, 32f, 64f, alaw, ulaw, gsm610, dwvw12, dwvw16, dwvw24, vox-adpcm

--progress-updates <0..100> : number of progress update notifications to be sent by the converter throughout the conversion. (0 = no updates, 100 = every 1% etc). Default is 10

--demodulateIQ [<AM|LSB|USB|NFM|WFM>] : (since 2.1.0) demodulate an I/Q signal using the specified modulation scheme. If no modulation scheme is specified, default to NFM (Narrowband FM). The I (In-phase) and Q (Quadrature) components are expected to be in channels 0 and 1 respectively. Output file will contain a single channel, except in the case of WFM (Wideband FM), which will be in stereo.

--deemphasis <NONE|50|75> : (since 2.1.0) when used in conjunction with WFM demodulation, set the de-emphasis time constant to 50 microseconds (most of the World) or 75 microseconds (USA & South Korea), or "NONE" for no de-emphasis. If omitted, default is 50.

--stereo-width <width> : (since 2.1.0) set the width of the stereo image. (Alters the amount of "side" relative to the amount of "mid/mono"). 0: mono, less than 1.0: narrow stereo, 1.0: normal stereo, more than 1.0: wide stereo. Negative values swap left and right channels.

--fade-in <seconds> : (since 2.1.0) add a smooth fade-in at the start of the audio

--fade-out <seconds> : (since 2.1.0) add a smooth fade-out at the end of the audio

Example

To convert a 96kHz 24-bit .wav input file to 44.1kHz 16-bit .flac output file, with steep lowpass filter and dithering:

ReSampler.exe -i c:\pathto\somefile.wav -o c:\pathto\convertedfile.flac -r 44100 -b 16 --steepLPF --dither

note: command-line options can be supplied in any order. The -i inputfile and -r samplerate options are mandatory in order to perform a conversion.

Supported Formats

Resampler can handle any of the file formats libsndfile can handle, plus a few extras (notably, the 1-bit DSD formats, dff and dsf). Thus, the following file extensions are supported:
Supported Formats

For more information, please refer to the libsndfile documentation

(since v2.0.7) ReSampler can also export audio data to a csv file. The data for each channel is placed in a separate column, and each row represents a sample. To do this, set the file extension of the output file to .csv and specify an output format with the -b option followed by a bit format specification of the form [u|s]<num of bits>[f|i|o|x] where u = unsigned, s = signed, f = float, i = integer, o = octal, x = hexadecimal. (if the -b option is omitted, then the default will be 16 bit signed integer)

Additional Information

Clipping Protection

Resampler employs a multiple-pass approach with regards to clipping detection. If clipping is detected (ie normalized signal level exceeded +/- 1.0) during processing, it will re-do the conversion with the overall gain adjusted appropriately to avoid clipping. (This can be disabled with the --noClippingProtection option)

Double Precision vs Single Precision

When Double Precision is engaged (using the --doubleprecision option), all calculations inside the conversion will be done using double-precision (64-bit) floating-point instead of single-precision (32-bit). Typically, the most noticeable effect of this is that the noise floor is significantly reduced. This can be observed in spectrograms of frequency sweeps, which were converted from 96khz to 44.1khz:

Single Precision: Double Precision:
96khz sweep to 44.1khz - single precision 96khz sweep to 44.1khz - double precision

In the above comparison, the "blue haze" seen in the single-precision conversion is the result of quantization noise from rounding errors in single-precision arithmetic. The double-precision conversion is noticeably cleaner. Please note, however, that even with single precision, the noise is sitting about 150dB down, and will not be reproduced by any real-world DAC, and therefore will not be audible.

There is a penalty in processing speed for converting using double precision, and it is to be expected that double-precision conversions take longer to process.

Conversion to same sampling rate

When the target sampling rate is the same as the input file (ie 1:1 ratio), sample-rate conversion is not actually performed. However, bit-depth / file format conversion and other features such as dithering and normalization are performed when requested.

Automatic promotion of wav files to rf64 format

For wav file output, if the data contained in the output file will exceed 4 Gigabytes after conversion, ReSampler will automatically "promote" the file format to rf64

Traditional wav files only have 32-bits to quantify the size of the data they contain, which means that the data size cannot exceed 4 Gigabytes (2^32 bytes). The rf64 specification was designed to extend the wav format to remove this limitation.

Alternatively, other output formats suitable for large output files (exceeding 4GB) are: w64, caf, au or compressed formats such as flac or oga

Note: the --rf64 option will force output .wav files to be in rf64 format.

Description of code

Sample Rate Conversion is accomplished using the usual interpolation/decimation techniques. When using complex conversion ratios (such as 44.1k <--> 48k), a larger FIR lowpass filter is used to ensure a clean conversion.

Resampler uses the C++ wrapper of the libsndfile library for sound file I/O operations.

The FIR filter class written for this project uses SSE SIMD instructions when using single-precision floating-point to perform 4 multiply/accumulate operations simultaneously. This has been found to yield a speed improvement of approximately 3x. Some experimentation was also done with 2x double-precision SSE2 SIMD, but was found to be no faster than the basic scalar implementation, and has thus been commented-out (but not deleted) from the source.

(Additionally, some progress has been made recently with a build of the project using AVX instructions, on supported CPUs / OSes, to perform 8 single-precision or 4 double-precision multiply/accumulate operations at a time.)

Resampler was originally developed on Visual C++ 2015, but also compiles just as well on gcc and clang.

explanation of source code files:


resampler.cpp : core of program

resampler.h : important data structures and function declarations

FIRFilter.h : FIR Filter DSP code

FIRFilterAVX.h : AVX-specific DSP code (conditional #include in AVX build)

fraction.h : defines Fraction type, and functions for obtaining gcd, simplified fractions, and prime factors of integers

srconvert.h : the heart of the sample rate conversion process

biquad.h : IIR Filter (used in dithering)

ditherer.h : defines ditherer class, for adding dither

noiseshape.h : contains definitions of noise-shaping curves

dff.h : module for reading dff files

dsf.h : module for reading dsf files

alignedmalloc.h : simple function for dynamically allocating aligned memory (AVX requires 32-byte alignment)

osspecific.h : contains macro definitions for specific target operating systems

raiitimer.h : simple timer which displays elapsed time upon going out of scope

(the class implementations are header-only)


compiling ReSampler

Windows

Linux

Mac

Raspbian


Description of Binaries included in distribution

ReSampler/Release/ReSampler.exe : 32-bit Windows with SSE2 instruction set

ReSampler/x64/Release/ReSampler.exe : 64-bit Windows (uses SSE2, but all 64-bit CPUs should have this)

ReSampler/x64/AVX_Release/ReSampler.exe : 64-bit Windows with AVX instruction set (Requires Intel Sandy Bridge CPU or higher, AMD Bulldozer or higher. Requires Windows 7 SP1 or higher, Windows Server 2008 R2 SP1 or higher OS)

Resampler/x64/mingGW-W64 : 64-bit windows version, compiled in mingw-w64 gcc compiler (runs about 20% faster !)

Resampler/x64/mingGW-W64-AVX : 64-bit AVX windows version, compiled in mingw-w64 gcc compiler (runs about 20% faster !)


Acknowledgements

The following libraries are used in ReSampler:

Name Description Reason For Inclusion Link License
libsndfile C library for reading and writing files containing sampled sound Core read/write capabilities for common file formats libsndfile LGPL 2.1
FFTW C subroutine library for computing the discrete Fourier transform (DFT) used for conversion of linear-phase impulses to minimum-phase fftw GPL (2?)
CTPL Modern and efficient C++ Thread Pool Library Used for multi-threaded sample-rate conversion CTPL Apache 2.0

Thanks to the following github users for their contributions:

@mgood7123

@godsic