Open shdevin opened 9 years ago
Hi,
Thanks for using LibXtract and reporting this issue.
LibXtract already includes a check to ensure that log(0)
isn't computed in xtract_spectrum()
, i.e.
if ((temp = XTRACT_SQ(real) + XTRACT_SQ(imag)) >
XTRACT_LOG_LIMIT)
temp = log(temp / NxN);
else
temp = XTRACT_LOG_LIMIT_DB;
Where XTRACT_LOG_LIMIT
is defined as 2e-42
in xtract_macros_private.h
.
Actually, I think 2e-42 was set when LibXtract was still single-precision, and is certainly a lot higher than the smallest normal double.
So... before we start adding epsilon values, we need to look at why you're getting inf
values in your code. A good starting point would be stepping through with the debugger.
Can you post an example of your source code (and input), which reproduces the issue?
Thanks,
Jamie
I see. I automatically assumed that the log function was the source of the problem. I've been trying to find the error, but didn't reach any conclusion. I checked XTRACT_LOG_LIMIT and my compiler also recognizes 2e-42 as a non-zero double. I also looked at my own data array, which also seems to be okay. Whatever this is, it sometimes happens and sometimes doesn't. Replacing the last entry of the array (i.e. spectrum[BLOCKSIZE/2] = 0.0) seemed to fix it for me, but that's obviously isn't a permanent solution.
Thanks, Navid
Would you mind sharing a minimum working code fragment that reproduces the problem? Then I can have a go at debugging it.
Thanks,
Jamie
This is the feature extraction code. As you can see, it's pretty much the same as simpletest.cpp, except that I read files from a list of files in the following format: wav-id /path-to-wav/wavname.wav The thing is that the problem only happens when I try to extract features for chunks of files (not one file at a time). That's what makes the bug difficult to locate.
#include <iostream>
#include "xtract/libxtract.h"
#include "xtract/xtract_stateful.h"
#include "xtract/xtract_scalar.h"
#include "xtract/xtract_helper.h"
#include <math.h>
#include <fstream>
#include "WaveFile.h"
#define BLOCKSIZE 512
#define MAVG_COUNT 10
#define HALF_BLOCKSIZE 256
#define SAMPLERATE 16000
#define PERIOD 102
#define MFCC_FREQ_BANDS 14
#define MFCC_FREQ_MIN 20
#define MFCC_FREQ_MAX 8000
int main(){
// Initialize mel filters and Windows:
double argd[4] = {0}; // argument array (needed for feature/spectrum estimation)
double spectrum[BLOCKSIZE] = {0};
double mfccs[MFCC_FREQ_BANDS] = {0};
//Allocate Mel filters
xtract_mel_filter mel_filters;
mel_filters.n_filters = MFCC_FREQ_BANDS;
mel_filters.filters = (double **)malloc(MFCC_FREQ_BANDS * sizeof(double *));
for(uint8_t k = 0; k < MFCC_FREQ_BANDS; ++k)
{
mel_filters.filters[k] = (double *)malloc(BLOCKSIZE * sizeof(double));
}
// Initialize mfcc features.
xtract_init_mfcc(BLOCKSIZE >> 1, SAMPLERATE >> 1, XTRACT_EQUAL_GAIN, MFCC_FREQ_MIN, MFCC_FREQ_MAX, mel_filters.n_filters, mel_filters.filters);
// create the window functions
double *window = NULL;
double windowed[BLOCKSIZE] = {0};
window = xtract_init_window(BLOCKSIZE, XTRACT_HANN);
// End of initialization
// Read list of files:
std::ifstream wavList;
wavList.open("path/to/file/wav_test.txt");
std::string line_vals;
std::string delimiter = " ";
int nFiles = 0;
while (std::getline(wavList,line_vals)) {
nFiles++;
// RegEx to separate utt-ID from utt-path.
std::size_t spaceIdx = line_vals.find(delimiter);
std::string wavId = line_vals.substr(0,spaceIdx);
std::string wavPath = line_vals.substr(spaceIdx+1);
// Read audio data from file
WaveFile wavFile(wavPath);
int16_t *wavData = (int16_t *)wavFile.GetData(); // assume 16bit signed integer
std::size_t wavBytes = wavFile.GetDataSize();
uint64_t wavSamples = wavBytes / sizeof(int16_t);
double data[wavSamples];
std::cout << wavSamples << std::endl;
// std::cout << nFiles << std::endl;
for (int n = 0; n < wavSamples; ++n){
data[n] = (double)wavData[n]/(65535);// This value fixes it. Found it off the internet.
}
for (uint64_t n = 0; (n + BLOCKSIZE) < wavSamples; n += HALF_BLOCKSIZE){ // frame and overlap
xtract_windowed(&data[n], BLOCKSIZE, window, windowed);
/* get the spectrum */
argd[0] = SAMPLERATE / (double)BLOCKSIZE;
argd[1] = XTRACT_MAGNITUDE_SPECTRUM;
argd[2] = 0.f; /* DC component - we expect this to zero for square wave */
argd[3] = 0.f; /* No Normalisation */
xtract_init_fft(BLOCKSIZE, XTRACT_SPECTRUM);
xtract[XTRACT_SPECTRUM](windowed, BLOCKSIZE, &argd[0], spectrum);
xtract_free_fft();
/* THIS IS WHERE I MAKE THE CORRECTION TO THE SPECTRUM */
spectrum[255] = 0.0;
/* compute the MFCCs */
xtract_mfcc(spectrum, BLOCKSIZE >> 1, &mel_filters, mfccs);
// check for nans
for (int d = 0; d<MFCC_FREQ_BANDS; d++){
if (isnan(mfccs[d])){
// Add breakpoint here for debugging purposes
xtract_mfcc(spectrum, BLOCKSIZE >> 1, &mel_filters, mfccs);
d = MFCC_FREQ_BANDS;
}
}
}
}
}
Thanks, Navid
Hi Navid, sorry for the slow reply. Would you mind uploading somewhere an example of one of the audio files you're using with this? One of the files that is causing the problem you are experiencing. Then I can debug this and make a fix. Jamie
Hello, In addition to thanking you for providing this tool, I've been using your source-code for a while now, I'd like to suggest making a minor adjustment to the source-code with respect to calculating the logarithm of variables. When calculating the logarithm of a double value, if the value is too small, the compiler might round that value to zero. This boosts the output of, say for instance, a log function to inf. This problem was particularly hard for me to track down, because it only comes up when the signal noise level was very low and the compiler some times rounded off the value of a double data point to zero. This could be easily resolved by simply adding an epsilon value to all variables before computing the logarithm. The instance I found was in vector.c: log(temp) in calculating the signal spectrum.
I'm sure there are other instances as well.
Many thanks, Dev