jamiebullock / LibXtract

LibXtract is a simple, portable, lightweight library of audio feature extraction functions.
MIT License
227 stars 46 forks source link

round-off errors in generating spectra and other parameters #77

Open shdevin opened 9 years ago

shdevin commented 9 years ago

Hello, In addition to thanking you for providing this tool, I've been using your source-code for a while now, I'd like to suggest making a minor adjustment to the source-code with respect to calculating the logarithm of variables. When calculating the logarithm of a double value, if the value is too small, the compiler might round that value to zero. This boosts the output of, say for instance, a log function to inf. This problem was particularly hard for me to track down, because it only comes up when the signal noise level was very low and the compiler some times rounded off the value of a double data point to zero. This could be easily resolved by simply adding an epsilon value to all variables before computing the logarithm. The instance I found was in vector.c: log(temp) in calculating the signal spectrum.

I'm sure there are other instances as well.

Many thanks, Dev

jamiebullock commented 9 years ago

Hi,

Thanks for using LibXtract and reporting this issue.

LibXtract already includes a check to ensure that log(0) isn't computed in xtract_spectrum(), i.e.

       if ((temp = XTRACT_SQ(real) + XTRACT_SQ(imag)) >
                    XTRACT_LOG_LIMIT)
                temp = log(temp / NxN);
            else
                temp = XTRACT_LOG_LIMIT_DB;

Where XTRACT_LOG_LIMIT is defined as 2e-42 in xtract_macros_private.h.

Actually, I think 2e-42 was set when LibXtract was still single-precision, and is certainly a lot higher than the smallest normal double.

So... before we start adding epsilon values, we need to look at why you're getting inf values in your code. A good starting point would be stepping through with the debugger.

Can you post an example of your source code (and input), which reproduces the issue?

Thanks,

Jamie

shdevin commented 9 years ago

I see. I automatically assumed that the log function was the source of the problem. I've been trying to find the error, but didn't reach any conclusion. I checked XTRACT_LOG_LIMIT and my compiler also recognizes 2e-42 as a non-zero double. I also looked at my own data array, which also seems to be okay. Whatever this is, it sometimes happens and sometimes doesn't. Replacing the last entry of the array (i.e. spectrum[BLOCKSIZE/2] = 0.0) seemed to fix it for me, but that's obviously isn't a permanent solution.

Thanks, Navid

jamiebullock commented 9 years ago

Would you mind sharing a minimum working code fragment that reproduces the problem? Then I can have a go at debugging it.

Thanks,

Jamie

shdevin commented 9 years ago

This is the feature extraction code. As you can see, it's pretty much the same as simpletest.cpp, except that I read files from a list of files in the following format: wav-id /path-to-wav/wavname.wav The thing is that the problem only happens when I try to extract features for chunks of files (not one file at a time). That's what makes the bug difficult to locate.

#include <iostream>
#include "xtract/libxtract.h"
#include "xtract/xtract_stateful.h"
#include "xtract/xtract_scalar.h"
#include "xtract/xtract_helper.h"
#include <math.h>
#include <fstream>
#include "WaveFile.h"

#define BLOCKSIZE 512
#define MAVG_COUNT 10
#define HALF_BLOCKSIZE 256
#define SAMPLERATE 16000
#define PERIOD 102
#define MFCC_FREQ_BANDS 14
#define MFCC_FREQ_MIN 20
#define MFCC_FREQ_MAX 8000

int main(){
  // Initialize mel filters and Windows:
  double argd[4] = {0}; // argument array (needed for feature/spectrum estimation)
  double spectrum[BLOCKSIZE] = {0};
  double mfccs[MFCC_FREQ_BANDS] = {0};

  //Allocate Mel filters
  xtract_mel_filter mel_filters;
  mel_filters.n_filters = MFCC_FREQ_BANDS;
  mel_filters.filters   = (double **)malloc(MFCC_FREQ_BANDS * sizeof(double *));
  for(uint8_t k = 0; k < MFCC_FREQ_BANDS; ++k)
  {
    mel_filters.filters[k] = (double *)malloc(BLOCKSIZE * sizeof(double));
  }

  // Initialize mfcc features.
  xtract_init_mfcc(BLOCKSIZE >> 1, SAMPLERATE >> 1, XTRACT_EQUAL_GAIN, MFCC_FREQ_MIN, MFCC_FREQ_MAX, mel_filters.n_filters, mel_filters.filters);

  // create the window functions
  double *window = NULL;
  double windowed[BLOCKSIZE] = {0};
  window = xtract_init_window(BLOCKSIZE, XTRACT_HANN);
  // End of initialization

  // Read list of files:
  std::ifstream wavList;
  wavList.open("path/to/file/wav_test.txt");
  std::string line_vals;
  std::string delimiter = " ";
  int nFiles = 0;
  while (std::getline(wavList,line_vals)) {
    nFiles++;
    // RegEx to separate utt-ID from utt-path.
    std::size_t spaceIdx = line_vals.find(delimiter);
    std::string wavId = line_vals.substr(0,spaceIdx);
    std::string wavPath = line_vals.substr(spaceIdx+1);

    // Read audio data from file
    WaveFile wavFile(wavPath);

    int16_t *wavData = (int16_t *)wavFile.GetData(); // assume 16bit signed integer
    std::size_t wavBytes = wavFile.GetDataSize();
    uint64_t wavSamples = wavBytes / sizeof(int16_t);
    double data[wavSamples];
    std::cout << wavSamples << std::endl;
    //    std::cout << nFiles << std::endl;
    for (int n = 0; n < wavSamples; ++n){
      data[n] = (double)wavData[n]/(65535);// This value fixes it. Found it off the internet.
    }

    for (uint64_t n = 0; (n + BLOCKSIZE) < wavSamples; n += HALF_BLOCKSIZE){ // frame and overlap

      xtract_windowed(&data[n], BLOCKSIZE, window, windowed);

      /* get the spectrum */ 
      argd[0] = SAMPLERATE / (double)BLOCKSIZE;
      argd[1] = XTRACT_MAGNITUDE_SPECTRUM;
      argd[2] = 0.f; /* DC component - we expect this to zero for square wave */
      argd[3] = 0.f; /* No Normalisation */

      xtract_init_fft(BLOCKSIZE, XTRACT_SPECTRUM);
      xtract[XTRACT_SPECTRUM](windowed, BLOCKSIZE, &argd[0], spectrum);
      xtract_free_fft();

      /* THIS IS WHERE I MAKE THE CORRECTION TO THE SPECTRUM */
      spectrum[255] = 0.0;

      /* compute the MFCCs */
      xtract_mfcc(spectrum, BLOCKSIZE >> 1, &mel_filters, mfccs);

      // check for nans
      for (int d = 0; d<MFCC_FREQ_BANDS; d++){
        if (isnan(mfccs[d])){
          // Add breakpoint here for debugging purposes
          xtract_mfcc(spectrum, BLOCKSIZE >> 1, &mel_filters, mfccs);
          d = MFCC_FREQ_BANDS;
        }
      }
    }
  }
}

Thanks, Navid

jamiebullock commented 9 years ago

Hi Navid, sorry for the slow reply. Would you mind uploading somewhere an example of one of the audio files you're using with this? One of the files that is causing the problem you are experiencing. Then I can debug this and make a fix. Jamie