jamiebullock / LibXtract

LibXtract is a simple, portable, lightweight library of audio feature extraction functions.
MIT License

normalised values and function descriptor #25

Open q-depot opened 11 years ago

q-depot commented 11 years ago

Hi,

I'm trying to find a consistent way to normalise the scalar values. I thought of using the descriptor to get the min and max values, but I noticed that not all the functions have a descriptor. Is there another way to get the min/max values for each function?

jamiebullock commented 11 years ago

Yeah, the descriptors should serve that purpose. The trouble with min / max is that result bounds are in many cases a function of argv settings, which in turn can come from the results of other feature extraction functions. The theoretical min / max are therefore often -inf ... inf, which isn't that useful.

What I probably need to do is document the most typical cases.

q-depot commented 11 years ago

I might have missed a bit in your source code, but I wanted to implement dependencies for each feature, which might serve the purpose: whenever I want to switch on a feature, I need to ensure its dependencies are on. This led me to implement a new structure on top of yours, and it also started making me think that a JSON file might suit better; I'm thinking of having a fairly big JSON file describing each feature and its dependencies.

The C++ code below shows how I'm binding the feature update functions and the dependencies. mCallbacks is a std::vector in which the feature update functions get called in the right order (if the feature is enabled):

mCallbacks.push_back(   { XTRACT_F0, "XTRACT_F0", std::bind( &ciLibXtract::updateF0, this ), false, SCALAR_FEATURE,
                        { XTRACT_SPECTRUM }, 0.0f, 1.0f } );

mCallbacks.push_back(   { XTRACT_BARK_COEFFICIENTS, "XTRACT_BARK_COEFFICIENTS", std::bind( &ciLibXtract::updateBarkCoefficients, this ), false, VECTOR_FEATURE,
                        { XTRACT_SPECTRUM }, 0.0f, 1.0f } );

mCallbacks.push_back(   { XTRACT_HARMONIC_SPECTRUM, "XTRACT_HARMONIC_SPECTRUM", std::bind( &ciLibXtract::updateHarmonicSpectrum, this ), false, VECTOR_FEATURE,
                        { XTRACT_PEAK_SPECTRUM, XTRACT_F0 }, 0.0f, 1.0f } );

void ciLibXtract::updateCallbacks()
{
    vector<FeatureCallback>::iterator it;
    for( it = mCallbacks.begin(); it != mCallbacks.end(); ++it )
        if ( it->enable )
            it->cb();
}

void ciLibXtract::enableFeature( xtract_features_ feature )
{
    FeatureCallback *f = findFeatureCbRef( feature );
    if ( !f )
        return;

    f->enable = true;

    // recursively enable everything this feature depends on
    vector<xtract_features_> dependencies = f->dependencies;
    for( size_t k = 0; k < dependencies.size(); k++ )
        enableFeature( dependencies[k] );
}

void ciLibXtract::disableFeature( xtract_features_ feature )
{
    FeatureCallback *f = findFeatureCbRef( feature );
    if ( !f )
        return;

    f->enable = false;

    // disable all features that depend on this one
    std::vector<FeatureCallback>::iterator it;
    for( it = mCallbacks.begin(); it != mCallbacks.end(); ++it )
        if ( featureDependsOn( it->feature, feature ) )
            disableFeature( it->feature );
}
jamiebullock commented 11 years ago

Hi @q-depot

I have thought about replacing the feature descriptors with a set of JSON (or maybe RDF) files giving meta-data about the features. This is a lot of work though, to do it properly.

I can't help thinking you're making things more complicated than they need to be. Take a look at Chris Cannam's vamp-libxtract-plugin code. This is essentially a wrapper for LibXtract that allows it to run as a VAMP plugin inside Sonic Visualiser.

Take a look at XTractPlugin::process. This is the meat of the plugin and shows how you can programmatically build the feature graph for any given feature.

The whole thing is about 1000 lines long and I'm pretty sure it could be adapted to write wrappers in other environments like Cinder.

I managed to write a much leaner wrapper for Pure Data, which is included in the examples/ folder of LibXtract.

Jamie

q-depot commented 11 years ago

I just had a look at the VAMP implementation and I don't quite like it. One of the main goals of my implementation was to get rid of all the (messy) if statements: I prefer to have one place to declare each feature along with any other properties or dependencies. Then, if each feature can specify its dependencies, I only need to ensure the functions get called in the right order.

q-depot commented 11 years ago

Going back to my initial question: is there a way to normalise the results?

jamiebullock commented 11 years ago

Ah, sorry for the digression! For scalar features, the caller is expected to provide normalisation. The feature descriptors attempt to provide sensible values for "min" and "max" for each scalar feature.

These can be accessed via the array of structs xtract_function_descriptor_t * returned by xtract_make_descriptors(). e.g. descriptors[XTRACT_VARIANCE].result.scalar.min. These are only intended to be a guide, because as I said in most cases it is mathematically impossible to define a range for the results because of the flexibility in allowed inputs. This is by design, LibXtract is optimised for flexibility and efficiency at the expense of having a tightly defined output space. I'm not sure if I mentioned this, but I plan to write a high-level API at some point, which will be more constrained and simpler to use.
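As a concrete sketch of caller-side normalisation (this helper is not part of LibXtract): given the min/max hints read from a descriptor, e.g. descriptors[XTRACT_VARIANCE].result.scalar.min and .max, a clamped linear rescale into [0, 1] is usually all that's needed.

```cpp
#include <algorithm>
#include <cassert>

// Map a raw scalar feature value into [0, 1] using the descriptor's
// min/max hints. Values outside the hinted range are clamped rather
// than extrapolated, since the hints are only a guide.
double normaliseScalar( double value, double min, double max )
{
    if ( max <= min )   // degenerate or missing bounds: nothing sensible to do
        return 0.0;
    double t = ( value - min ) / ( max - min );
    return std::min( 1.0, std::max( 0.0, t ) );
}
```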

q-depot commented 11 years ago

> For scalar features, the caller is expected to provide normalisation.

what do you mean exactly?

I tried the function descriptor, but it seemed to return the default value in most cases; also, for some functions, the lower and upper bounds have the same value. These were my first impressions; I didn't spend too much time working with the descriptors, so I might have made a mistake somewhere.

q-depot commented 11 years ago

Besides that, I've implemented a sort of simple auto-calibration: I sample one second to find the maximum and minimum values, and then clamp the results within that range. The good thing is that I can always visualise the signal, but I'm still not sure about this approach, mainly because it's still difficult to compare results. Also, sometimes I pick up and amplify results that are supposed to be tiny, but on my screen they are, say, 100 times bigger, and therefore they don't quite give an actual snapshot of the audio signal.
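The approach described above could be sketched roughly like this (a hypothetical class for illustration, not the actual Cinder or LibXtract code): during the calibration window every observed value widens a running min/max, and afterwards results are clamped and rescaled into that range.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical auto-calibrator: observe() is called during the one-second
// calibration window; normalise() then clamps + rescales into [0, 1].
class AutoCalibrator {
public:
    void observe( double v )
    {
        mMin = std::min( mMin, v );
        mMax = std::max( mMax, v );
    }

    double normalise( double v ) const
    {
        if ( mMax <= mMin )     // nothing (or a constant) observed yet
            return 0.0;
        double t = ( v - mMin ) / ( mMax - mMin );
        return std::min( 1.0, std::max( 0.0, t ) );
    }

private:
    double mMin = std::numeric_limits<double>::max();
    double mMax = std::numeric_limits<double>::lowest();
};
```

One weakness, as noted above, is that a feature whose observed range happens to be tiny gets stretched to fill [0, 1], which visually exaggerates it.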

jamiebullock commented 11 years ago

I mean that LibXtract doesn't provide any normalisation internally, so if the calling code wants a feature value to be normalised, it needs to be handled there.

The descriptors attempt to provide a "hint" as to what the minimum and maximum value should be, or what a "likely" minimum and maximum are given the most likely input range.

In many cases the actual possible minimum and maximum values for a feature are -infinity and +infinity since there's nothing stopping the caller passing a pointer to an array of inf to any of the functions.

If you want to properly normalise the features you need to make some assumptions about what the input data is going to be and normalise the whole feature graph.

However: if you found some feature descriptors where the minimum and maximum are the same, then unless that value is -1, this is a bug, so please report it separately.

jamiebullock commented 11 years ago

P.S. In an ideal situation, the min/max in the descriptor would be set to a mathematical expression (e.g. using reverse Polish notation) which gives the min/max as a function of the input min/max.
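To illustrate the idea (nothing like this exists in LibXtract today, and the expression syntax here is invented): a descriptor could carry a small RPN expression that is evaluated against the input bounds, so that e.g. "max min -" would give an output bound as a function of the input range.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <string>

// Tiny illustrative RPN evaluator: "min"/"max" push the input bounds,
// numbers push themselves, and + - * / pop two operands and push the result.
double evalRpn( const std::string &expr, double inMin, double inMax )
{
    std::stack<double> s;
    std::istringstream tokens( expr );
    std::string t;
    while ( tokens >> t )
    {
        if ( t == "min" )       s.push( inMin );
        else if ( t == "max" )  s.push( inMax );
        else if ( t == "+" || t == "-" || t == "*" || t == "/" )
        {
            double b = s.top(); s.pop();
            double a = s.top(); s.pop();
            if ( t == "+" )      s.push( a + b );
            else if ( t == "-" ) s.push( a - b );
            else if ( t == "*" ) s.push( a * b );
            else                 s.push( a / b );
        }
        else                    s.push( std::stod( t ) );
    }
    return s.top();
}
```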

q-depot commented 11 years ago

What are the conditions to normalise the input? All the functions are based on either the PCM data, the spectrum, or some derivative of those two.

The -1 is exactly the case I mentioned above, so it has to be treated as a generic boundary. I find that a bit odd, because some functions can return negative values; shouldn't it be something outside the function's range? NULL could work.

jamiebullock commented 11 years ago

Although LibXtract is nominally an "audio" feature extraction library, it's really a library for extracting features from arbitrary arrays of floating-point data. It is designed in such a way that it can be used for non-audio data, and some users are using it in this way. Besides, it might be convenient for someone using an integer audio format to simply cast their data to floats, rather than converting to -1/1 bounded floats (and back).

I agree that using -1 for a missing value where -1 is a valid value is terrible design. However, a straight NULL can't be used because min and max are doubles; and even if they were pointers to doubles, NULL is defined as (void *)0, so that doesn't work either.

One option would be to have an xtract_descriptor_value type with a type field that can be (for example) XTRACT_NIL or XTRACT_DOUBLE.
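A minimal sketch of that suggestion (the names are hypothetical, matching the ones floated above): a tagged value makes "no bound available" distinguishable from every valid double, including -1.

```cpp
#include <cassert>

// Hypothetical tagged value for descriptor min/max fields: the type tag
// says whether the double payload is meaningful.
typedef enum { XTRACT_NIL, XTRACT_DOUBLE } xtract_value_type;

typedef struct {
    xtract_value_type type;
    double value;       // only meaningful when type == XTRACT_DOUBLE
} xtract_descriptor_value;

inline xtract_descriptor_value xtract_nil()
{
    return { XTRACT_NIL, 0.0 };
}

inline xtract_descriptor_value xtract_double( double v )
{
    return { XTRACT_DOUBLE, v };
}
```

With this, -1 round-trips as a perfectly ordinary bound, and callers test the tag instead of comparing against a sentinel.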

q-depot commented 11 years ago

I understand that, but if you want to use it as an audio library, I've got the feeling you need some consistency in the results. Whether that's done in LibXtract or in some other code on top of it doesn't really matter to me, but I'm keen on finding a way to nail down the boundaries, so please let me know if you can think of anything better than my solution above.

jamiebullock commented 11 years ago

You mean better than the auto-calibration solution?

q-depot commented 11 years ago

My "auto-calibration" only samples the results for one second and clamps them between the max and min values, so you can always display something, but I have a few concerns about what you actually see.

jamiebullock commented 11 years ago

Why not pre-calibrate the auto-scaling function by providing it with a block with a square wave of amplitude 1 for the maximum and a block of all zeroes for the minimum? For your system, this should give you the expected range for all of the features.

q-depot commented 11 years ago

I'm not sure I'm following you, can you please clarify?

jamiebullock commented 11 years ago

I mean it sounds like you've set up autoscaling which, for a given audio feature, takes as its "max" the highest value ever received for that feature, and as its "min" the lowest value received for that feature.

Instead of trying to infer the min/max from the live input, why not pre-calibrate your max and min by feeding your autoscaler a number of data blocks that reflect the likely extrema of your expected input, and storing those values? I initially suggested zeroes/ones, but you might also want to include things like a block of white noise, which will give you a maximum value for things like irregularity.

I guess my overall point is: LibXtract doesn't know what its input data is going to be, so it can't give min/max for a lot of things. But if your Cinder library is only going to accept double-precision samples in the range -1...1, then you can determine what the minima/maxima are for the extrema of your input data.

q-depot commented 11 years ago

So what you're suggesting is to use my system with some white noise and evaluate the max and min values, is that correct? I don't quite understand the difference between inferring the min/max and pre-calibrating the min/max.

PS: sorry for keeping this thread open, but this is a major issue for me and I'm very keen to find a solution that makes sense for both of us.

jamiebullock commented 11 years ago

What I'm suggesting is that you:

  1. Identify some input data that represents the extrema of what you're expecting to be passed to your methods (e.g. all zeroes, all ones, white noise, a sine wave, a sawtooth wave).
  2. Present one block of each of these to your system, allowing your auto-calibration function to find minimum/maximum values for each feature.
  3. Keep a record of the min/max found by your auto-calibration function, and replace the auto-calibration step with these stored minima and maxima.
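The steps above can be sketched as follows, using RMS as a stand-in for any LibXtract scalar feature (the helper names are made up for illustration): each calibration block is pushed through the feature, and the observed extrema become the fixed min/max used from then on.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Stand-in scalar feature; in practice this would be a LibXtract call.
double rms( const std::vector<double> &block )
{
    double sum = 0.0;
    for ( double x : block )
        sum += x * x;
    return std::sqrt( sum / block.size() );
}

struct Range { double min, max; };

// Run every calibration block through the feature and record the extrema.
Range calibrate( const std::vector< std::vector<double> > &blocks )
{
    Range r = { 1e300, -1e300 };
    for ( const std::vector<double> &b : blocks )
    {
        double v = rms( b );
        r.min = std::min( r.min, v );
        r.max = std::max( r.max, v );
    }
    return r;
}
```

A block of all zeroes and a block of all ones give RMS extrema of 0 and 1; for other features you would add blocks like white noise that exercise their worst cases.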