Open q-depot opened 11 years ago

Hi,

I'm trying to find a consistent way to normalise the scalar values. I thought I'd use the descriptors to get min and max values, but I noticed that not all the functions have a descriptor. Is there another way to get min/max values for each function?
Yeah, the descriptors should serve that purpose. The trouble with min / max is that result bounds are in many cases a function of argv settings, which in turn can come from the results of other feature extraction functions. The theoretical min / max are therefore often -inf ... inf, which isn't that useful.
What I probably need to do is document the most typical cases.
I might have missed a bit in your source code, but I wanted to implement dependencies for each feature, which might serve the purpose: basically, whenever I want to switch on a feature I need to ensure its dependencies are on. This led me to implement a new structure on top of yours, and it also started to make me think that perhaps a JSON file would suit better. I'm thinking of having one fairly big JSON file that describes each feature and its dependencies.
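Something like this per feature (just a sketch of the shape I have in mind; the field names mirror my C++ struct below):

    {
        "feature":      "XTRACT_HARMONIC_SPECTRUM",
        "type":         "VECTOR_FEATURE",
        "dependencies": [ "XTRACT_PEAK_SPECTRUM", "XTRACT_F0" ],
        "min":          0.0,
        "max":          1.0
    }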
The C++ code below shows how I'm binding the feature update functions and the dependencies. mCallbacks is a std::vector whose feature update functions get called in the right order (if the feature is enabled).
    mCallbacks.push_back( { XTRACT_F0, "XTRACT_F0", std::bind( &ciLibXtract::updateF0, this ), false, SCALAR_FEATURE,
                            { XTRACT_SPECTRUM }, 0.0f, 1.0f } );

    mCallbacks.push_back( { XTRACT_BARK_COEFFICIENTS, "XTRACT_BARK_COEFFICIENTS", std::bind( &ciLibXtract::updateBarkCoefficients, this ), false, VECTOR_FEATURE,
                            { XTRACT_SPECTRUM }, 0.0f, 1.0f } );

    mCallbacks.push_back( { XTRACT_HARMONIC_SPECTRUM, "XTRACT_HARMONIC_SPECTRUM", std::bind( &ciLibXtract::updateHarmonicSpectrum, this ), false, VECTOR_FEATURE,
                            { XTRACT_PEAK_SPECTRUM, XTRACT_F0 }, 0.0f, 1.0f } );
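For context, the FeatureCallback struct behind those initialiser lists looks roughly like this (a sketch; only the field order is implied by the code above, and the enum name FeatureType is a guess):

    // requires <functional>, <string>, <vector>
    struct FeatureCallback {
        xtract_features_                feature;        // LibXtract feature id
        std::string                     name;           // human-readable name
        std::function<void()>           cb;             // update function
        bool                            enable;         // currently enabled?
        FeatureType                     type;           // SCALAR_FEATURE or VECTOR_FEATURE
        std::vector<xtract_features_>   dependencies;   // features that must be enabled first
        float                           min, max;       // expected result range
    };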
    void ciLibXtract::updateCallbacks()
    {
        vector<FeatureCallback>::iterator it;
        for( it = mCallbacks.begin(); it != mCallbacks.end(); ++it )
            if ( it->enable )
                it->cb();
    }
    void ciLibXtract::enableFeature( xtract_features_ feature )
    {
        FeatureCallback *f = findFeatureCbRef( feature );
        if ( !f )
            return;

        f->enable = true;

        // recursively enable everything this feature depends on
        for( auto dep : f->dependencies )
            enableFeature( dep );
    }
    void ciLibXtract::disableFeature( xtract_features_ feature )
    {
        FeatureCallback *f = findFeatureCbRef( feature );
        if ( !f )
            return;

        f->enable = false;

        // recursively disable all features that depend on this one
        std::vector<FeatureCallback>::iterator it;
        for( it = mCallbacks.begin(); it != mCallbacks.end(); ++it )
            if ( featureDependsOn( it->feature, feature ) )
                disableFeature( it->feature );
    }
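For example, with the declarations above this gives transitive switching (a hypothetical usage):

    ciLibXtract xtract;

    // enables XTRACT_HARMONIC_SPECTRUM plus its dependencies
    // XTRACT_PEAK_SPECTRUM and XTRACT_F0 (and, through XTRACT_F0,
    // XTRACT_SPECTRUM as well)
    xtract.enableFeature( XTRACT_HARMONIC_SPECTRUM );

    // disabling XTRACT_F0 also disables XTRACT_HARMONIC_SPECTRUM,
    // since the latter depends on it
    xtract.disableFeature( XTRACT_F0 );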
Hi @q-depot
I have thought about replacing the feature descriptors with a set of JSON (or maybe RDF) files giving meta-data about the features. This is a lot of work though, to do it properly.
I can't help thinking you're making things more complicated than they need to be. Take a look at Chris Cannam's vamp-libxtract-plugin code. This is essentially a wrapper for LibXtract that allows it to run as a VAMP plugin inside Sonic Visualiser.
Take a look at XTractPlugin::process. This is the meat of the plugin and shows how you can programmatically build the feature graph for any given feature.
The whole thing is about 1000 lines long and I'm pretty sure it could be adapted to write wrappers in other environments like Cinder.
I managed to write a much leaner wrapper for Pure Data, which is included in the examples/ folder of LibXtract.
Jamie
I just had a look at the VAMP implementation and I don't quite like it. One of the main goals of my implementation was to get rid of all the (messy) if statements: I prefer to have one place to declare each feature and any other properties or dependencies. Then, if each feature can specify its dependencies, I only need to ensure the functions get called in the right order.
Going back to my initial question, is there a way to normalise the results?
Ah, sorry for the digression! For scalar features, the caller is expected to provide normalisation. The feature descriptors attempt to provide sensible values for "min" and "max" for each scalar feature.
These can be accessed via the array of structs xtract_function_descriptor_t * returned by xtract_make_descriptors(), e.g. descriptors[XTRACT_VARIANCE].result.scalar.min. These are only intended to be a guide because, as I said, in most cases it is mathematically impossible to define a range for the results, given the flexibility in allowed inputs. This is by design: LibXtract is optimised for flexibility and efficiency at the expense of having a tightly defined output space. I'm not sure if I mentioned this, but I plan to write a high-level API at some point, which will be more constrained and simpler to use.
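For example, a minimal sketch of reading those hints and using them to normalise a scalar result (the descriptor field layout is as quoted above; the helper and its clamping policy are just an illustration):

    #include "xtract/libxtract.h"

    // map a scalar feature result into 0..1 using the descriptor hints
    double normalise( double value, int feature,
                      const xtract_function_descriptor_t *d )
    {
        double lo = d[feature].result.scalar.min;
        double hi = d[feature].result.scalar.max;
        if ( hi <= lo )
            return value;                       // degenerate hint, leave untouched
        double n = ( value - lo ) / ( hi - lo );
        return n < 0.0 ? 0.0 : ( n > 1.0 ? 1.0 : n );
    }

    // usage:
    //   xtract_function_descriptor_t *d = xtract_make_descriptors();
    //   double v = normalise( variance, XTRACT_VARIANCE, d );
    //   xtract_free_descriptors( d );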
For scalar features, the caller is expected to provide normalisation.
What do you mean exactly?
I tried the function descriptors, but they seemed to return the default values in most cases; also, for some functions the lower and upper bounds have the same value. These were my first impressions. I didn't spend too much time working with the descriptors, so I might have made a mistake somewhere.
Besides that, I've implemented a sort of simple auto-calibration: I sample 1 second to find the maximum and minimum values, and then clamp the results within that range. The good thing is that I can always visualise the signal, but I'm still not sure about this approach, mainly because it's still difficult to compare results. Also, I sometimes pick up and amplify results that are supposed to be tiny, but on my screen they are, say, 100 times bigger, and therefore they don't quite give you an actual snapshot of the audio signal.
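Roughly like this (a sketch of the idea; the names are illustrative):

    #include <algorithm>
    #include <limits>

    // collect results for ~1 second, remember the extremes, then clamp
    struct AutoCalibration {
        double mMin =  std::numeric_limits<double>::max();
        double mMax = -std::numeric_limits<double>::max();
        bool   mCalibrating = true;   // cleared elsewhere after the first second

        double process( double value )
        {
            if ( mCalibrating ) {                   // learning phase: track the range
                mMin = std::min( mMin, value );
                mMax = std::max( mMax, value );
            }
            if ( mMax <= mMin )
                return 0.0;                         // no usable range yet
            double n = ( value - mMin ) / ( mMax - mMin );
            return std::min( 1.0, std::max( 0.0, n ) );   // clamp to 0..1
        }
    };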
I mean that LibXtract doesn't provide any normalisation internally, so if the calling code wants a feature value to be normalised, it needs to be handled there.
The descriptors attempt to provide a "hint" as to what the minimum and maximum value should be, or what a "likely" minimum and maximum are given the most likely input range.
In many cases the actual possible minimum and maximum values for a feature are -infinity and +infinity, since there's nothing stopping the caller passing a pointer to an array of inf to any of the functions.
If you want to properly normalise the features you need to make some assumptions about what the input data is going to be and normalise the whole feature graph.
However: if you found some feature descriptors where the minimum and maximum are the same, and that shared value is not -1, this is a bug, so please report it separately.
P.S. In an ideal situation, the min / max in the descriptor would be set to a mathematical expression (e.g. using reverse polish notation) which gives the min / max as a function of the input min / max.
What are the conditions to normalise the input? All the functions are based on either the PCM data, the spectrum, or some derivative of those two.
The -1 is exactly the case I mentioned above, so it needs to be treated as a generic boundary. I find it a bit odd, because some functions can return negative values, so shouldn't it be something else outside the function's range? NULL could work.
Although LibXtract is nominally an "audio" feature extraction library, it's really a library for extracting features from arbitrary arrays of floating point data. It is designed in such a way that it can be used for non-audio data, and some users are using it in this way. Besides, it might be convenient for someone using an integer audio format to simply cast their data to floats, rather than converting to -1/1 bounded floats (and back).
I agree that using -1 for a missing value where -1 is a valid value is terrible design. However, a straight NULL can't be used, because min and max are doubles, and even if they were pointers to doubles, NULL is defined as (void *)0, so that doesn't work either.
One option would be to have an xtract_descriptor_value type with a type field that can be (for example) XTRACT_NIL or XTRACT_DOUBLE.
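Something along these lines (a sketch of the idea; only the names xtract_descriptor_value, XTRACT_NIL and XTRACT_DOUBLE come from the suggestion above, the rest is illustrative):

    typedef enum {
        XTRACT_NIL,         // no meaningful bound available
        XTRACT_DOUBLE       // the value field holds a valid double
    } xtract_descriptor_value_type;

    typedef struct {
        xtract_descriptor_value_type type;
        double value;       // only meaningful when type == XTRACT_DOUBLE
    } xtract_descriptor_value;

    // a caller would then check the tag instead of comparing against -1:
    //   if ( desc->result.scalar.min.type == XTRACT_DOUBLE ) { ... }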
I understand that, but if you want to use it as an audio library, I've got the feeling that you need some consistency in the results. Whether this is done within LibXtract or with some other code on top of it doesn't really matter to me, but I'm keen on finding a way to nail down the boundaries, so please let me know if you can think of anything better than my solution above.
You mean better than the auto-calibration solution?
my "auto-calibration" only samples the results for one second and clamp them between the max and min value, so you can always display something, but I have a few concern about what you actually see.
Why not pre-calibrate the auto-scaling function by providing it with a block containing a square wave of amplitude 1 for the maximum, and a block of all zeroes for the minimum? For your system, this should give you the expected range for all of the features.
I'm not sure I'm following you; can you please clarify?
I mean it sounds like you've set up autoscaling which, for a given audio feature, takes as its "max" the highest value ever received for that feature, and as its "min" the lowest value received for that feature.
Instead of trying to infer the min/max from the live input, why not pre-calibrate your max and min by feeding your autoscaler a number of inputs that reflect the likely extrema of your expected input, and store those values? I initially suggested zeroes/ones, but you might also want to include things like a block of white noise, which will give you a maximum value for things like irregularity.
I guess my overall point is: LibXtract doesn't know what its input data is going to be, so it can't give min/max for a lot of things; but if your Cinder library is only going to accept double precision samples in the range -1...1, then you can determine what the minima / maxima are for the extrema of your input data.
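As a sketch (the helper names are hypothetical, and "run the feature graph" stands in for whatever your wrapper executes per block):

    #include <cstdlib>
    #include <vector>

    // calibration blocks covering the extrema of the expected -1..1 input
    std::vector<float> zeroBlock( size_t n ) { return std::vector<float>( n, 0.0f ); }

    std::vector<float> squareBlock( size_t n )
    {
        std::vector<float> b( n );
        for ( size_t i = 0; i < n; ++i )
            b[i] = ( i < n / 2 ) ? 1.0f : -1.0f;            // amplitude-1 square wave
        return b;
    }

    std::vector<float> noiseBlock( size_t n )
    {
        std::vector<float> b( n );
        for ( auto &s : b )
            s = 2.0f * std::rand() / RAND_MAX - 1.0f;       // white noise in -1..1
        return b;
    }

    // feed each block through the whole feature graph, let the autoscaler
    // record the min/max each feature produces, then freeze those bounds
    // before processing live input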
So what you are suggesting is to use my system with some white noise and evaluate the max and min values, is that correct? I don't quite understand what you mean, or what the difference is between inferring min/max and pre-calibrating min/max.
PS: sorry for keeping this thread open, but this is a major issue and I'm very keen to find a solution that makes sense for both of us.
What I'm suggesting is that you:

1. build a set of calibration inputs that reflect the likely extrema of your expected input (a block of all zeroes, a block with a square wave of amplitude 1, a block of white noise);
2. run them through your autoscaler once and store the min/max each feature produces;
3. use those stored values as fixed bounds for clamping the live results, rather than inferring the range from the live input.