nickgillian / grt

gesture recognition toolkit
853 stars 286 forks source link

ClassificationData::reformatAsRegressionData() indexing bug #157

Open jamiebullock opened 5 years ago

jamiebullock commented 5 years ago

I have found a bug in ClassificationData::reformatAsRegressionData().

Basically, it only works if the user adds training examples with contiguous class labels starting from 1. If, class labels do not start from 1 or are non-contiguous, the method crashes at line 1137

    regressionData.addSample(data[i].getSample(),targetVector);

The reason for this is that getNumClasses() is used to set the size of targetVector but then targetVector is indexed with targetVector[ classLabel-1 ] = 1.

So if we have 2 classes with labels of 3 and 4 targetVector gets set to a size of 2 and then indexed with 2 and 3.

I can see some possible solutions to this:

  1. Change the interface forMLP::init(), or both MLP::train() so that non-contiguous classification data is not accepted. For example, have MLP::train() return false if the passed in ClassificationData does not have contiguous class labels
  2. Use the classLabel index not the class label itself as an index into targetVector then store the class labels in the MLP instance so that a reverse lookup can be performed. This introduces a coupling between MLP and ClassificationData, but then MLP is the only class that calls reformatAsRegressionData(), so maybe this is OK.

I might attempt 2, but I'd appreciate any thoughts on this!