oneapi-src / oneDAL

oneAPI Data Analytics Library (oneDAL)
https://software.intel.com/en-us/oneapi/onedal
Apache License 2.0

Decision Forest prediction works with any number of features #292

Closed: Mightrider closed this issue 4 years ago

Mightrider commented 4 years ago

Describe the bug

If I train my Decision Forest with two features and then use the model for prediction on data with more or fewer features, checkComputeParams() returns true and compute() does not throw an exception.

To Reproduce

#include <iostream>

#include <daal.h>
#include <service.h>

int main() {
    const size_t nClasses = 2;
    const size_t vPCAFeatures = 2;

    /* creating arrays for shallow data access */
    float vData[8] = { 0, 0, 0, 1, 1, 0, 1, 1 };
    float vLabels[4] = { 0, 1, 1, 0 };

    float vTestData[6] = { 0, 1, 0, 1, 0, 1 };

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;
    using namespace daal::algorithms::decision_forest::classification;

    /* Create numeric tables that wrap the training data and label arrays (4 rows each) */
    NumericTablePtr vTrainData(HomogenNumericTable<>::create(vData, vPCAFeatures, 4));
    NumericTablePtr vTrainGroundTruth(HomogenNumericTable<>::create(vLabels, 1, 4));

    /* Create an algorithm object to train the decision forest classification model */
    training::Batch<> vTrainingBatch(nClasses);

    /* Pass a training data set and dependent values to the algorithm */
    vTrainingBatch.input.set(classifier::training::data, vTrainData);
    vTrainingBatch.input.set(classifier::training::labels, vTrainGroundTruth);

    /* Build the model */
    vTrainingBatch.compute();

    /**
     * Predict using the trained model
     */
    for (size_t i = 1; i < 4; i++) {
        prediction::Batch<> vPredictorBatch(nClasses);

        std::cout << "Before assigning model and data: " << vPredictorBatch.checkComputeParams() << std::endl;

        /* Pass the trained model to the algorithm */
        vPredictorBatch.input.set(classifier::prediction::model, vTrainingBatch.getResult()->get(classifier::training::model));

        /* Pass a testing data set with i features and 6 / i rows to the algorithm */
        NumericTablePtr vNumTestData(HomogenNumericTable<>::create(vTestData, i, 6 / i));
        vPredictorBatch.input.set(classifier::prediction::data, vNumTestData);
        std::cout << i << " features: " << vPredictorBatch.checkComputeParams() << std::endl;

        vPredictorBatch.compute();
        printNumericTable(vPredictorBatch.getResult()->get(daal::algorithms::classifier::prediction::prediction), "Result");
    }
}

Output

Before assigning model and data: 0
1 features: 1
Result
0.000     
1.000     
0.000     
1.000     
0.000     
0.000     

Before assigning model and data: 0
2 features: 1
Result
0.000     
0.000     
0.000     

Before assigning model and data: 0
3 features: 1
Result
0.000     
1.000    
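
The number of predictions per iteration (6, 3, and 2) follows from how the reproducer reinterprets the same six-element test buffer as i columns by 6 / i rows. A minimal sketch of that reinterpretation in plain C++, assuming row-major storage; viewAsRows is a hypothetical helper written for illustration, not a oneDAL function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneDAL function): copy a flat buffer into rows
// of nFeatures values each, assuming row-major storage.
std::vector<std::vector<float>> viewAsRows(const float* data,
                                           std::size_t nValues,
                                           std::size_t nFeatures) {
    // Preconditions: a valid buffer that divides evenly into rows.
    assert(data != nullptr);
    assert(nFeatures > 0 && nValues % nFeatures == 0);

    std::vector<std::vector<float>> rows;
    for (std::size_t r = 0; r < nValues / nFeatures; ++r) {
        // Each row is the next nFeatures consecutive values of the buffer.
        rows.emplace_back(data + r * nFeatures, data + (r + 1) * nFeatures);
    }
    return rows;
}
```

With the test values { 0, 1, 0, 1, 0, 1 }, calling viewAsRows with nFeatures set to 1, 2, and 3 yields 6, 3, and 2 rows, which matches the number of results printed in each iteration.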

Expected behavior

I would expect checkComputeParams() to return false and compute() to throw an exception when the number of features in the prediction data does not match the number used for training. If I do the same with an SVM, it results in an exception.
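
The kind of check I have in mind could look roughly like the sketch below. It is plain C++ and does not reproduce any oneDAL code: featureCountsMatch and requireMatchingFeatureCount are hypothetical names, and it assumes the caller can obtain both the feature count the model was trained with and the column count of the prediction table.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical check (not oneDAL API): true when the prediction data has
// exactly as many features as the model was trained with.
inline bool featureCountsMatch(std::size_t modelFeatures, std::size_t dataFeatures) {
    return modelFeatures == dataFeatures;
}

// Hypothetical guard (not oneDAL API): throws instead of silently predicting
// on data whose width differs from the training data.
inline void requireMatchingFeatureCount(std::size_t modelFeatures, std::size_t dataFeatures) {
    if (!featureCountsMatch(modelFeatures, dataFeatures)) {
        throw std::invalid_argument(
            "feature count mismatch: model expects " + std::to_string(modelFeatures) +
            ", data has " + std::to_string(dataFeatures));
    }
}
```

In the reproducer above, such a guard would pass only for i = 2 and reject the iterations with 1 and 3 features before any prediction is computed.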

Environment:

PivovarA commented 4 years ago

Hello @Mightrider, I created a PR with a fix: https://github.com/intel/daal/pull/298

PivovarA commented 4 years ago

Hello @Mightrider, the fix is already in the master branch. Thanks again for pointing out this issue. I was glad to help. Please contact us if you have additional questions or problems.