Open vijaykilledar opened 5 years ago
Can you please provide some data and code for comparison?
(There is a bigger difference between the internal and textual representation of values in Python I guess.)
OK, I will provide a detailed example and data tomorrow.
Attaching a zip file that contains the test script output at my end:
./test_prediction.sh ./train_10000 ./train_10000_target ./porter_train_10000_double
test data file - test_data/train_10000
expected prediction data file - test_data/train_10000_target
testing output binary by feeding training data .......
Total records - 10000
Matched prediction records - 9878
./test_prediction.sh ./train_10000 ./train_10000_target ./porter_train_10000_float
test data file - test_data/train_10000
expected prediction data file - test_data/train_10000_target
testing output binary by feeding training data .......
Total records - 10000
Matched prediction records - 9992
Okay, thanks. Can you please validate the data type of your training data?
print(type(X[0])) # <type 'numpy.float32'> or <type 'numpy.float64'>
For load_digits it's numpy.float64, which is double in C. The integrity check finished without mismatches. So I changed the data to floats with X.astype(np.float32) and finished the integrity check again without errors.
Nevertheless, it depends on the data. In general I see the problem of floating-point precision between data types and programming languages. It could make sense to add the possibility to change the feature data type in the transpiled output by using a new argument, e.g. temp_dtype='float'.
Further, atof() converts a string to double in C. On the other hand, if you want to use floats, you should use strtof() to convert strings to float.
Can you test it?
The C code exported by porter uses the wrong data type (double) for feature values, which lowers the prediction accuracy.
scikit-learn code:
porter C code: