Hello @AdamosX, the finite checker acceleration can only work with input data that is float32 or float64, so it falls back to the stock sklearn finite checker for anything else. The y values that come out of `make_classification` are integers, which are finite by definition (and are not finite-checked at all), so there is no difference in overall speed. In this case, the log message was a bit of a red herring.
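For context, a quick look at the dtypes `make_classification` returns (a plain sklearn snippet, independent of the patching):

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=33000, random_state=0)

print(X.dtype)  # float64 -> eligible for the accelerated finite check
print(y.dtype)  # integer (int64 on most platforms) -> skipped entirely,
                # since integers are finite by definition

# Features arriving in another dtype (e.g. int32) can be cast explicitly
# to keep the accelerated path available:
X_fast = X.astype(np.float32)
```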
I will try to correct the log messages: the name `X` is hard-coded there and is a misnomer; it should use `input_name` instead, as sketched below.
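A minimal sketch of that fix, assuming the message is assembled in a small helper (the helper name and structure are illustrative, not the actual daal4py code):

```python
import logging

def _log_fallback(input_name="X"):
    # Today the message always says "X"; interpolating the validated
    # array's actual name (e.g. "y") removes the misnomer.
    logging.info(
        "sklearn.utils.validation._assert_all_finite: patching failed "
        f"with cause - {input_name} dtype is not float32 or float64."
    )
```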
**Describe the bug**

When running `RandomForestClassifier` I encountered the following text in the logs:
```
sklearn.utils.validation._assert_all_finite: patching failed with cause - X dtype is not float32 or float64.
```
This happens only if the number of samples is sufficiently big. I don't know how this relates to the optimizations: are they enabled or not? The function `_assert_all_finite` in `daal4py.sklearn.utils.validation` contains the following code:
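(The quoted excerpt did not survive here. Reconstructed from the log message and the reported size dependence, the guard presumably looks roughly like this; the size threshold and the `_daal_assert_all_finite` name are assumptions, not the verbatim daal4py source:)

```python
import logging
import numpy as np
from sklearn.utils.validation import _assert_all_finite as _sklearn_assert_all_finite

def _daal_assert_all_finite(X, allow_nan=False):
    """Stand-in for the oneDAL-accelerated check (hypothetical name)."""

def _assert_all_finite(X, allow_nan=False):
    # Only sufficiently large float32/float64 arrays take the accelerated
    # path. The threshold below is a guess that matches the reported
    # behavior (message at n_samples=33000, silence at n_samples=100).
    if X.ndim in (1, 2) and X.size > 32768:
        if X.dtype in (np.float32, np.float64):
            return _daal_assert_all_finite(X, allow_nan=allow_nan)
        # Note the hard-coded "X" here, regardless of which array failed.
        logging.info(
            "sklearn.utils.validation._assert_all_finite: patching failed "
            "with cause - X dtype is not float32 or float64."
        )
    _sklearn_assert_all_finite(X, allow_nan=allow_nan)
```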
**To Reproduce**

Run the following snippet:
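(The original snippet is missing; a plausible reconstruction, assuming sklearnex-style patching and INFO-level logging to surface the patch messages:)

```python
import logging
logging.basicConfig(level=logging.INFO)  # surface the patch messages

from sklearnex import patch_sklearn
patch_sklearn()  # must run before the sklearn imports below

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

n_samples = 33000  # the log message only shows up for large inputs
X, y = make_classification(n_samples=n_samples, random_state=0)

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)
```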
Set `n_samples = 100` or `n_samples = 33000`.
**Expected behavior**

I haven't managed to work out whether this check really influences the optimizations, whether the logging is wrong, or whether I misunderstood its meaning.

EDIT: I now think it only disables the optimization for `_assert_all_finite` itself; the random forest classifier works as expected.
**Output/Screenshots**
For big `n_samples`, the `patching failed` line quoted above appears in the logs, whereas for small `n_samples` it does not.
**Environment:**