mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

PermutationImportance subsample error #325

Closed DWgit closed 3 years ago

DWgit commented 3 years ago

There's an error here, based on the code I suggested. It's stopping now with:

train_size=10 should be either positive and smaller than the number of samples 10 or a float in the (0, 1) range
Problem during computing permutation importance. Skipping ...

With datasets this short, a proportional split (a fraction of the rows rather than a fixed row count) may be more appropriate here.

pplonski commented 3 years ago

For such a small dataset, I think the data can be used without subsampling. What do you think, @DWgit? I have removed subsampling for datasets with only a few rows. Could you please check if it works for your data?
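
For readers following along, here is a minimal sketch of that kind of guard. It is not the actual mljar-supervised change: the helper name `pick_rows_for_importance`, the `max_rows` threshold, and the seed are all hypothetical. The idea is only that a subsample is drawn when the data has more rows than the limit, and every row is used otherwise, so a subsample as large as the dataset is never requested.

```python
import random


def pick_rows_for_importance(n_rows, max_rows=1000, seed=42):
    """Return the row indices to use when computing permutation importance.

    Hypothetical helper: the name, the max_rows default, and the seed are
    illustrative assumptions, not taken from mljar-supervised.
    """
    if n_rows <= max_rows:
        # Small dataset: asking for a subsample as large as the data is what
        # fails in the report above (train_size=10 with 10 samples),
        # so skip subsampling and use every row.
        return list(range(n_rows))
    # Large dataset: draw max_rows distinct rows without replacement.
    return sorted(random.Random(seed).sample(range(n_rows), max_rows))


# Ten rows with a limit of ten: no subsampling, all rows are returned.
small = pick_rows_for_importance(10, max_rows=10)
# Five thousand rows with a limit of one thousand: a reproducible subsample.
large = pick_rows_for_importance(5000, max_rows=1000)
```

With this shape, the importance scores for a small dataset are computed on all rows, which is worth keeping in mind when comparing them with scores from a subsampled larger dataset.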

BTW, @DWgit do you get good ML models with MLJAR?

DWgit commented 3 years ago

That sounds reasonable. Would be good to capture that idea in the tech docs so callers know how to interpret the results.

Code running now, well past this issue, thank you for the quick turnaround.

@pplonski This is our first experiment with MLJAR, so we don't yet have results to report.