Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
PermutationImportance fails when data has too few rows #324
Closed
DWgit closed 3 years ago
`PermutationImportance` was enhanced in #208 to limit excessive computation when the number of columns is large:

Originally posted by @pplonski in https://github.com/mljar/mljar-supervised/issues/208#issuecomment-697521246

If a dataset has fewer rows than these hardwired `train_size` values, `subsample` throws an exception and `PermutationImportance` fails.

An obvious fix is to replace these with `train_size=min(nRows, constant)`.

Wide and short datasets are quite common in biological applications, and feature importance is one of the most valuable outcomes of an analysis.
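
A minimal sketch of the proposed cap, in plain Python so it runs without the library. The names `capped_sample_size`, `subsample_rows`, and the value `1000` are hypothetical stand-ins for illustration; they are not the package's actual helper or its hardwired constants.

```python
import random


def capped_sample_size(n_rows, cap):
    """Return how many rows to draw: at most `cap`, never more than exist."""
    if n_rows < 0 or cap < 0:
        raise ValueError("n_rows and cap must be non-negative")
    return min(n_rows, cap)


def subsample_rows(rows, cap, seed=0):
    """Hypothetical stand-in for the subsampling step, with the cap applied."""
    size = capped_sample_size(len(rows), cap)
    if size == len(rows):
        # The dataset is already small enough: keep every row.
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, size)


# A short dataset (50 rows) with a hypothetical cap of 1000 rows.
short_data = list(range(50))
print(len(subsample_rows(short_data, cap=1000)))

# A long dataset is still reduced to the cap.
long_data = list(range(5000))
print(len(subsample_rows(long_data, cap=1000)))
```

One caveat, assuming the real helper is built on scikit-learn's `train_test_split`: that function rejects an integer `train_size` equal to the number of samples, so in that case it may be safer to skip subsampling when the data already fits under the cap, as the early return above does.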
Thanks very much!