Open KobaKhit opened 3 years ago
Thanks for the PR, @KobaKhit. It would be nice if (a) the default value were -1, indicating that no file limit is applied during feature extraction, and (b) random shuffling were also parametrized (and not enabled by default, since in many cases we need feature extraction to follow file path order).
Currently, `extract_features_and_train` needs a list of folder paths. It would be useful to be able to set how many files per folder to read at most, so I added a `max_files` parameter with a default of 1000. Randomly choosing those files could be another addition. I tested it in a Kaggle notebook and it worked fine.
The motivation was a Birdcall Kaggle competition with 264 classes (folders) and ~100 files per class (folder). Training a model took longer than 9 hours and the Kaggle notebook timed out, so I decided to train on a smaller number of files per folder, i.e. undersample the classes.
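For discussion, here is a rough sketch of how the per-folder cap could behave with the review suggestions folded in (`max_files=-1` meaning no limit, shuffling off by default). This is not the code in the PR: `select_files`, `shuffle`, `seed`, and `extensions` are hypothetical names used only for illustration, and the way the library actually lists files in a folder may differ.

```python
import os
import random


def select_files(folder, max_files=-1, shuffle=False, seed=None,
                 extensions=(".wav",)):
    """Hypothetical helper: return at most `max_files` audio paths from `folder`.

    max_files < 0 means "no limit". With shuffle=False the files are kept in
    sorted path order, so the first `max_files` paths are returned.
    """
    # Collect matching files in deterministic (sorted) path order.
    files = [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if os.path.splitext(name)[1].lower() in extensions
    ]

    # Optional random selection; a seed makes the choice reproducible.
    if shuffle:
        random.Random(seed).shuffle(files)

    # Apply the cap only when a non-negative limit was requested.
    if max_files >= 0:
        files = files[:max_files]
    return files
```

With something like this, undersampling the Birdcall folders would be a matter of calling `select_files(folder, max_files=50)` for each class folder before feature extraction, while the default call leaves existing behavior unchanged.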