Closed manuelgitgomes closed 2 months ago
Hello @miguelriemoliveira and @Kazadhum.
I have now added cross-validation support to batch execution on the branch `dev/cross-validation`.
The user now defines the type of cross validation and its parameters in `data.yml`:
https://github.com/lardemua/atom/blob/b61df4d9019f76c921ac605441ca162c20c77e50/atom_batch_execution/experiments/rrbot_example/data.yml#L23-L26
Right now, fold creation is supported with StratifiedKFold, KFold, LeaveOneOut, and StratifiedShuffleSplit from scikit-learn. The classes used for stratification are the combination of sensors and patterns detected, as written in: https://github.com/lardemua/atom/blob/b61df4d9019f76c921ac605441ca162c20c77e50/atom_batch_execution/scripts/batch_execution#L30-L54
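For readers following along, here is a minimal sketch of stratifying collections by the combination of sensors and patterns detected, using scikit-learn's `StratifiedKFold`. The `detections` dictionary, the `stratification_label` helper and the label format are hypothetical illustrations, not the code in `batch_execution`; see the link above for the actual implementation.

```python
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: collection key -> (sensor, pattern) pairs detected in it.
detections = {
    "000": [("camera", "pattern_1"), ("lidar", "pattern_1")],
    "001": [("camera", "pattern_1"), ("lidar", "pattern_1")],
    "002": [("camera", "pattern_1")],
    "003": [("camera", "pattern_1")],
    "004": [("camera", "pattern_1"), ("lidar", "pattern_1")],
    "005": [("camera", "pattern_1")],
    "006": [("camera", "pattern_1"), ("lidar", "pattern_1")],
    "007": [("camera", "pattern_1")],
}


def stratification_label(pairs):
    """Collapse the set of (sensor, pattern) detections into one class label."""
    return "|".join(sorted(f"{sensor}:{pattern}" for sensor, pattern in pairs))


collection_keys = sorted(detections)
labels = [stratification_label(detections[key]) for key in collection_keys]

# Each class needs at least n_splits members for the folds to be balanced.
splitter = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

folds = []
for train_idx, test_idx in splitter.split(collection_keys, labels):
    folds.append(
        {
            "train": [collection_keys[i] for i in train_idx],
            "test": [collection_keys[i] for i in test_idx],
        }
    )

for number, fold in enumerate(folds):
    print(f"fold {number}: train={fold['train']} test={fold['test']}")
```

Collections with the same sensor/pattern combination share a label, so each test fold keeps roughly the same mix of combinations as the full dataset.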
This then creates a new `auto_rendered.yaml`, with each run divided into folds, using `-csf` to define the collections used by each fold.
`process_results` is also adapted to run with these folds!
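As a rough illustration of how a fold could be turned into a `-csf` argument, the sketch below builds a selection expression from a list of collection keys. It assumes that `-csf` accepts a Python lambda string evaluated on each collection key; the helper name and the exact string format are assumptions, and the rendered `auto_rendered.yaml` may differ.

```python
def collection_selection_function(collection_keys):
    """Build a lambda string that selects exactly the given collections.

    Hypothetical helper: assumes -csf takes a lambda over the collection key.
    """
    ids = sorted(int(key) for key in collection_keys)
    return f"lambda x: int(x) in {ids}"


# Hypothetical folds: (train collections, test collections).
example_folds = [
    (["000", "001", "004"], ["002", "003"]),
    (["002", "003", "004"], ["000", "001"]),
]

for number, (train, test) in enumerate(example_folds):
    # Calibrate on the train collections, evaluate on the test ones.
    print(f"fold {number} train: -csf \"{collection_selection_function(train)}\"")
    print(f"fold {number} test:  -csf \"{collection_selection_function(test)}\"")
```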
I have done a test with rrbot and everything seemed fine. Can you test it on your machines?
Hi @manuelgitgomes ,
looks great. Thanks.
@Kazadhum I do not have a lot of time right now. Can you test it please?
One question: if we want to run the old way, is it possible or not?
Hi @manuelgitgomes and @miguelriemoliveira! I'll test it as soon as I can. I'll try to tell you something today.
Hi @manuelgitgomes and @miguelriemoliveira !
I just tested with stratified k-folding and it seems to be working correctly at first glance, as does `process_results`.
BTW, is it still the case that it doesn't make sense to have the `collection` column in the processed results, since it also averages the collection number?
I can run more thorough tests next week, if necessary.
> One question: if we want to run the old way, is it possible or not?
Right now, you can't, but I can change that easily enough (I think). On it.
> BTW, is it still the case that it doesn't make sense to have the `collection` column in the processed results, since it also averages the collection number?
It doesn't, I believe. I can delete it.
Now that I mention it, I think it doesn't make much sense to have any lines at all besides the "Average" line, since all other values are "meaningless" (as they belong to different collections). Do you agree?
> Now that I mention it, I think it doesn't make much sense to have any lines at all besides the "Average" line, since all other values are "meaningless" (as they belong to different collections). Do you agree?
In the processed results? Sure, makes sense. They can also be removed, but I don't see any harm in leaving them there, right?
Changed batch execution and process results to allow for an empty cross validation definition in data.yml. This guarantees backwards compatibility, which was tested with old_template.yml.j2. Also removed the "Collection #" column from the processed results.
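One way such a fallback could look is sketched below: an empty or missing cross validation definition yields no splitter, so runs proceed without folds. The config keys (`type`, `n_splits`) and the `make_splitter` helper are hypothetical and only illustrate the idea; the real keys are those in the linked `data.yml`.

```python
from sklearn.model_selection import (
    KFold,
    LeaveOneOut,
    StratifiedKFold,
    StratifiedShuffleSplit,
)

# The four scikit-learn splitters mentioned earlier in this thread.
SPLITTERS = {
    "KFold": KFold,
    "StratifiedKFold": StratifiedKFold,
    "LeaveOneOut": LeaveOneOut,
    "StratifiedShuffleSplit": StratifiedShuffleSplit,
}


def make_splitter(cv_config):
    """Return a scikit-learn splitter, or None when no cross validation is set.

    Hypothetical helper: `cv_config` stands for the cross validation section
    of the data file, already loaded into a dict (or None if absent).
    """
    if not cv_config or not cv_config.get("type"):
        return None  # old behaviour: run without folds
    params = {key: value for key, value in cv_config.items() if key != "type"}
    return SPLITTERS[cv_config["type"]](**params)


print(make_splitter(None))
print(make_splitter({"type": "KFold", "n_splits": 3}))
```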
Looks great!
This has been merged to main, closing.
Enhance cross validation by using scikit-learn functions.