paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License

adding --data-label #245

Closed kegl closed 3 years ago

kegl commented 4 years ago

We add an optional --data-label switch to ramp-test.

If specified

  1. it is passed to problem.get_train_data and problem.get_test_data so the user can load the appropriate data, and
  2. if combined with --save-output, the training output is saved in subfolders submissions//training_output/.
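Point 1 can be sketched from the problem's side. This is a minimal sketch, assuming the loaders receive the label as a `data_label` keyword that defaults to `''`; that signature, the `train.csv`/`test.csv` file names and the `species` target column are illustrative assumptions, not the kit's actual code. Stdlib `csv` stands in for pandas to keep it self-contained.

```python
import csv
import os

# Hypothetical problem.py loaders. The data_label keyword, the file names
# and the "species" target column are assumptions for illustration.
_TARGET = "species"


def _read(path, data_label, f_name):
    # data_label='' resolves to <path>/data/<f_name> (the usual location);
    # a non-empty label resolves to <path>/data/<data_label>/<f_name>.
    f_path = os.path.join(path, "data", data_label, f_name)
    with open(f_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Split each row into features (all other columns) and the target.
    X = [{k: v for k, v in row.items() if k != _TARGET} for row in rows]
    y = [row[_TARGET] for row in rows]
    return X, y


def get_train_data(path=".", data_label=""):
    return _read(path, data_label, "train.csv")


def get_test_data(path=".", data_label=""):
    return _read(path, data_label, "test.csv")
```

Because the keyword has a default, calling `get_train_data(path)` without a label reads the same file as before.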

If not specified, ramp-test behaves exactly as before. In particular, it will not interfere with ramp-board (@agramfort).
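One way to honor the "no label, no change" guarantee is an optional switch that defaults to an empty string and is forwarded to the loaders only when set. The sketch below uses `argparse` purely for illustration: ramp-test's real CLI may be built differently, and the helpers `loader_kwargs` and `load_train` are hypothetical, not necessarily what this PR does.

```python
import argparse


def build_parser():
    # Illustrative parser only; the real ramp-test entry point may differ.
    parser = argparse.ArgumentParser(prog="ramp-test")
    parser.add_argument("--data-label", default="",
                        help="load data from ./data/<data_label>/")
    parser.add_argument("--save-output", action="store_true")
    return parser


def loader_kwargs(data_label):
    # Hypothetical helper: forward data_label only when it is non-empty, so
    # loaders written before the switch existed keep working unchanged.
    return {"data_label": data_label} if data_label else {}


def load_train(get_train_data, path, data_label):
    # Hypothetical call site standing in for problem.get_train_data(...).
    return get_train_data(path, **loader_kwargs(data_label))
```

With no `--data-label`, `args.data_label` is `''`, nothing extra is passed, and an old-style `get_train_data(path)` is called exactly as before.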

To close #244

codecov[bot] commented 4 years ago

Codecov Report

Merging #245 into advanced will increase coverage by 0.43%. The diff coverage is 98.50%.


@@             Coverage Diff              @@
##           advanced     #245      +/-   ##
============================================
+ Coverage     81.75%   82.18%   +0.43%     
============================================
  Files           133      138       +5     
  Lines          4921     5040     +119     
============================================
+ Hits           4023     4142     +119     
  Misses          898      898              
Impacted Files Coverage Δ
rampwf/utils/cli/testing.py 0.00% <0.00%> (ø)
rampwf/tests/kits/iris_data_label/problem.py 100.00% <100.00%> (ø)
...label/submissions/random_forest_10_10/estimator.py 100.00% <100.00%> (ø)
...s_data_label/submissions/starting_kit/estimator.py 100.00% <100.00%> (ø)
rampwf/tests/test_kits.py 96.15% <100.00%> (+0.59%) :arrow_up:
rampwf/utils/testing.py 100.00% <100.00%> (ø)
rampwf/score_types/base.py 69.23% <0.00%> (-4.11%) :arrow_down:
rampwf/score_types/roc_auc.py 100.00% <0.00%> (ø)
rampwf/score_types/combined.py 91.66% <0.00%> (ø)
rampwf/score_types/brier_score.py 100.00% <0.00%> (ø)
... and 16 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update b92ca38...8958b7a.

agramfort commented 4 years ago

How about you just have a data_path param that defaults to "./data", so we know we don't break anything?

kegl commented 4 years ago

We put each dataset in ./data/<data_label>/ (a better organization than a flat ./<data_label>/); otherwise it's the same. The data label defaults to '', which the script then handles in the usual way, so the data path effectively defaults to ./data.
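Why an empty label lands in the usual place can be checked with `os.path.join`: an empty final component only adds a trailing separator, so after normalization ./data/<data_label>/ with `data_label=''` is the same directory as ./data. The helper name `data_dir` is hypothetical.

```python
import os


def data_dir(data_label=""):
    # Hypothetical helper: ./data/<data_label>/, normalized for comparison.
    return os.path.normpath(os.path.join(".", "data", data_label))


# An empty label collapses to the plain ./data directory.
assert data_dir() == os.path.normpath("./data")
# A non-empty label adds one level of nesting under ./data.
assert data_dir("small") == os.path.normpath("./data/small")
```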

You will see no difference in behavior if you use ramp-test without a data label.

We thought this through and tried different setups (we've been using this internally for 8 months now), and this was the best.