paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).

https://paris-saclay-cds.github.io/ramp-docs/

BSD 3-Clause "New" or "Revised" License


Blending #95

Closed: kegl closed this pull request 6 years ago.

kegl commented 6 years ago

This PR does some restructuring, but the main addition is a new script:

ramp_blend_submissions

It accepts a --submissions switch, which defaults to ALL. It assumes that every submission to be blended was run with the --save-y-preds switch, so it can read the predictions directly from disk.
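For illustration only, here is a minimal Python sketch of what reading the saved predictions from disk and blending them can look like. The on-disk layout (`<submissions_dir>/<name>/training_output/fold_<i>/y_pred_<suffix>.npz`, with the array stored under the key `y_pred`) is taken from the traceback in the next comment. The helper names (`y_pred_path`, `load_y_pred`, `greedy_blend`) and the greedy forward-selection strategy with a `min_improvement` threshold are assumptions, not rampwf's actual implementation.

```python
import os

import numpy as np


def y_pred_path(submissions_dir, submission, fold, suffix):
    # Hypothetical helper mirroring the layout seen in the traceback:
    # <submissions_dir>/<submission>/training_output/fold_<fold>/y_pred_<suffix>.npz
    return os.path.join(submissions_dir, submission, 'training_output',
                        'fold_{}'.format(fold), 'y_pred_{}.npz'.format(suffix))


def load_y_pred(submissions_dir, submission, fold, suffix):
    # Saved predictions are assumed to live under the 'y_pred' key of the .npz.
    with np.load(y_pred_path(submissions_dir, submission, fold, suffix)) as npz:
        return npz['y_pred']


def greedy_blend(y_preds, y_true, score_fn, min_improvement=0.0, max_size=50):
    """Greedy forward selection over saved predictions (lower score is better).

    y_preds maps submission name -> prediction array. A submission may be
    picked more than once; the blend is the plain mean of the picks.
    """
    selected, blend_sum, best_score = [], None, np.inf
    while len(selected) < max_size:
        best_name, best_name_score = None, best_score
        for name, y_pred in y_preds.items():
            # Score the blend we would get by adding this submission once more.
            candidate_sum = y_pred if blend_sum is None else blend_sum + y_pred
            score = score_fn(y_true, candidate_sum / (len(selected) + 1))
            if score < best_name_score:
                best_name, best_name_score = name, score
        # Stop when no candidate improves the score by at least min_improvement.
        if best_name is None or best_score - best_name_score < min_improvement:
            break
        y_pred = np.asarray(y_preds[best_name], dtype=float)
        blend_sum = y_pred if blend_sum is None else blend_sum + y_pred
        selected.append(best_name)
        best_score = best_name_score
    blended = None if blend_sum is None else blend_sum / len(selected)
    return selected, blended
```

With `min_improvement=0.0` the loop stops as soon as no submission lowers the score any further; a larger threshold trades a little accuracy for smaller blends.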

There are no unit tests in Travis yet, and there are few checks (e.g., whether the prediction files are present).
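One possible shape for such a check, sketched under the same file layout that appears in the traceback below: before blending, collect every expected prediction file that is missing, so the user gets a single report instead of a failure at the first fold. The function name, the `n_folds` argument, and the default `suffixes=('train',)` are hypothetical.

```python
import os


def missing_y_pred_files(submissions_dir, submissions, n_folds,
                         suffixes=('train',)):
    # Hypothetical pre-flight check: return the paths of all expected
    # y_pred_<suffix>.npz files that are not on disk (empty list if none).
    missing = []
    for submission in submissions:
        for fold in range(n_folds):
            for suffix in suffixes:
                path = os.path.join(
                    submissions_dir, submission, 'training_output',
                    'fold_{}'.format(fold), 'y_pred_{}.npz'.format(suffix))
                if not os.path.isfile(path):
                    missing.append(path)
    return missing
```

A caller could then abort once with a message listing every missing path and pointing to `ramp_test_submission` with `--save-y-preds`.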

jorisvandenbossche commented 6 years ago

It appears you need to have saved the y_preds for each submission you are blending beforehand. Otherwise, you get an error like this:

joris@joris-XPS-13-9350:/media/joris/DATA/CDS/RAMP/ramp-kits/titanic$ ramp_blend_submissions
Blending Titanic survival classification
Reading train and test files from ./data ...
Reading cv ...
CV fold 0
Traceback (most recent call last):
  File "/media/joris/DATA/CDS/RAMP/ramp-workflow/rampwf/utils/testing.py", line 329, in _load_y_pred
    return problem.load_y_pred(data_path, input_path, suffix)
AttributeError: module '484371698' has no attribute 'load_y_pred'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/joris/miniconda3/bin/ramp_blend_submissions", line 11, in <module>
    load_entry_point('ramp-workflow', 'console_scripts', 'ramp_blend_submissions')()
  File "/media/joris/DATA/CDS/RAMP/ramp-workflow/rampwf/utils/command_line.py", line 167, in ramp_blend_submissions
    min_improvement=float(args.min_improvement))
  File "/media/joris/DATA/CDS/RAMP/ramp-workflow/rampwf/utils/testing.py", line 733, in blend_submissions
    input_path=fold_output_path, suffix='train')
  File "/media/joris/DATA/CDS/RAMP/ramp-workflow/rampwf/utils/testing.py", line 333, in _load_y_pred
    return np.load(y_pred_f_name)['y_pred']
  File "/home/joris/miniconda3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './submissions/random_forest_20_5/training_output/fold_0/y_pred_train.npz'

I think we should provide a better error message (indicating what is needed) and mention this requirement in the help of ramp_blend_submissions.

kegl commented 6 years ago

Definitely. ramp_test_submission must be run with the --save-y-preds switch for every submission to be blended. I'll let you add this in the right place (in create_ramp_blend_submissions_parser?).
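A sketch of how the requirement could be surfaced in the command's help, using the standard-library `argparse`. Only `--submissions` (defaulting to ALL) is described in this PR; the comma-separated format is an assumption, and the `--min-improvement` flag is inferred from `args.min_improvement` in the traceback, so its spelling and default are assumptions too. `make_blend_parser` is a hypothetical stand-in, not the body of `create_ramp_blend_submissions_parser`.

```python
import argparse


def make_blend_parser():
    # Hypothetical parser; the description tells users up front that
    # predictions must have been saved by ramp_test_submission --save-y-preds.
    parser = argparse.ArgumentParser(
        prog='ramp_blend_submissions',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=(
            'Blend the predictions of several submissions.\n\n'
            'Every submission to be blended must first be run with\n'
            '"ramp_test_submission --save-y-preds" so that its\n'
            'y_pred files exist under training_output/.'))
    parser.add_argument(
        '--submissions', default='ALL',
        help='Comma-separated submission names to blend (default: ALL). '
             'The comma-separated format is an assumption.')
    # Kept as a string because the traceback shows float(args.min_improvement).
    parser.add_argument(
        '--min-improvement', default='0.0',
        help='Minimum score improvement required to add a submission '
             '(assumed flag).')
    return parser
```

`RawDescriptionHelpFormatter` keeps the line breaks of the description, so the `--save-y-preds` hint is not re-wrapped in the middle of the flag name.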

Also, a better error message here? https://github.com/paris-saclay-cds/ramp-workflow/blob/blending/rampwf/utils/testing.py#L330
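A minimal sketch of a friendlier failure, assuming the same `.npz` file and `y_pred` key as in the traceback: catch the `FileNotFoundError` raised by `np.load` and re-raise it with a message that says which command produces the missing file. The function name `load_saved_y_pred` and the message wording are hypothetical.

```python
import numpy as np


def load_saved_y_pred(y_pred_f_name, submission):
    # Hypothetical wrapper around the np.load call seen in the traceback.
    try:
        with np.load(y_pred_f_name) as npz:
            return npz['y_pred']
    except FileNotFoundError as e:
        # Chain the original error so the low-level cause stays visible.
        raise FileNotFoundError(
            'No saved predictions found at {}. Run ramp_test_submission '
            'with --save-y-preds for submission {!r} before blending '
            'it.'.format(y_pred_f_name, submission)) from e
```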

codecov[bot] commented 6 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@ae2d6ff). The diff coverage is 29.8%.

@@            Coverage Diff            @@
##             master      #95   +/-   ##
=========================================
  Coverage          ?   83.69%           
=========================================
  Files             ?       70           
  Lines             ?     2723           
  Branches          ?        0           
=========================================
  Hits              ?     2279           
  Misses            ?      444           
  Partials          ?        0

Impacted Files	Coverage Δ
rampwf/utils/__init__.py	`100% <100%> (ø)`
rampwf/utils/combine.py	`30.18% <11.9%> (ø)`
rampwf/utils/command_line.py	`36.23% <14.28%> (ø)`
rampwf/utils/testing.py	`80.55% <41.37%> (ø)`


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update ae2d6ff...5938f55.

jorisvandenbossche commented 6 years ago

Follow-ups to do:

- add tests for blending
- restructure/refactor testing.py