Merging #84 into master will increase coverage by 2.63%. The diff coverage is 94.55%.
Coverage Diff:

|  | master | #84 | +/- |
|---|---|---|---|
| Coverage | 89.41% | 92.04% | +2.63% |
| Files | 66 | 67 | +1 |
| Lines | 1993 | 2099 | +106 |
| Hits | 1782 | 1932 | +150 |
| Misses | 211 | 167 | -44 |
| Impacted Files | Coverage Δ | |
|---|---|---|
| rampwf/utils/__init__.py | 100% <100%> (ø) | :arrow_up: |
| rampwf/utils/combine.py | 100% <100%> (ø) | |
| rampwf/prediction_types/base.py | 86.2% <100%> (+40.05%) | :arrow_up: |
| rampwf/kits/ramp_kit.py | 95.23% <100%> (ø) | :arrow_up: |
| rampwf/utils/tests/test_testing.py | 100% <100%> (ø) | :arrow_up: |
| rampwf/utils/command_line.py | 46.93% <25%> (-4.29%) | :arrow_down: |
| rampwf/utils/testing.py | 98.95% <98.3%> (+7.21%) | :arrow_up: |
| rampwf/prediction_types/combined.py | 97.29% <0%> (+21.62%) | :arrow_up: |
| ... and 3 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 0d29f33...4557665.
@kegl can you help me understand the problem you're solving here?
Increasing coverage :)
You have my full support. But why is this increasing it?
The default is to save predictions with np.savetxt. If that doesn't work (e.g. mars craters), we catch the error and tell the users that they should implement a save_y_pred method in problem.py. This also allows them to follow the format they want; e.g., in kaggle_seguro we need to re-read X_test to get the ids written into the submission y_test_pred.
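A rough sketch of what such a kit-level save_y_pred could look like; the function name comes from this thread, but the signature, file names and column names below are illustrative assumptions, not the actual kaggle_seguro code:

```python
# Hypothetical sketch of a custom save_y_pred in problem.py. The signature,
# paths and column names are assumptions for illustration, not the actual
# ramp-workflow interface.
import os
import pandas as pd


def save_y_pred(y_pred, data_path='.', output_path='.', suffix='test'):
    # Re-read the test set to recover the ids that the submission file
    # must contain (the kaggle_seguro case mentioned above).
    X_test = pd.read_csv(os.path.join(data_path, 'data', 'test.csv'))
    submission = pd.DataFrame({'id': X_test['id'], 'target': y_pred.ravel()})
    submission.to_csv(
        os.path.join(output_path, 'y_{}_pred.csv'.format(suffix)), index=False)
```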
The higher-level reason for all of this is to continue porting functionality from ramp-board into ramp-workflow so the starting kits can be tested properly. E.g., BasePrediction.combine was not covered before, which caused ramp-board to crash even when the starting kit passed. But this is only a side effect; the goal is to give these functionalities to the user. The last such thing will be model combination, which we will be able to implement now that we can save the predictions.
One more thing that is getting easier now is writing a script that optimizes or reports on CV parameters (number of folds and train/test proportions).
@jorisvandenbossche @aboucaud I'll need help here. Both travis https://travis-ci.org/paris-saclay-cds/ramp-workflow/jobs/294574847 and local pytest crash when they don't find the file mars_craters/data/sample_submission.csv. It's weird that it is looking for it: it's a file that save_y_pred in kaggle_seguro/problem.py requires, but what does this have to do with mars craters?
It's as if on this line https://github.com/paris-saclay-cds/ramp-workflow/blob/cv_bagging/rampwf/utils/testing.py#L117, when it doesn't find save_y_pred in mars_craters/problem.py, instead of raising AttributeError it falls back to the previous call's kaggle_seguro/problem.py and crashes there (when it can't find data/sample_submission.csv). Weird.
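To make the fallback being discussed concrete, here is a hedged sketch of that dispatch; the real code is in rampwf/utils/testing.py, and the function name, argument names and exact structure below are assumptions:

```python
# Hedged sketch of the save/fallback logic being discussed; the real code in
# rampwf/utils/testing.py may be structured differently.
import numpy as np


def save_predictions(problem, y_pred, output_f_name):
    try:
        # Use the kit's own save_y_pred when problem.py defines one. If it
        # does not, the attribute lookup should raise AttributeError -- the
        # error that is unexpectedly not raised in the bug described above.
        problem.save_y_pred(y_pred, output_f_name)
    except AttributeError:
        try:
            # Default: dump the predictions as plain text.
            np.savetxt(output_f_name, y_pred)
        except Exception:
            # np.savetxt cannot handle some prediction types (e.g. mars
            # craters); in that case the user is told to implement
            # save_y_pred in problem.py.
            print('np.savetxt could not save the predictions; please '
                  'implement save_y_pred in problem.py')
```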
Indeed, if I remove kaggle_seguro from the top of the list here, mars craters (and the whole test) runs with no problem: https://github.com/paris-saclay-cds/ramp-workflow/blob/cv_bagging/rampwf/kits/ramp_kit.py#L11
From what I could understand, travis is downloading the mars craters data before the testing of kaggle_seguro is complete. Therefore the data_path variable set to '.' refers to the mars_craters directory and not the kaggle_seguro one, hence the trouble we're having.
We need to add some coordination.
It also happens locally when running pytest, but I guess it's the same problem. I could work around it by catching Exception instead of AttributeError, but that's somewhat unsatisfying. If what you're saying is true, we may have other sync problems. The good news is that this is strictly a testing issue, since nowhere else do we loop through different starting kits.
You have no async code. How can this happen?
OK, I found the problem; it was the same one that made imp-ed modules unpicklable: we forgot to name the modules.
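For reference, a minimal sketch (not the actual fix in this PR) of how naming each dynamically loaded problem.py avoids this kind of cross-kit leakage; the helper name and path layout below are assumptions:

```python
# Minimal sketch, not the actual ramp-workflow fix: give every kit's
# problem.py its own module name when loading it with imp.
import imp
import os


def load_problem(kit_path):
    kit_name = os.path.basename(os.path.normpath(kit_path))
    # A distinct name per kit keeps the loaded modules separate in
    # sys.modules; forgetting to name them (as described above) is what made
    # the imp-ed modules unpicklable and let attribute lookups resolve
    # against the wrong kit's problem.py.
    return imp.load_source('problem_' + kit_name,
                           os.path.join(kit_path, 'problem.py'))
```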
I think I'm finished. Please @aboucaud @glemaitre @jorisvandenbossche review the code. Also add comments where you think I should comment more, in case some of the additions are unclear. I wouldn't be against some modularization; assert_submission is pretty long now.
@kegl I like the idea of coloring the output of assert_test_submission to make the output easier to scan, but I think we need a solution that is also compatible with Windows (colored mentions it is only for linux/unix, and @glemaitre also quickly tested it on Windows and it does not seem to work).
There are terminal coloring libraries that can do that, I think, but I would personally leave that for a separate PR.
Small miscommunication with @glemaitre: it does not work on Windows, but it also does not output any garbage (it just ignores the color codes). So it might be fine to keep, although it would still be nice to have something that works cross-platform.
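One cross-platform option (an illustration only, not something used in this PR) would be colorama, which translates ANSI escape codes into Win32 calls on Windows; the printed strings below are dummy examples:

```python
# Illustration only, not part of this PR: colorama makes ANSI color codes
# work on Windows as well as on linux/unix terminals.
from colorama import Fore, Style, init

init()  # enables ANSI handling on Windows, no-op elsewhere
print(Fore.YELLOW + 'Testing submission starting_kit ...' + Style.RESET_ALL)
print(Fore.RED + Style.BRIGHT + 'official test score: 0.123' + Style.RESET_ALL)
```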
What's the meaning of the colors?
I agree that we need to cut up assert_submission; I'll work on this.
I made the warning disappear; it's a bit hacky, see the next commit.
Yes, I went overboard with colors. I like that the yellow stands out and shows me where the run started when I scroll back. Then green/blue/red is train/valid/test, and bright means the official score. My main purpose in the beginning was to make the official test scores stand out (bright red). Let's keep this open for now and see how it feels after getting used to it.
Not much code, but it will need careful review. Make it more Pythonic.