paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License

Backward compatibility policy #241

Open albertcthomas opened 4 years ago

albertcthomas commented 4 years ago

Regarding #236 but also more generally:

  1. What's the policy regarding backward compatibility with the ramp-kits? Should any change be compatible with the kits in ramp-kits (or should any change made in rampwf also update the ramp-kits so that they keep working with the suggested change)?

Now that ramp-workflow is on PyPI, would it be possible to require the kits in ramp-kits to pin a specific version of ramp-workflow and of the other dependencies (see the sketch after this list)? The kits that are not in the ramp-workflow repo are difficult to maintain, while the ones in tests/kits/ are easy to maintain as part of the tests.

  2. What's the difference between the kits in tests/kits/ and the ones in ramp-kits but not in tests/kits/?
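
Coming back to point 1, a sketch of what pinning could look like in a kit's requirements.txt (all package versions purely illustrative):

```
# requirements.txt of a hypothetical kit -- every pin illustrative
ramp-workflow==0.2.1
scikit-learn==0.23.2
pandas==1.1.3
```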
agramfort commented 4 years ago

The ramp server has only one version of the ramp-workflow package. Having the workflow out of sync with ramp-board is already a pain. Pinning versions seems hard when you have one server, unless you force all kits to pass with one unique version; otherwise you lose the unifying ramp-kit structure, as every kit can be unique.

If you ask me, I would integrate ramp-workflow into ramp-board and simplify, simplify, simplify, going more towards the scikit-learn API for as many things as possible (especially the scorers).

kegl commented 4 years ago

@agramfort : what would you like to have for the scorers? If the goal is to be able to use sklearn metrics directly, it would be relatively easy to have a generic score_type factory that receives an sklearn scorer as input (when initialized in problem.py), and wraps it into a ramp scorer. You would get the best of both worlds.

I don't think we could completely scrap the functionality we added (e.g. precision for display, supporting lower-the-better scorers), plus it's nice to keep the possibility of writing a custom score_function that receives Prediction objects when we have complex predictions and scorers.
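
A rough sketch of what such a factory could look like; the name make_score_type and the attribute set are hypothetical, loosely mirroring rampwf's score types:

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def make_score_type(score_func, name, is_lower_the_better=False,
                    precision=2):
    """Hypothetical factory: wrap a plain sklearn metric into an object
    carrying RAMP-style metadata (attribute names are illustrative)."""

    class SklearnScoreType:
        def __init__(self):
            self.name = name
            self.is_lower_the_better = is_lower_the_better
            self.precision = precision  # digits shown on the leaderboard

        def __call__(self, y_true, y_pred):
            # In real RAMP code, y_pred would first be extracted from a
            # Predictions object; here we forward directly to the metric.
            return score_func(y_true, y_pred)

    return SklearnScoreType()


# e.g. in problem.py, an RMSE score type built from sklearn's MSE:
rmse = make_score_type(
    lambda yt, yp: np.sqrt(mean_squared_error(yt, yp)),
    name='rmse', is_lower_the_better=True, precision=3)
```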

agramfort commented 4 years ago

indeed it would work. My concern was having to learn a new API for scoring, as sklearn can be seen as a standard; relying on it would be a simple way to guarantee that workflow and board can evolve in different code trees without a risk of incompatibility.

my 0.5c

kegl commented 4 years ago

@agramfort is there an automatic way to determine in sklearn what input a given scorer requires? E.g. raw y_pred like RMSE, or class indices like accuracy (derived from y_proba, i.e. what predict returns). If not, we'll need two or three different wrappers that the user would have to choose from. Any other suggestion on how to deal with this?

agramfort commented 4 years ago

have a look at https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html

you have the greater_is_better and needs_proba parameters
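
For example (note that recent scikit-learn releases replace needs_proba with a response_method parameter, but needs_proba was current at the time of this thread):

```python
from sklearn.metrics import log_loss, make_scorer

# log loss is lower-is-better and needs predicted probabilities; both
# facts must be declared by the caller, make_scorer cannot infer them
# from the metric function itself.
neg_log_loss = make_scorer(log_loss, greater_is_better=False,
                           needs_proba=True)
```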

kegl commented 4 years ago

OK, I see. This is something that we would do in RAMP too, to wrap sklearn scorers into a RAMP scorer. But it seems that the "user" needs to provide the information about the sklearn score (e.g. what input it requires); it cannot be determined automatically, right? I mean: there is no catalogue (dict) in sklearn from which the greater_is_better and needs_proba parameters can be read out, right?
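
For illustration, a quick check of what a ready-made scorer exposes (the _sign attribute below is a private implementation detail of sklearn's scorer objects, not a public catalogue):

```python
from sklearn.metrics import get_scorer

scorer = get_scorer('neg_log_loss')
# The sign and the required prediction method are baked into the
# returned object rather than exposed as public metadata.
print(scorer._sign)  # -1; private attribute, not a stable API
```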

agramfort commented 4 years ago

it's the responsibility of the scorer to call the right predict function in sklearn. A scorer takes estimator, X, y and does the right thing internally
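
A minimal example of that contract: a scorer is a callable with signature scorer(estimator, X, y) that chooses the prediction method internally:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# Both calls share the signature scorer(estimator, X, y): 'accuracy'
# calls clf.predict internally, 'neg_log_loss' calls clf.predict_proba.
print(get_scorer('accuracy')(clf, X, y))
print(get_scorer('neg_log_loss')(clf, X, y))
```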