paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License
68 stars 42 forks source link

[RFC] importing files in submissions #181

Closed kegl closed 3 years ago

kegl commented 5 years ago

A lot of times submitters would like to reuse some local libraries they have, importing them in the models (e.g., regressor.py). This is not possible at this point. I am not sure whether we want to have this feature because it would make the code non-portable, and of course non-submittable to ramp board. But it may be an optional feature during the development of the pipeline.

avirmaux commented 5 years ago

A way of allowing imports within a submission would be to add the path to this particular submission. In utils.testing.py, part of assert_submission code, add sys.path.append(submission_path) at beggining and sys.path.remove(submission_path) after execution for not messing up too much between the different submissions.

albertcthomas commented 4 years ago

if we had the __main__.py as suggested by @jorisvandenbossche in issue #159 then running python -m rampwf adds the current working directory in sys.path and local import works (same difference with pytest and python -m pytest vs pytest). But it's still an open question whether we want people to be able to do that.

kegl commented 4 years ago

If we change in https://github.com/paris-saclay-cds/ramp-workflow/blob/master/rampwf/utils/cli/testing.py#L6 and L7 to

from rampwf.utils import assert_submission   
from rampwf.utils import assert_notebook     

then we can call the exact same script as ramp-test by python <rampwf_root>/utils/cli/testing.py.

Anybody knows why we need the relative imports in the cli scripts?

albertcthomas commented 4 years ago

then we can call the exact same script as ramp-test by python <rampwf_root>/utils/cli/testing.py.

actually anyone can use python -m rampwf.utils.cli.testing, this does not require to change the relative imports into absolute imports and this does not require to know the path of the rampwf package

albertcthomas commented 4 years ago

that being said, after some discussions a better solution (when developing) might just be to create a local package located in the root kit dir and use pip install -e .

kegl commented 3 years ago

After some discussion, two solutions emerge. We can implement both, with slightly different purposes:

  1. If we have a local folder outside the RAMP kit, I propose to add a CLI option

    --import <local_folder>

    We will add <local_folder> to the sys.path. Since this folder is not pushed to github, ramp-test without the option will and should fail. The use case here is to use someone's local tools during the development phase. Once the code is settled, submissions should stop using the local folder, before being submitted to the server or communicated to another team.

  2. We can have a dedicated folder in the kit, for example utils, which may contain code that can be imported by every submission (or even by problem.py). Since this will be part of the kit, we can push it to github and so it will also work on the RAMP board server. The use case is to modularize code that is specific to the kit but which may be reused by several submissions. An example is the get_area_mean function in the El Nino kit, which is used in every submission.

albertcthomas commented 3 years ago
1. We can have a dedicated folder in the kit, for example `utils`, which may contain code that can be imported by every submission (or even by `problem.py`). Since this will be part of the kit, we can push it to github and so it will also work on the RAMP board server. The use case is to modularize code that is specific to the kit but which may be reused by several submissions. An example is the [`get_area_mean`](https://github.com/ramp-kits/el_nino/blob/master/submissions/starting_kit/ts_feature_extractor.py#L9) function in the El Nino kit, which is used in every submission.

I am not sure I understand how this will work during a challenge?

albertcthomas commented 3 years ago

Also what happens when we play with a submission in a notebook or python script?

martin1tab commented 3 years ago

Also what happens when we play with a submission in a notebook or python script?

Could you please give an example? What potentially could happen ?

kegl commented 3 years ago

I am not sure I understand how this will work during a challenge?

Hm, now we execute the same script on the server as we do locally. So if utils is present on the worker server, ramp-test would run the same way as locally. What I'm suggesting here that in the ramp-test script, utils would be automatically added to the sys.path (no option --import for this). Maybe @rth can comment if I'm wrong.

albertcthomas commented 3 years ago

Yes this is clear but how does a user submit its utils folder? You wrote "push to github" above but the users do not have write access to the kit repo and this could be complicated.

albertcthomas commented 3 years ago

Could you please give an example? What potentially could happen ?

You are playing with your submission in a python script where you call

trained_workflow = problem.workflow.train_submission(
    'submissions/starting_kit', X_train, y_train)

You need to manually add utils to sys.path or the local folder that you need.

kegl commented 3 years ago

To be precise, I propose to implement both 1 and 2 (utils and --import), they serve slightly different needs.

kegl commented 3 years ago

Yes this is clear but how does a user submit its utils folder? You wrote "push to github" above but the users do not have write access to the kit repo and this could be complicated.

utils belong to the RAMP kit, not to individual submissions. It could be modified during a challenge, but it would serve all the submissions, and the modification would be through PRs in the github repo, which we would then pull at the server. This could also be the mechanism for adding libraries, through PRs in requirements.txt in the ramp kit repo.

@agramfort @rth would it be interesting for your challenges? I remember that most of the time when users request new packages it's somewhat manual to add them and sometimes we forgot to add it both to the site server and the worker server.

albertcthomas commented 3 years ago

To be precise, I propose to implement both 1 and 2 (utils and --import), they serve slightly different needs.

Ok I had only one usecase in mind.

The --import option with a folder in the kit would be sufficient for me (development/non-challenge phase). I would even restrict its use to one known folder. So that when collaborating, other contributors do not have to guess the folder they need to import to run my submissions. This is then more like the utils solution.

The utils for the challenge, I have no strong opinion except than having both makes things more complicated: have you ever received a request from a challenge candidate saying that he would have liked this feature?

kegl commented 3 years ago

As an organizer it would have made the code cleaner in El Nino but also in Mars craters where we wanted to supply utilities to the participants. Some other time people wanted to simplify the problem.py by putting some of the kit-dependent RAMP elements (scores, workflows) in a "header" file.

martin1tab commented 3 years ago

To import files in submissions, I have found that the easiest way is to create symbolic links. Let's suppose that we are in the same directory as problem.py. It I want to import files, packages, .. from home directory for example, I would do ln -s ~/home/ . Then a reference to this file is created and I could use it as it was in my directory (problem.py's directory)

albertcthomas commented 3 years ago

My general opinion is that modifying python paths to satisfy our needs could make them (and ramp at the same time) more obscure for users so I would process carefully when adding the features, and one at a time.

Using symlinks is indeed a possible solution for 1. above, although in this case I would be strongly in favor of advising to create a package that you install locally (possibly installed in editable mode). This is not specific to ramp: You can be working on a python project located somewhere on your computer, if you need modules that you implemented somewhere completely different you'll have the same problem. Besides, using this symlink solution, you still need to add the home directory in the python path when calling ramp or call ramp with python -m. This might also not be the easiest solution for any user (not even sure if this is easy to do on Windows).

When working with ramp in a non-challenge setting I think using python -m is a good solution (+ you share the needed modules when you share/push the ramp kit). Your collaborators just need to know that they have to use python -m.

The utils/ solution (solution 2. above) could be a good start for the organizers of a challenge (and very limited for the users but I see this as a good point).

rth commented 3 years ago

Also the issue with sylinks is that it might not work on Windows without admin rights unless one enables "Developer mode" https://docs.python.org/3/library/os.html#os.symlink

albertcthomas commented 3 years ago

Also what would be the issue of having the home directory by default in sys.path (as is the case when you use python -m)? Is it that the users could access private files?

kegl commented 3 years ago

I'd also vote down symlinks. Let's go ahead with utils as a first step. @avirmaux please make sure that in the upcoming challenge all code that is required by problem.py or the example submissions are in utils (so no algo in addition.)