Closed kegl closed 3 years ago
A way of allowing imports within a submission would be to add the path to this particular submission.
In utils.testing.py
, part of assert_submission code, add
sys.path.append(submission_path)
at beggining and
sys.path.remove(submission_path)
after execution for not messing up too much between the different submissions.
if we had the __main__.py
as suggested by @jorisvandenbossche in issue #159 then running python -m rampwf
adds the current working directory in sys.path
and local import works (same difference with pytest and python -m pytest
vs pytest
). But it's still an open question whether we want people to be able to do that.
If we change in https://github.com/paris-saclay-cds/ramp-workflow/blob/master/rampwf/utils/cli/testing.py#L6 and L7 to
from rampwf.utils import assert_submission
from rampwf.utils import assert_notebook
then we can call the exact same script as ramp-test
by python <rampwf_root>/utils/cli/testing.py
.
Anybody knows why we need the relative imports in the cli scripts?
then we can call the exact same script as
ramp-test
bypython <rampwf_root>/utils/cli/testing.py
.
actually anyone can use python -m rampwf.utils.cli.testing
, this does not require to change the relative imports into absolute imports and this does not require to know the path of the rampwf package
that being said, after some discussions a better solution (when developing) might just be to create a local package located in the root kit dir and use pip install -e .
After some discussion, two solutions emerge. We can implement both, with slightly different purposes:
If we have a local folder outside the RAMP kit, I propose to add a CLI option
--import <local_folder>
We will add <local_folder>
to the sys.path. Since this folder is not pushed to github, ramp-test
without the option will and should fail. The use case here is to use someone's local tools during the development phase. Once the code is settled, submissions should stop using the local folder, before being submitted to the server or communicated to another team.
We can have a dedicated folder in the kit, for example utils
, which may contain code that can be imported by every submission (or even by problem.py
). Since this will be part of the kit, we can push it to github and so it will also work on the RAMP board server. The use case is to modularize code that is specific to the kit but which may be reused by several submissions. An example is the get_area_mean
function in the El Nino kit, which is used in every submission.
1. We can have a dedicated folder in the kit, for example `utils`, which may contain code that can be imported by every submission (or even by `problem.py`). Since this will be part of the kit, we can push it to github and so it will also work on the RAMP board server. The use case is to modularize code that is specific to the kit but which may be reused by several submissions. An example is the [`get_area_mean`](https://github.com/ramp-kits/el_nino/blob/master/submissions/starting_kit/ts_feature_extractor.py#L9) function in the El Nino kit, which is used in every submission.
I am not sure I understand how this will work during a challenge?
Also what happens when we play with a submission in a notebook or python script?
Also what happens when we play with a submission in a notebook or python script?
Could you please give an example? What potentially could happen ?
I am not sure I understand how this will work during a challenge?
Hm, now we execute the same script on the server as we do locally. So if utils
is present on the worker server, ramp-test
would run the same way as locally. What I'm suggesting here that in the ramp-test
script, utils
would be automatically added to the sys.path (no option --import
for this). Maybe @rth can comment if I'm wrong.
Yes this is clear but how does a user submit its utils
folder? You wrote "push to github" above but the users do not have write access to the kit repo and this could be complicated.
Could you please give an example? What potentially could happen ?
You are playing with your submission in a python script where you call
trained_workflow = problem.workflow.train_submission(
'submissions/starting_kit', X_train, y_train)
You need to manually add utils
to sys.path
or the local folder that you need.
To be precise, I propose to implement both 1 and 2 (utils and --import), they serve slightly different needs.
Yes this is clear but how does a user submit its
utils
folder? You wrote "push to github" above but the users do not have write access to the kit repo and this could be complicated.
utils
belong to the RAMP kit, not to individual submissions. It could be modified during a challenge, but it would serve all the submissions, and the modification would be through PRs in the github repo, which we would then pull at the server. This could also be the mechanism for adding libraries, through PRs in requirements.txt
in the ramp kit repo.
@agramfort @rth would it be interesting for your challenges? I remember that most of the time when users request new packages it's somewhat manual to add them and sometimes we forgot to add it both to the site server and the worker server.
To be precise, I propose to implement both 1 and 2 (utils and --import), they serve slightly different needs.
Ok I had only one usecase in mind.
The --import
option with a folder in the kit would be sufficient for me (development/non-challenge phase). I would even restrict its use to one known folder. So that when collaborating, other contributors do not have to guess the folder they need to import to run my submissions. This is then more like the utils
solution.
The utils
for the challenge, I have no strong opinion except than having both makes things more complicated: have you ever received a request from a challenge candidate saying that he would have liked this feature?
As an organizer it would have made the code cleaner in El Nino but also in Mars craters where we wanted to supply utilities to the participants. Some other time people wanted to simplify the problem.py by putting some of the kit-dependent RAMP elements (scores, workflows) in a "header" file.
To import files in submissions, I have found that the easiest way is to create symbolic links.
Let's suppose that we are in the same directory as problem.py.
It I want to import files, packages, .. from home directory for example, I would do ln -s ~/home/
My general opinion is that modifying python paths to satisfy our needs could make them (and ramp at the same time) more obscure for users so I would process carefully when adding the features, and one at a time.
Using symlinks is indeed a possible solution for 1. above, although in this case I would be strongly in favor of advising to create a package that you install locally (possibly installed in editable mode). This is not specific to ramp
: You can be working on a python project located somewhere on your computer, if you need modules that you implemented somewhere completely different you'll have the same problem. Besides, using this symlink solution, you still need to add the home directory in the python path when calling ramp
or call ramp
with python -m
. This might also not be the easiest solution for any user (not even sure if this is easy to do on Windows).
When working with ramp in a non-challenge setting I think using python -m
is a good solution (+ you share the needed modules when you share/push the ramp kit). Your collaborators just need to know that they have to use python -m
.
The utils/
solution (solution 2. above) could be a good start for the organizers of a challenge (and very limited for the users but I see this as a good point).
Also the issue with sylinks is that it might not work on Windows without admin rights unless one enables "Developer mode" https://docs.python.org/3/library/os.html#os.symlink
Also what would be the issue of having the home directory by default in sys.path (as is the case when you use python -m)? Is it that the users could access private files?
I'd also vote down symlinks. Let's go ahead with utils
as a first step. @avirmaux please make sure that in the upcoming challenge all code that is required by problem.py or the example submissions are in utils
(so no algo
in addition.)
A lot of times submitters would like to reuse some local libraries they have, importing them in the models (e.g., regressor.py). This is not possible at this point. I am not sure whether we want to have this feature because it would make the code non-portable, and of course non-submittable to ramp board. But it may be an optional feature during the development of the pipeline.