spacetelescope / jwst_validation_notebooks

JWST Pipeline Validation Notebooks
BSD 3-Clause "New" or "Revised" License
5 stars 23 forks source link

Fix environment.yaml to only installed directly imported packages #147

Open jdavies-st opened 3 years ago

jdavies-st commented 3 years ago

Looking at

https://github.com/spacetelescope/jwst_validation_notebooks/blob/b7a9f344b3349882c815b55151a84deffeb239b3/environment.yml#L34-L71

I notice that there's a lot of specified packages installed that are not actually used in the notebooks as direct imports, and are actually dependencies of jwst or one of the simulators. It would be best to not have these in there. So spherical_geometry for example gets installed as a dependency of jwst, but there's no reason to have it specifically spelled out in the environment.yml if it is not being directly imported anywhere in any of the notebooks. It is only needed because jwst needs it. And jwst will define exactly which version(s) it needs and install it, or not install it if it is no longer needed. Same is true for many of these packages.

It would be good to go through for each package listed in environment.yml and grep the repo to see if it is actually used in an import statement in a notebook. If it is used in an import statement, then it should be listed here. If not, then remove it.

Further, jwst and pysiaf should both be installed via pip install jwst, and pip install pysiaf instead of from github, unless you're actually trying to get the latest dev version of pysiaf, which is what is happening now.

Also, jupyter is being installed both by conda and pip. Pick one that works.

Finally, if this repo is intended to test that these notebooks work with the jwst pipeline, it would be good not to pin the version of the pipeline here. Instead, test the latest release. And pip install jwst will get the latest. Otherwise this repo is not doing what it is supposed to be doing - automated testing of the latest pipeline releases.

york-stsci commented 3 years ago

A few points.

First, packages like spherical_geometry are included in the environment.yml file because, when they weren't included, the notebooks wouldn't run because they were imported but not installed. Have you confirmed that all of the notebooks run to completion in an environment created without those packages in the yml file?

Second, I agree with respect to installing packages from GitHub. The environment.yml file is inherited, and I didn't think to change those. I will do that soon.

Third, same as second, but for jupyter. I don't know why I was doing that. I'll fix it.

Fourth, the reason that the version of jwst is pinned in the environment.yml file is to make sure that, when we start testing a new release, we do so explicitly and as a group. We were previously having problems where people would clone the repository and create their environment (from the same environment.yml file) but have different versions of the pipeline installed, so it was very hard for other people to reproduce what we were seeing.

Also, the testing is not supposed to be automated, in that the Jenkins build is supposed to be run manually when required, and only after all of the notebooks have been tested to work locally. Running the tests is supposed to be automated, in the sense that you don't have to run each notebook yourself (or create each HTML file yourself), but it is not currently the plan to have the jenkins build run either based on a schedule or based on PRs being merged.

cracraft commented 3 years ago

I agree that we should determine which packages need to be installed for the notebooks to run, and make sure to install only those, whatever that might look like. I think the end goal is to have the notebooks run automatically, but we haven't had the discussion yet to determine what type of schedule to use. We should talk about that and decide if and when we want to set that up.