ioos / system-test

IOOS DMAC System Integration Test project
github.com/ioos/system-test/wiki
The Unlicense

Should we have common conda/pip requirements for the entire system test? #109

Closed rsignell-usgs closed 8 years ago

rsignell-usgs commented 10 years ago

Should we have common requirements for all of the system test so that any notebook could be run? Currently we have pip and conda requirements at the lowest directory level.

Sounds like @Bobfrat is in favor, and I think I'm in favor also.

ocefpaf commented 10 years ago

I think that is a good idea. The majority of the requirements are the same and people probably have just one environment to run all notebooks.

kwilcox commented 10 years ago

I don't think this is a good idea... we shouldn't make someone install 10 billion things to run a simple notebook.

ocefpaf commented 10 years ago

@kwilcox I have the impression that the system-test notebooks have several common requirements. If that is not the case then you are right.

Maybe a more important question is whether the people using these notebooks will run just one of them or several. If they will run several, do they want to isolate each notebook in a new venv?

I am not against this. I actually do it myself. However, most people do not want to go through all that just to run a notebook.

In my experience, people want one installation that just works. Having a single system-test venv, installed from one requirements file, to run all the notebooks they can find here should be good enough for most users.

rsignell-usgs commented 10 years ago

@dpsnowden, what say ye? Perhaps the best approach is both: keep requirements at the individual notebook level, then aggregate them to get the set necessary to run all the notebooks. Then we could see whether we have 14 packages to install, or 10 billion. ;-)
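The aggregation could be sketched in a few lines of Python; the directory names and file contents below are purely illustrative, not taken from the repo:

```python
from collections import Counter

def aggregate_requirements(files):
    """Count how many requirements files mention each package.

    `files` maps a notebook directory name to the text of its
    pip-requirements.txt; returns a Counter of package -> file count.
    """
    counts = Counter()
    for text in files.values():
        # De-duplicate within a single file before counting.
        pkgs = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(pkgs)
    return counts

# Made-up example contents; the real files live under each notebook directory.
files = {
    "notebook_a": "numpy\npandas\nmatplotlib\n",
    "notebook_b": "numpy\nscipy\n",
}
counts = aggregate_requirements(files)
print(counts["numpy"])  # appears in both files -> 2
print(len(counts))      # 4 distinct packages overall
```

Packages with a high count would go into the common file; the rest could stay notebook-local.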

dpsnowden commented 10 years ago

Agree with the latest @rsignell-usgs comment. It would be nice to know if there is a configuration that works most of the time.

daf commented 10 years ago

Sorting these (or even printing which directory each comes from) is beyond my shell ability, but here are all the packages required via pip.

```
~/dev/code/system-test -> paste **/pip-requirements.txt | column -t
pandas         cython         cython         cython         cython      cython         pandas
numpy          numpy          numpy          numpy          numpy       numpy          numpy
matplotlib     scipy          scipy          scipy          scipy       scipy          matplotlib
OWSLib         pandas         pandas         pandas         pandas      pandas         OWSLib
netCDF4        matplotlib     matplotlib     matplotlib     matplotlib  matplotlib     netCDF4
csv            OWSLib         OWSLib         OWSLib         OWSLib      OWSLib         geojson
re             netCDF4        netCDF4        netCDF4        netCDF4     lxml           Shapely
cStringIO      lxml           lxml           lxml           lxml        pyoos          git+https://github.com/wrobstory/folium.git#egg=folium
urllib2        pyoos          pyoos          pyoos          pyoos       pyshp          rdflib
parser         pyshp          pyshp          pyshp          pyshp       Pillow
pdb            pyke           Pillow         Pillow         Pillow      pyke
random         Shapely        pyke           pyke           pyke        Shapely
datetime       biggus         Shapely        Shapely        Shapely     biggus
pylab          prettyplotlib  biggus         biggus         biggus      prettyplotlib
SPARQLWrapper  prettyplotlib  prettyplotlib  prettyplotlib
zipfile
lxml
pykml
json
```
ocefpaf commented 10 years ago

Modifying @daf's shell command to:

```shell
ls **/pip-requirements.txt | rev | cut -f2 -d"/" | rev | awk 'BEGIN { ORS = "\t" } { print }' && echo && paste **/pip-requirements.txt
```

Copying the output to the clipboard and then:


```python
from pandas import Series, value_counts, read_clipboard

df = read_clipboard()  # The column names are the directories they came from.

# Common requirements:
print('\n'.join(Series(df.values.ravel()).dropna().unique().tolist()))
```

```
pandas
cython
numpy
matplotlib
scipy
OWSLib
netCDF4
csv
geojson
re
lxml
Shapely
cStringIO
pyoos
git+https://github.com/wrobstory/folium.git#egg=folium
urllib2
pyshp
rdflib
parser
Pillow
pdb
pyke
random
datetime
biggus
pylab
prettyplotlib
SPARQLWrapper
zipfile
pykml
json
```

```python
# Occurrence histogram (7 requirements files were found.)
df.apply(value_counts).sum(axis=1).plot(kind='bar', legend=False)
```

(bar plot: occurrence count per package)

ocefpaf commented 10 years ago

Some comments:

pylab should be removed from requirements and imports!

re, random, csv, pdb, cStringIO, datetime, parser, urllib2, zipfile and json are in the standard library (at least for python 2.7) and do not need to be in the requirements.

Prefer BytesIO to cStringIO, and urllib/requests to urllib2; that will make porting the notebooks to Python 3 easier in the future.
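For example, the io module offers the same interface on both Python 2.7 and 3, making it a drop-in replacement for cStringIO (the CSV content below is made up for illustration):

```python
import io

# io.BytesIO works identically on Python 2.7 and 3,
# unlike cStringIO, which was removed in Python 3.
buf = io.BytesIO(b"station_id,value\n41001,3.2\n")
first_line = buf.readline()
print(first_line)  # b'station_id,value\n'

# Similarly for HTTP: urllib2 became urllib.request in Python 3;
# the third-party `requests` package hides the difference entirely:
#   import requests
#   r = requests.get("https://example.com/data.csv")
#   buf = io.BytesIO(r.content)
```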

If we remove the standard-library requirements, most of the packages are used in at least 4 of the 7 notebooks, the exceptions being SPARQLWrapper, folium, geojson, pykml, and rdflib. (But I think the folium count is wrong.)

kwilcox commented 10 years ago

We are so early on in the creation of notebooks... I'd rather just sit on this until we have a better idea of the audience for the notebooks and what the final list of requirements looks like.

dpsnowden commented 10 years ago
  1. Amazing shell wizardry
  2. I sort of agree with @kwilcox. I hope we keep our installation README.md up to date in each contribution.
  3. The comments about using the standard library and preferring packages with Python 3 compatibility seem like sound advice that we should strive to follow going forward.

What problem are we really trying to solve by standardizing the environment? Are people still having trouble staying current? I know that I haven't tried, so I'm not a good judge. We do have a new project participant @duncombe-noaa who will be trying to run these notebooks in order to get spun up on the project. I hope that he can test our documentation and give us an outsider's perspective on whether we're doing ok or if we have a mess on our hands.

rsignell-usgs commented 10 years ago

@ocefpaf, this is awesome. We should use this to start removing dependencies we don't want, and to identify conda packages we would like to build. I for one would like a simple conda install that would run all the notebooks.

ocefpaf commented 10 years ago

> I for one would like a simple conda install that would run all notebooks.

That is my point. I think that most users will want that, even if it means installing all of PyPI (46170 packages, BTW, not 10 billion :wink:).

From a developer's point of view, I understand that having just one requirements file is frowned upon. However, I believe there is a real burden on the developers to maintain separate, up-to-date, and correct requirements.txt files.

Right now it seems that people are copying and pasting the requirements file from one another, rather than starting a new virtualenv/conda env that actually has the proper requirements and generating the file from there. Doing this the "right" way is an even bigger burden on the developers, one we probably do not want.

I understand that this decision should not be based only on the developer burden of creating the requirements. You guys are better placed to judge, based on your users, how you want this to be.

PS: This is not a big issue. Maybe a more important one is that most of the files do not pin the versions that work with the notebook, as noted in #107. Generating the requirements.txt from a virtualenv/conda env would fix that automatically.
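Generating a pinned file from a working environment is a one-liner; the file names here follow the repo's convention but are otherwise an assumption:

```shell
# Run inside the virtualenv/conda env that actually runs the notebook;
# pip freeze records the exact installed versions, e.g. numpy==1.8.2.
pip freeze > pip-requirements.txt

# The conda equivalent, capturing the whole environment, would be:
#   conda env export > environment.yml
```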

ocefpaf commented 8 years ago

Closing this. The ioos environment file should take care of this.