Closed rsignell-usgs closed 8 years ago
I think that is a good idea. The majority of the requirements are the same and people probably have just one environment to run all notebooks.
I don't think this is a good idea... we shouldn't make someone install 10 billion things to run a simple notebook.
@kwilcox I have the impression that the system-test notebooks have several common requirements. If that is not the case then you are right.
Maybe a more important question is whether the people who use these notebooks will run just one of them or several. If they run several, do they want to isolate each notebook in a new venv?
I am not against this. I actually do it myself. However, most people do not want to go through all that just to run a notebook.
In my experience people want one installation that just works. Having one system-test venv, installed from one requirements file, to run all the notebooks they can find here should be good enough for most users.
@dpsnowden, what say ye? Perhaps the best option would be both: keep requirements at the individual notebook level, then aggregate them to get the requirements necessary to run all the notebooks. Then we could see whether we have 14 packages we need to install, or 10 billion. ;-)
Agree with the latest @rsignell-usgs comment. It would be nice to know if there is a configuration that works most of the time.
Trying to sort these (or print what dir they're in, even) is beyond my shell ability, but here are all the packages used for pip.
~/dev/code/system-test -> paste **/pip-requirements.txt | column -t
pandas cython cython cython cython cython pandas
numpy numpy numpy numpy numpy numpy numpy
matplotlib scipy scipy scipy scipy scipy matplotlib
OWSLib pandas pandas pandas pandas pandas OWSLib
netCDF4 matplotlib matplotlib matplotlib matplotlib matplotlib netCDF4
csv OWSLib OWSLib OWSLib OWSLib OWSLib geojson
re netCDF4 netCDF4 netCDF4 netCDF4 lxml Shapely
cStringIO lxml lxml lxml lxml pyoos git+https://github.com/wrobstory/folium.git#egg=folium
urllib2 pyoos pyoos pyoos pyoos pyshp rdflib
parser pyshp pyshp pyshp pyshp Pillow
pdb pyke Pillow Pillow Pillow pyke
random Shapely pyke pyke pyke Shapely
datetime biggus Shapely Shapely Shapely biggus
pylab prettyplotlib biggus biggus biggus prettyplotlib
SPARQLWrapper prettyplotlib prettyplotlib prettyplotlib
zipfile
lxml
pykml
json
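For anyone else who gets stuck on the shell side, here is a minimal Python sketch of the same survey that also records which directory each requirement comes from. It assumes the files are named pip-requirements.txt, as in the command above, and that each file lists one requirement per line. The helper name collect_requirements is hypothetical and not part of the repository.

```python
from collections import defaultdict
from pathlib import Path


def collect_requirements(root, filename="pip-requirements.txt"):
    """Map each requirement line to the sorted list of directories that list it.

    Hypothetical helper: assumes one requirement per line and skips
    blank lines and '#' comment lines.
    """
    found = defaultdict(set)
    for path in Path(root).rglob(filename):
        directory = path.parent.name
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                found[line].add(directory)
    return {req: sorted(dirs) for req, dirs in found.items()}


if __name__ == "__main__":
    # Print the requirements, most widely used first.
    reqs = collect_requirements(".")
    ordered = sorted(reqs.items(), key=lambda kv: (-len(kv[1]), kv[0].lower()))
    for req, dirs in ordered:
        print(f"{len(dirs)}  {req}: {', '.join(dirs)}")
```

Run from the top of the checkout, this prints one line per requirement with the number of directories that use it, which makes both the union and the rarely used packages easy to spot.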
Modifying @daf's shell command to:
ls **/pip-requirements.txt | rev| cut -f2 -d"/" | rev | awk 'BEGIN { ORS = "\t" } { print }' && echo && paste **/pip-requirements.txt
Copying the output to the clipboard and then:
from pandas import Series, value_counts, read_clipboard
df = read_clipboard() # The columns names are the directories where they are from.
print('\n'.join(Series(df.values.ravel()).dropna().unique().tolist()))
# Unique requirements across all files:
pandas
cython
numpy
matplotlib
scipy
OWSLib
netCDF4
csv
geojson
re
lxml
Shapely
cStringIO
pyoos
git+https://github.com/wrobstory/folium.git#egg=folium
urllib2
pyshp
rdflib
parser
Pillow
pdb
pyke
random
datetime
biggus
pylab
prettyplotlib
SPARQLWrapper
zipfile
pykml
json
# Occurrence histogram (7 requirements.txt were found.)
df.apply(Series.value_counts).sum(axis=1).plot(kind='bar', legend=False)
Some comments:
pylab should be removed from requirements and imports!
re, random, csv, pdb, cStringIO, datetime, parser, urllib2, zipfile, and json are in the standard library (at least for Python 2.7) and do not need to be in the requirements.
Prefer BytesIO to cStringIO and urllib/requests to urllib2; that will make Python 3 compatibility easier for the notebooks in the future.
If we remove the standard library requirements, most of the packages are used in at least 4 out of 7 notebooks, with the exception of SPARQLWrapper, folium, geojson, pykml, and rdflib. (But I think that the folium count is wrong.)
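The cleanup described in these comments can be sketched in a few lines of Python. The set of standard-library names is copied from the comment above (Python 2.7 names) rather than generated from the interpreter, and the function name installable is an assumption made for illustration.

```python
# Standard-library modules named in the comment above (Python 2.7 names).
# This set is copied from the thread, not derived from the interpreter.
STDLIB_MODULES = {
    "re", "random", "csv", "pdb", "cStringIO",
    "datetime", "parser", "urllib2", "zipfile", "json",
}

# pylab is provided by matplotlib, so it should not be listed on its own.
NOT_PACKAGES = STDLIB_MODULES | {"pylab"}


def installable(requirements):
    """Return the entries pip actually has to install, de-duplicated, in order."""
    seen = set()
    kept = []
    for req in requirements:
        if req in NOT_PACKAGES or req in seen:
            continue
        seen.add(req)
        kept.append(req)
    return kept
```

For example, installable(["numpy", "re", "pylab", "lxml", "numpy"]) returns ["numpy", "lxml"].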
We are so early on in the creation of notebooks... I'd rather just sit on this until we have a better idea of the audience for the notebooks and what the final list of requirements looks like.
What problem are we really trying to solve by standardizing the environment? Are people still having trouble staying current? I haven't tried myself, so I'm not a good judge. We do have a new project participant, @duncombe-noaa, who will be trying to run these notebooks in order to get up to speed on the project. I hope that he can test our documentation and give us an outsider's perspective on whether we're doing OK or have a mess on our hands.
@ocefpaf, this is awesome. We should use this to start removing dependencies we don't want, and to identify conda modules we would like to build. I for one would like a simple conda install that would run all notebooks.
I for one would like a simple conda install that would run all notebooks.
That is my point. I think that most users will want that, even if it means installing all of PyPI (46170 packages BTW, not 10 billion :wink:).
From a developer's point of view I understand that having just one requirements file is frowned upon. However, I believe that there is a burden on the developers to maintain separate, up-to-date, and correct requirements.txt files.
Right now it seems that people are copying and pasting the requirements file from one another, rather than starting a new virtualenv/conda_env that actually has the proper requirements and generating the file from there. Doing this the "right" way is an even bigger burden on the developers, one that we probably do not want.
I understand that this decision should not be based only on the developer burden of creating the requirements. You are in a better position to judge how you want this to work, based on your users.
PS: That is not a big issue. Maybe a more important issue is that most of the files do not pin the version that works with the notebook, as noted in #107. Generating the requirements.txt from a virtualenv/conda_env would fix that automatically.
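As a rough illustration of generating a pinned file from the active environment, here is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+). It only approximates what pip freeze reports: it does not handle editable or VCS installs such as the folium git URL above, and the function name pinned_requirements is hypothetical.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to requirements.txt from inside the notebook's environment.
    print("\n".join(pinned_requirements()))
```

Running it inside the virtualenv/conda_env that is known to work with a notebook records the exact versions, so nobody has to maintain them by hand.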
Closing this. The ioos environment file should take care of this.
Should we have common requirements for all of the system tests so that any notebook could be run? Currently we have pip and conda requirements at the lowest directory level.
Sounds like @Bobfrat is in favor, and I think I'm in favor also.