ioam / topographica

A general-purpose neural simulator focusing on topographic maps.
topographica.org
BSD 3-Clause "New" or "Revised" License

Record Python & imports for non-fat version; remove external #478

Closed sf-issues closed 9 years ago

sf-issues commented 12 years ago

Converted from SourceForge issue 3513656, submitted by ceball. Submit date: 2012-03-31 13:50 GMT

Currently, we include all of Topographica's dependencies (except some system libraries) in external/. When building from source, we therefore get a self-contained topographica directory for which a single version number tells you both the version of Topographica and that of Python, NumPy, Matplotlib, etc. Combined with the operating system info (captured by run_batch), this allows Topographica simulations specified by svn revision number to be duplicated.

However, most users install Topographica into an existing Python environment, so we don't need to spend effort maintaining the external/ directory (i.e. keeping all of the dependencies building on various platforms). Additionally, for such users, run_batch() does not capture the important information of which versions of Python, NumPy, etc. are being used.

We need to add code to record the versions of Python and imported packages (helpful even while we still have external/). Then, we should be able to remove external (or at least almost everything from it) and stop spending effort maintaining it.
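For illustration, the recording could be as simple as something along these lines (a minimal sketch; the helper name and the default package list are mine, not existing Topographica API):

import sys
import importlib

def record_versions(packages=('numpy', 'matplotlib', 'scipy')):
    # Map 'python' and each requested package to a version string
    # (None if the package is not importable).
    versions = {'python': sys.version}
    for name in packages:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, '__version__', 'unknown')
        except ImportError:
            versions[name] = None
    return versions

run_batch could then store this dictionary alongside the operating-system info it already captures.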

Note about capturing dependencies: for rpm- and deb-based platforms, it should be easy to generate a command allowing the Python environment to be re-created.
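For instance, on a deb-based system something like the following would collect installed Python packages at their exact versions, from which an 'apt-get install pkg=version' command could be assembled (a rough sketch; the 'python-' name filter is illustrative):

import subprocess

def debian_python_packages():
    # dpkg-query lists every installed package at its exact version;
    # keep only the Debian-packaged Python modules.
    out = subprocess.check_output(
        ['dpkg-query', '-W', '-f', '${Package}=${Version}\n'])
    return [line for line in out.decode().splitlines()
            if line.startswith('python-')]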

Also: consider what third-party packages are available for this kind of thing (e.g. Sumatra?).

And ideally, we would also capture versions of system libraries (e.g. GMP) as well as the versions of Python ones (e.g. gmpy, which uses GMP).

jlstevens commented 10 years ago

The /external folder now only contains our own submodules and a legacy Makefile. This Makefile fetches (old!) packages from our SourceForge account (as tarballs), and these packages haven't been updated in a long time.

Nowadays, we recommend that everyone install the dependencies via pip, and we no longer support the 'fat' version of Topographica. The only other remnant of the old external build system is that buildbot still runs a 'full build' which compiles everything. I've made a note (on the CSNG wiki page) that we should consider disabling this, so I can now close this issue.

jbednar commented 10 years ago

I already made the buildbot stop building the full version a few months ago, so that's not an issue. But I'm reopening this because what it was about was the _non_-full version, and I don't think we yet have a good way to record all the relevant versions of external libraries.

jlstevens commented 10 years ago

Ah right - I didn't read it properly.

I still think this issue should be closed, as the key dependencies can be tracked using topo_metadata in the Lancet extension (all submodule versions, the NumPy version, and the Python version). As long as you call this function in an IPython notebook, the state of those dependencies will be tracked.

To be honest, I don't think recording the versions of all the external libraries is necessary (or sensible). As long as we keep track of the versions of our key dependencies (i.e. our own submodules, NumPy, and Python itself), I think we are fine.

jbednar commented 10 years ago

We definitely can't record all external dependency versions, but it would be nice to record the ones that we know might matter (gmp, weave/scipy).
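For example (a sketch, assuming gmpy 1.x, which exposes version() and gmp_version()):

import gmpy

# Record both the binding's version and that of the system library
# underneath it; scipy.__version__ would cover weave/scipy similarly.
print(gmpy.version())       # version of the gmpy binding itself
print(gmpy.gmp_version())   # version of the underlying GMP C library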

jlstevens commented 9 years ago

The correct way to do this is to supply a requirements.txt. I would recommend sticking this in topo.platform for those who want to use it.

jbednar commented 9 years ago

How would requirements.txt help in this case? The problem here is recording the library versions that were actually used in a run, not ones that we know beforehand are good (which is what requirements.txt normally specifies). The goal is to be able to reproduce results exactly later, when we find that we can't just use the current versions of libraries for some reason -- we want to know which specific ones were actually used.

Or are you proposing that we generate a requirements.txt file with every call to run_batch, which would make it simple to recreate the particular environment used in that run? That would be an interesting approach (though it doesn't help for non-Python libraries like GMP), but someone would have to write the code to generate that file. It wouldn't be some static text file in topo.platform as your comment seems to be suggesting...
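Concretely, the generation code might be no more than this (sketch only; the helper name and its hookup into run_batch are hypothetical):

import os
import subprocess

def snapshot_requirements(output_dir):
    # Record the packages actually installed when this run started,
    # so the same Python environment can be recreated later.
    frozen = subprocess.check_output(['pip', 'freeze'])
    with open(os.path.join(output_dir, 'requirements.txt'), 'wb') as f:
        f.write(frozen)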

jlstevens commented 9 years ago

Ah right -- that's the second time I've misunderstood! We still need a requirements.txt (maybe as a separate issue), but for this particular problem I use Lancet to record the versions of all our major dependencies when running simulations.

jbednar commented 9 years ago

Ok. It might be cool to have a function in our Lancet support somewhere that would generate a requirements.txt file from a given set of run metadata, but that's a nicety.

In any case, if you can describe here what Lancet does for this, this issue can be closed.

jlstevens commented 9 years ago

You can collect version metadata as follows:

from topo.misc.lancext import topo_metadata
metadata = topo_metadata()

Now you can call the summary method for a nice, readable summary. For instance:

topo_metadata().summary()

This will print something like:

Topographica version control summary:

   Topographica: 504fca1 Merge pull request #597 from ceball/skiptest_from_unittest
                         [11 files have uncommitted changes as captured by git diff]
   Param:        48ebddc Minor simplification
   Imagen:       2a24c15 Silenced Pyflakes warning regarding unused import
   Lancet:       64a047e Improved how the _savepath method of FileType selects a file extension
                         [1 files have uncommitted changes as captured by git diff]
   Numpy:        07601a6 Version 1.9.0 (release)

Finally, if you simply call the metadata instance, you'll get a dictionary containing all this information (and much more) that can be recorded with your simulation, e.g. by supplying it to a lancet.Launcher via the metadata argument.
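In other words (a minimal sketch; printing the keys is just for illustration):

from topo.misc.lancext import topo_metadata

# Calling the instance (rather than .summary()) returns a dictionary of
# all the recorded version information, which can then be passed to a
# lancet.Launcher via its metadata argument.
info = topo_metadata()()
for key in sorted(info):
    print('%s: %s' % (key, info[key]))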

Automatically generating a requirements file could be somewhat helpful but it won't have the same commit-level granularity.

Hopefully someone will find this information useful!