russelljjarvis opened this issue 2 years ago
@russelljjarvis OK, so the main idea here is that you would like the SciUnit model being judged, and all the methods being called on it, to be a population of smaller, possibly heterogeneous models. In the concrete example of neuron simulations, the population is a population of neurons.
As you may know, you can already set `Test.prediction` directly, and then when calling `Test.judge(model, cached_prediction=True)` it will use that stored prediction rather than compute a new one. This could be made even simpler by making `Model.get_prediction` look up from (for example) `Model._prediction` directly, with getters/setters as you suggest. This is much more general than just having population models, as there are arbitrary reasons why one might want this functionality; also see related examples of caching here and through the end of that file.
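For concreteness, here is a minimal sketch of that cached-prediction pattern. The test class, observation, and prediction values are hypothetical; `ZScore.compute` is used only as an example scalar comparison:

```python
import sciunit
from sciunit.scores import ZScore

class RestingPotentialTest(sciunit.Test):
    """Hypothetical test comparing a resting potential to an observation."""
    score_type = ZScore

    def generate_prediction(self, model):
        # Normally this would run the model; with a cached prediction
        # it should be skipped entirely.
        raise NotImplementedError("expected the cached prediction to be used")

    def compute_score(self, observation, prediction):
        return ZScore.compute(observation, prediction)

test = RestingPotentialTest(observation={"mean": -65.0, "std": 2.0})

# Prediction produced outside of sciunit (e.g. by an external feature
# extraction tool), attached directly to the test:
test.prediction = {"mean": -63.5}

# judge() then reuses the stored prediction instead of computing one:
# score = test.judge(model, cached_prediction=True)
```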
Feature extraction could also be handled outside of SciUnit, and the `prediction` set by the external tool would include the extracted features rather than something raw like a simulated trace.
What I might be misunderstanding is whether there is any specific role for the notion of populations in SciUnit or not. Since in this scenario the only thing left for SciUnit to do is to compare predictions and observations, the corresponding `Test.compute_score` would just operate on arrays of predictions and observations rather than single observations and predictions, and return an array of scores (or a `ScoreArray` indexed by the individual models in the population). So this might require a `Test.compute_scores` which explicitly takes prediction and observation arrays and returns score arrays, and internally runs `map` or `dask.map` to vectorize that process.
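As a rough sketch, such a hypothetical `Test.compute_scores` method might look like the following, using the builtin `map` (a dask-based map could be substituted to parallelize):

```python
def compute_scores(self, observations, predictions):
    """Hypothetical vectorized counterpart to compute_score.

    Takes equal-length sequences of observations and predictions
    (one pair per model in the population) and returns one score
    per pair, by mapping the existing scalar compute_score.
    """
    # map() could be swapped for a dask/distributed map for parallelism;
    # the results could also be wrapped in a ScoreArray indexed by the
    # individual models in the population.
    return list(map(self.compute_score, observations, predictions))
```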
Would this do the trick?
"scenario the only thing left for SciUnit to do is to compare predictions and observations, then the corresponding Test.compute_score would just operate on arrays of predictions and observations rather than single observations and predictions, and return an array of scores (or a ScoreArray indexed by individual models in the population). So this might require a Test.compute_scores which explicitly takes prediction and observation arrays and returns score arrays, and internally runs map or dask.map to vectorize that process."
Yes! I totally agree with this. In the end I think I bypassed models completely and just did this (with observations and predictions settable, you can bypass models). Using `StaticModel`s + dask seemed cumbersome, and it meant more lines of code than necessary.
The only problem is that the sciunit philosophy has been very model-oriented up until now, in that I am not aware of any documentation where scores are obtained in the absence of models.
Below is how I used to do things, but I think it's probably not memory-efficient to create a heap of original SciUnit model objects and SciUnit score objects.
A lookup cache might work. As you suggest, generating a list of predictions/features outside sciunit (set via `._prediction` or `.set_prediction`) is often the more pragmatic choice. It's step 3 where the problem is.
The choices are:
(a) Make a low-cost list of `StaticModel`s; these are glorified containers for features (perhaps they could even be called "FeatureContainer"s?). Optionally, model attributes/parameters could be stored in the static models too. These are not runnable models; they just exist to satisfy NeuronUnit's appetite for running judge on models. (A rough sketch of this option follows the list below.)
(b) Put a lot of effort into engineering SciUnit runnable population models, where everything about the model that was scored is in some sense re-runnable and stored in the models.
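A rough sketch of option (a), assuming a minimal container model (the `FeatureContainer` name and the `extracted_features` variable are hypothetical; NeuronUnit's own `StaticModel` could play the same role):

```python
import sciunit

class FeatureContainer(sciunit.Model):
    """A glorified, non-runnable container for pre-computed features.

    It exists only so that judge() has a model object to act on;
    the features themselves come from outside sciunit.
    """
    def __init__(self, name, features):
        super().__init__(name=name)
        self.features = features  # e.g. {"mean": -63.5, ...}

# One cheap container per population member, built from features
# extracted externally (e.g. NEST/PyNN simulation + EFEL analysis):
population = [
    FeatureContainer(name="cell_%d" % i, features=f)
    for i, f in enumerate(extracted_features)  # extracted_features: external list
]
```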
I have used the 3a approach successfully in the past; it's flexible. The main problems are related to the half-merged PR (see https://github.com/scidash/sciunit/pull/173).
Although sciunit has great existing methods for scoring collections of models, when network simulators are applied to the task of single-neuron model optimization, three notable simulators (PyNN, NEST, SNN.jl [and Brian2]) make it more convenient to model a population of cells (N>1) than a single cell (N=1). Indeed, the noted simulators appear to be byte-code optimized for evaluating a cell population.
Note that there are pre-existing alternative implementations that solve this problem; for example, the Numba-JIT adaptive exponential and Izhikevich models mean that you can avoid using network-level simulators to run single-cell models.
In the context of genetic algorithm optimization, a NEST neuron population can conveniently map straight onto a GA chromosome population, so in theory NeuronUnit+GA should be well suited to exploit the efficiency of network simulators to evaluate whole populations.
Unfortunately, in practice the NEST and PyNN simulation platforms are a bad match for the current design pattern of SciUnit/NeuronUnit/BluePyOpt. To clarify, collecting models by putting them inside a sciunit model collection container is essentially a separate and different task from collecting models in a NEST/PyNN/SNN.jl population.
To confound matters, very many so-called Python neuron simulators are not Python-native. When trying to store collections of external-code simulator models inside sciunit containers, memory problems ensue, byte-code efficiency is poor, and parallel distributed methods are very unlikely to work.
The problem is that PyNN and NEST have their own preferred, safe ways to parallelize population model evaluations, but when you contain many single-cell models in a sciunit collection and distribute that code using dask, this is not what NEST expects. Relative to the external simulator, the sciunit container class seems foreign and unsuited, and distributing the sciunit container using `dask` in Python is not safe because of nested parallelization. PyNN and NEST don't really support modelling a single cell so much as creating cell populations of size N=1.

I propose a second design of sciunit model collections called SimPopulationModel. In this second design, a simulator population is simply contained by a new sciunit class, which inherits sciunit attributes (`RunnableModel`, etc.).
In the SimPopulationModel sciunit class, model predictions and observations are just regular getter/setter-based attributes that have been decoupled from the sciunit generate_prediction/extract_feature methods. This means simulator users are completely free to generate predictions/features independently of the sciunit infrastructure. This is again necessary to exploit NEST's efficiency at acting on populations, which means that features are obtained in parallel in a highly NEST/PyNN/SNN.jl-specific way.
By the time a SimPopulationModel is ready to be judged, all the computationally heavy work of feature extraction has been completed, and sciunit is free to do what it does best, as predictions (features) and observations have now been provided by other infrastructure.
This method expects users to have pre-existing code that performs something like judge/generate_prediction. The prediction is then just treated as a settable object attribute, which can be updated. Doing things this way is more amenable to high-throughput feature generation, e.g. if EFEL generates 100 new features per model.
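A minimal sketch of what such a SimPopulationModel could look like under this proposal, assuming sciunit's `RunnableModel` as the base class (the class name, feature shapes, and usage lines are illustrative only):

```python
from sciunit.models import RunnableModel

class SimPopulationModel(RunnableModel):
    """Wraps a whole simulator population (NEST/PyNN/SNN.jl) and exposes
    predictions as a plain settable attribute, decoupled from
    generate_prediction/extract_feature."""

    def __init__(self, name, backend=None):
        super().__init__(name, backend=backend)
        self._predictions = None  # one feature dict per cell, set externally

    @property
    def predictions(self):
        return self._predictions

    @predictions.setter
    def predictions(self, value):
        # e.g. a list of EFEL feature dicts computed in parallel by the
        # simulator-specific pipeline, outside of sciunit
        self._predictions = value

# Usage sketch: heavy feature extraction happens first, in simulator-native
# code; sciunit only sees the finished features.
# pop = SimPopulationModel("adexp_population", backend="NEST")  # hypothetical backend
# pop.predictions = features_from_nest  # e.g. list of EFEL feature dicts
```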
@all-contributors please add @russelljjarvis for ideas