scidash / neuronunit

A package for data-driven validation of neuron and ion channel models using SciUnit
http://neuronunit.scidash.org

Implement all Druckman Measurements #190

Closed rgerkin closed 5 years ago

rgerkin commented 6 years ago

Implement tests for all 38 of the features used in Druckmann et al., shown here: https://academic.oup.com/view-large/80919724

JustasB commented 6 years ago

Full list:

The details of each test are here: bhs290supp.pdf

JustasB commented 5 years ago

These have been implemented and tested with all NeuroML-DB.org cell models.

All tests can be found here: https://github.com/scidash/neuronunit/blob/dev/neuronunit/tests/druckmann2013.py

russelljjarvis commented 5 years ago

I wonder if some of the tests are dependent on others, and if so, what would a dependency hierarchy look like (with ASCII art)?

For example, in the 8 tests I use to optimize, 4 are spike measurements, and they depend on knowing the rheobase current injection.

I think that such signal-processing dependencies need to be reflected in the OOP design of NU somehow. That is, many tests inherit from a base VmTest; if a subset of these tests depends on rheobase being known, that subset should inherit from RheobaseTest too.

For example:

                  VmTest
                    |
             RheobaseTestP
            /       |       \
        width   Amplitude   AP-threshold
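
A minimal sketch of that multiple-inheritance idea (class and method names here are illustrative placeholders, not the actual NU API):

class VmTest:
    """Base for tests computed from a membrane potential trace."""

class RheobaseTest(VmTest):
    """Base for tests that must first know the rheobase current."""
    def get_rheobase(self, model):
        # Search for the minimum current evoking at least one spike.
        ...

class APWidthTest(RheobaseTest):
    """A spike-shape measurement that depends on rheobase."""
    def generate_prediction(self, model):
        i_rh = self.get_rheobase(model)
        # Inject a multiple of i_rh, then measure the spike width.
        ...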

Hooking these tests up to the optimizer could be a good way to reveal possible hidden bugs in the test implementations (if such bugs exist at all; I am not saying they do). This is just because a genetic algorithm can stochastically explore quite a large parameter space, thereby stress-testing a lot of signal-processing steps.

For example, does AP2 depend on AP1? I was thinking AP2 might belong to a spike train consisting of at least 2 spikes.

JustasB commented 5 years ago

I think what you describe is a perfectly fine way to design a class hierarchy.

@russelljjarvis Take a look at the class diagram for these tests:

druckmannClassUML.zip

When designing the inheritance hierarchy, I was working with a few challenges.

My solution was to use inheritance whenever a test was a variation on an earlier test, or composition of test instances when multiple previous tests were used.
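
For example, a sketch of the two patterns (placeholder bodies; whether these particular classes use these exact patterns is my assumption):

class AP1AmplitudeTest:
    """Measures the amplitude of the first action potential."""
    def generate_prediction(self, model):
        ...

# Inheritance: the AP2 amplitude test is a variation on the AP1 test.
class AP2AmplitudeTest(AP1AmplitudeTest):
    def generate_prediction(self, model):
        ...

# Composition: the AP1-to-AP2 amplitude drop reuses both earlier tests.
class AP12AmplitudeDropTest:
    def __init__(self):
        self.ap1 = AP1AmplitudeTest()
        self.ap2 = AP2AmplitudeTest()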

russelljjarvis commented 5 years ago

Okay, that's a good UML diagram. I can see it is already very hierarchical; that was something I did not appreciate straight away.

I am guessing that all of these tests are run with the NEURON simulator backend? If so, I should make sure I can run them with the parallel rheobase code. I wonder if you have dask in your Ubuntu environment?

The main issue that comes up in the context of my NU testing work is that the extreme models I sometimes generate have waveforms containing NaN, -inf, and inf values.

I have a line of code that returns None predictions on such model evaluations: https://github.com/russelljjarvis/neuronunit/blob/dev/neuronunit/tests/passive.py#L28

Since this issue is probably going to come up in a lot of places, do you think there could/should be a function decorator (syntactic sugar), or a class method, that does this automatically?

The way I have previously managed this is to re-implement the NaN and inf checking in every NU test class, but I suspect that is a bad idea.
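
Something like the decorator below could centralize the check. This is only a sketch: the get_membrane_potential call is an assumption about the model interface, and it assumes the trace is available before the measurement runs.

import functools
import numpy as np

def none_on_nonfinite(generate_prediction):
    """Return a None prediction when the model's voltage trace
    contains NaN or +/-inf, instead of running the measurement."""
    @functools.wraps(generate_prediction)
    def wrapper(self, model, *args, **kwargs):
        vm = np.asarray(model.get_membrane_potential(), dtype=float)
        if not np.all(np.isfinite(vm)):
            return None
        return generate_prediction(self, model, *args, **kwargs)
    return wrapper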

There was a previous bug in the RheobaseP search algorithm, where the search terminated prematurely if supra.min() and sub.max() were too close together. This has since been updated and handled in a different way. If you were to use the RheobaseP class, you would probably need to set the backend to NEURON, if that's what you are using.

Hopefully I can help to do this at the hackathon; I suspect these tests should help me to make my optimization code more generalizable. https://github.com/russelljjarvis/neuronunit/blob/d8253dae953bbbd42a8c056eb9f49b60780d51d3/neuronunit/tests/fi.py#L217-#L220

I have updated the file over the last two days.

russelljjarvis commented 5 years ago

I have been playing around with using the Druckmann test suite with the optimizer, as I think it's really good to have all of these new NU tests available.

I have noticed that there is a class that may need instantiating (note I renamed the file druckmann2013.py to just dm). So I do this:

from neuronunit.tests import dm 
dmtests = dm.Druckmann2013Test()

but I need to supply a current amplitude.

TypeError: __init__() missing 1 required positional argument: 'current_amplitude'

Is this current amplitude some multiple of rheobase (like 1.5)? I have noticed that many Druckmann tests operate on multiple spike waveforms. I will recommend some minor changes in a pull request over the hackathon.

rgerkin commented 5 years ago

@russelljjarvis @JustasB I agree this will be a great thing to test and iron out during the hackathon.

JustasB commented 5 years ago

For input resistance, the parameter should be an array of at least two negative currents.

For tests that have "StrongStim" in their name, the parameter should be 3x the rheobase current.

For all other tests, the parameter should be 1.5x the rheobase current.
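
In terms of the earlier snippet, that would look something like this (a sketch; the units and the rheobase value are placeholders):

import quantities as pq
from neuronunit.tests import druckmann2013 as dm

rheobase = 100.0 * pq.pA  # placeholder; e.g. obtained from a rheobase search

standard_test = dm.AP1AmplitudeTest(current_amplitude=1.5 * rheobase)
strong_test = dm.SpikeRateStrongStimTest(current_amplitude=3.0 * rheobase)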

russelljjarvis commented 5 years ago

In practice, I found 1.5x rheobase was not always enough current to get the multi-spiking behavior needed for the ISI and CV tests.

Also @JustasB, what is your opinion about chattering or bursting neurons, i.e. neurons that are either silent or fire a burst, but never fire exactly once? Should they be eligible to take these tests too?

The reason I ask is that my rheobase search algorithm will exclude such models, and either excluding 'no-rheobase' models or accepting them could be important to code design.

rgerkin commented 5 years ago

@russelljjarvis As previously written, your code would look for the minimum current to evoke exactly one spike, but as you may recall we fixed this a few months ago to be the minimum current to evoke at least one spike (or at least it was fixed in one branch at some point). So it should work for all models that are capable of spiking and capable of not spiking, even if there is a discontinuity from, say, 0 spikes to 10 spikes.
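
For illustration, a minimal sketch of that search criterion (not the actual RheobaseTest code):

def rheobase_bisect(spike_count, lo, hi, tol=0.001):
    """Minimum current evoking at least one spike, found by bisection.
    Assumes spike_count(lo) == 0 and spike_count(hi) >= 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if spike_count(mid) >= 1:  # at least one spike counts, even 10
            hi = mid
        else:
            lo = mid
    return hi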

Whether this is the kind of model neuron we are dealing with is handled by another test (Threshold Firing Test), which should probably be rewritten to just reuse the F/I data from the Rheobase test to save compute time.

russelljjarvis commented 5 years ago

I have some good DM test integration here: https://github.com/russelljjarvis/neuronunit/blob/dev/neuronunit/unit_test/druckman_tests.ipynb

I will try to attach the tests to the optimizer.

@JustasB, so far from your code I have been able to generate DM test predictions using a forward-Euler version of the Izhikevich model.

Do you have any ideas about how to use these predictions in comparisons against observations, so that we can derive models that are optimal with respect to those observations? In other words, how can I generate an error function?

JustasB commented 5 years ago

I'm thinking we would need either experimental data from real cells to compare against models, or values obtained from complex models that can then be used to compare to reduced models.
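
If such values existed, a minimal error function could be, e.g., a sum of absolute Z-scores across features (a sketch, assuming observations come as (mean, std) pairs):

def error(predictions, observations):
    """predictions: {feature: value}; observations: {feature: (mean, std)}.
    Returns the summed absolute Z-scores across features."""
    total = 0.0
    for feature, value in predictions.items():
        mean, std = observations[feature]
        total += abs(value - mean) / std
    return total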

russelljjarvis commented 5 years ago

Both lists contain multiple measurements of AHP; they also both have tests of accommodation and adaptation, which might be the same thing.

Do you think that depth and amplitude of AHP refer to the same concept? If so, the Druckmann predictions of AP1AHPDepthTest(standard) and AP2AHPDepthTest(standard) might be equal to the NeuroElectro AHP amplitude.

These are all the Druckmann tests you have implemented:

[AP12AmplitudeDropTest(standard), 
    AP1SSAmplitudeChangeTest(standard), 
    AP1AmplitudeTest(standard), 
    AP1WidthHalfHeightTest(standard), 
    AP1WidthPeakToTroughTest(standard), 
    AP1RateOfChangePeakToTroughTest(standard), 
    AP1AHPDepthTest(standard), 
    AP2AmplitudeTest(standard), 
    AP2WidthHalfHeightTest(standard), 
    AP2WidthPeakToTroughTest(standard), 
    AP2RateOfChangePeakToTroughTest(standard), 
    AP2AHPDepthTest(standard), 
    AP12AmplitudeChangePercentTest(standard), 
    AP12HalfWidthChangePercentTest(standard), 
    AP12RateOfChangePeakToTroughPercentChangeTest(standard), 
    AP12AHPDepthPercentChangeTest(standard), 
    AP1DelayMeanTest(standard), 
    AP1DelaySDTest(standard), 
    AP2DelayMeanTest(standard), 
    AP2DelaySDTest(standard), 
    Burst1ISIMeanTest(standard), 
    Burst1ISISDTest(standard), 
    InitialAccommodationMeanTest(standard), 
    SSAccommodationMeanTest(standard), 
    AccommodationRateToSSTest(standard), 
    AccommodationAtSSMeanTest(standard), 
    AccommodationRateMeanAtSSTest(standard), 
    ISICVTest(standard), 
    ISIMedianTest(standard), 
    ISIBurstMeanChangeTest(standard), 
    SpikeRateStrongStimTest(strong), 
    AP1DelayMeanStrongStimTest(strong), 
    AP1DelaySDStrongStimTest(strong), 
    AP2DelayMeanStrongStimTest(strong), 
    AP2DelaySDStrongStimTest(strong), 
    Burst1ISISDStrongStimTest(strong),
    Burst1ISIMeanStrongStimTest(strong)]

... and this is a list of all the NeuroElectro electrophysiological ontology terms:

cell capacitance
input resistance
resting membrane potential
membrane time constant
spike amplitude
spike half-width
spike threshold
rheobase
firing frequency
AHP duration
cell diameter
access resistance
sag ratio
cell surface area
AHP amplitude
FI slope
spontaneous firing rate
fast AHP amplitude
fast AHP duration
slow AHP amplitude
slow AHP duration
spike width
ADP amplitude
ADP duration
spike peak
adaptation ratio
first spike latency
other
spike rise time
spike decay time
spike max rise slope
spike max decay slope
sag amplitude
maximum firing rate
medium AHP amplitude
medium AHP duration
spike amplitude from resting
spike amplitude from trough
AHP voltage
fast AHP voltage
slow AHP voltage
medium AHP voltage
AHP amplitude from resting
fast AHP amplitude from resting
slow AHP amplitude from resting
medium AHP amplitude from resting
adaptation ratio (last/first ISI)
adaptation percent (first/last ISI)
adaptation percent (last/first ISI)
adaptation ratio (1 – first/last ISI)

russelljjarvis commented 5 years ago

I think we should rename the file to 'Druckmann measurements'. A different file that merges the measurements with appropriate falsifying data would be called 'Druckmann tests'.

russelljjarvis commented 5 years ago

I wouldn't say this is done, as the tests are still measurements, but they could be upgraded to tests.

rgerkin commented 5 years ago

@russelljjarvis Which file are you talking about?

russelljjarvis commented 5 years ago

neuronunit/neuronunit/tests/druckmann2013.py

My mistake, it's not the file that would need renaming but this GitHub issue.

The issue could be renamed 'upgrade some Druckmann measurements to test classes'.

rgerkin commented 5 years ago

I renamed the issue, but I'm confused about the other part. That file already contains many test classes, does it not? For example, in the dev branch, AP12AmplitudeDropTest subclasses Druckmann2013Test, which subclasses VmTest.

russelljjarvis commented 5 years ago

Yes, but I think the measurements are just inheriting useful class properties, like generate_prediction etc. I don't think there is any sciunit/neuronunit-level testing going on, and I think it's important to be unambiguous with language in these contexts.

rgerkin commented 5 years ago

@russelljjarvis OK, you're probably right, I may not have looked closely enough.
@JustasB I know you're just using parts of the test infrastructure here -- what do you propose these be called? I recognize this is sort of duplicating the same question across issues, but I'm hoping the "answer" will become clear if we see the question in different contexts.

russelljjarvis commented 5 years ago

I think you could even reopen the issue, but change the name again to 'upgrade Druckmann measurements to tests'. The issue was initially an announcement that the measurements were implemented, but it changed in nature to instead reflect, given that the measurements exist, how to convert them to tests.

rgerkin commented 5 years ago

I think this will be handled in the implementation of #75.

JustasB commented 5 years ago

They seem to squawk, walk, and smell like tests to me -- so I think they're tests. If they're included in a suite, and provided with data to compare against, they'll return test results just like the other tests. What am I missing?

russelljjarvis commented 5 years ago

Maybe it's just a semantic thing, but I feel like a test that other people can pick up and start using is a package that includes the data and the generate_prediction method together. That is the case for the other NU tests I have been using. If the data is not included, a method for downloading it is.

I think the only missing ingredient is the data, or the methods for downloading and aligning the appropriate data.

Say someone evaluates a model using the whole list of Druckmann measurements under the assumption that they are currently NU tests; they will not understand why no sciunit scores are returned by the process.

rgerkin commented 5 years ago

But VmTest (which these tests subclass) does implement the methods for getting data (from NeuroElectro), so what do these Druckmann test classes lack that the others have? Tests were called tests in NeuronUnit long before get_neab.py.

russelljjarvis commented 5 years ago

Okay, so what should you call packaged tests then, if you wanted to use language really precisely to avoid ambiguity? A packaged test would be one that is ready for user consumption. The way I think of the Druckmann measurements is as possible tests for developers, not the sort of thing an end user would consume.

It seems to me you would choose the word 'test' over the word 'measurement' if the measurement also involved a comparison against another reference point. If there is no comparison, aren't you just measuring waveform features?

In the case of software unit tests, don't the unit tests generate their own falsifying data/conditions? I thought the point of the word choice 'test' was to reapply the unittest analogy in science, but for the analogy to work, the mapping between the two ideas would have to be close.

I just think this kind of ambiguity could confuse newcomers to the project. If it's fine for a test not to be a package ready to enable comparisons back to falsifying data, an alternative way to disambiguate might be to reserve a different term for the complete package of generate_prediction() plus an aligned experimental observation.

rgerkin commented 5 years ago

There has never been a class for packaged tests. There are instances of test classes, where those instances use specific data as observations; is that what you mean?

And I disagree that an end user could/would never use a Druckmann test. For example, Druckmann used them (albeit not in NeuronUnit).

russelljjarvis commented 5 years ago

I think when there exists an observation that is compared against a measurement, the object should be regarded as a test (a comparison) as opposed to just a measurement. I guess I evaluated all the Druckmann tests at one point, and then I got confused: I thought I was testing, as opposed to measuring, but where were the scores? A user (me) could not optimize without scores. Maybe it has changed since then?

I am unsure that the history of variable names is always a good foundation for choosing variable names.

JustasB commented 5 years ago

I think there are three different aspects here: (a) the stimulation protocol, (b) the algorithm to compute the property value, and (c) the comparison of the value to what's seen experimentally. All tests in NU provide (a) and (b). The user has to supply (c), e.g. from the literature or NeuroElectro.

In the case of the Druckmann properties, those values are not available anywhere, but in theory someone could subject real cells to the same protocols, compute the properties, and then supply their values to the Druckmann NU tests for comparison. This last act of measuring them would not change the code in NU.
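
Putting the three aspects together might look like this (a sketch; the numbers are invented, and attaching the observation via an attribute is my assumption about the interface):

import quantities as pq
from neuronunit.tests import druckmann2013 as dm

test = dm.AP1AmplitudeTest(current_amplitude=1.5 * 100.0 * pq.pA)  # (a) + (b)
test.observation = {'mean': 80.0 * pq.mV, 'std': 5.0 * pq.mV}      # (c), user-supplied
# score = test.judge(model)  # the comparison against (c), once a model is in hand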

russelljjarvis commented 5 years ago

I like this very clear explanation of the situation. One consequence of (c) not being viable here is that the analogy between a software unit test and a sciunit test breaks down in the case of the Druckmann tests, as it's not yet practically possible to check for experiment/model agreement. Perhaps Druckmann meant something semantically different by 'test'? Or did he mean that a human eyeballs the test? I can imagine an alternative universe where VmTest is actually called VmProperties, and only tests with executable scientific hypothesis-testing methods are SciUnit Tests.

This is what has previously been written about tests: "The analogy to scientific modelling is clear and compelling: a successful scientific model should pass unit tests which encode specific empirical outcomes that investigators believe adequately characterize the experimental system. A successful model for classical mechanics, for example, should pass unit tests corresponding to the measured trajectories of fired projectiles, collisions of billiard balls and rotations of spinning tops. While the number of potential, reasonable unit tests might be vast, it may be sufficient to deploy a few suites of tests that together cover a broad number of decisive experimental results and that collectively summarize the state of empirical knowledge in the field."

https://royalsocietypublishing.org/doi/full/10.1098/rstb.2017.0381

I think only (c) from above is semantically consistent with this meaning of 'test'.

russelljjarvis commented 5 years ago

I was thinking more about the relationship of the Druckmann (DM) tests and VmTest to scientific unit-testing. I think the relationship is that DMTest and VmTest are both necessary but not sufficient for scientific unit-testing. So a good name could also be VmTestComponents and DMTestComponents, as the word 'Component' embodies this necessary-but-insufficient character. I also realised that I can show that the DM tests can in principle work with optimization, if data existed, by creating fake observations from models evaluated at random data points, and then using the optimizer's learned knowledge of error gradients to blindly recover those random data points (as sketched below).
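
A toy illustration of that round-trip idea (everything here is a stand-in, not DM code):

import numpy as np

def toy_prediction(param):
    # Stand-in for running the DM measurements on a model with this parameter.
    return 2.0 * param + 1.0

rng = np.random.default_rng(0)
hidden = rng.uniform(0.0, 10.0)            # the random 'data point'
fake_observation = toy_prediction(hidden)  # synthesized observation

# Blind recovery by brute-force error minimization (a GA would search smarter).
grid = np.linspace(0.0, 10.0, 100001)
recovered = grid[np.argmin((toy_prediction(grid) - fake_observation) ** 2)]
assert abs(recovered - hidden) < 1e-3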

rgerkin commented 5 years ago

What is sufficient? The test plus the observed data? Because that is what you usually get when you instantiate a test (except for Druckmann and TestM2M, which are purely for generating model output and/or comparing it against other models, rather than against data).

russelljjarvis commented 5 years ago

I think the test plus observed data is sufficient for making an executable scientific unit-test. If there are exceptions to this, I wonder if a few sentences could be included in the class doc-string to explain to the test user that a test class is sometimes only a component of a test (where 'test' is meant in the noun sense). Instanced tests are often executable, and then they become comparisons ('test' in the verb sense), but not all instanced tests will allow comparisons to experimental data; those tests will still allow comparisons against other model data.

It sounds like the DM tests could have, or already do have, compute_score methods. If they allow for comparisons to other models, then I agree they are tests in the verb sense. I agree TestM2M is strongly related to scientific unit-testing. Sorry it took so long for me to become clear about this issue.

Related to this issue of appropriate variable names: in TestM2M, what variable name should we use for the observation? Maybe this has already been decided? Perhaps synthesized_data?

rgerkin commented 5 years ago

Since a normal Test is instantiated with data, the resulting instance is sufficient to meet your vision of the scientific unit test.

But it is clear that we have a few cases where people are interested in using a Test's methods for something other than comparison to data. So there may be a need for a less hacky way of doing that, such as a top-level class in SciUnit which is like a protocol; Test, TestM2M, and Justas's use case for the Druckmann 'tests' could all derive from it. As we work with HBP over the next year, I think we will figure out what works best. I don't want to pick the answer quite yet.

russelljjarvis commented 5 years ago

Yes, maybe the answer will emerge from continued community use and participation. The user market might decide.