Use Allen Features to optimize

russelljjarvis commented 4 years ago

This is mostly done there are about 40 features including waveform shape and spike timing. Optimizing would need to employ a sweep based approach to exploring current injections to find models with low error. Alternatively can optimize a cell relative to just one sweep, but have rheobase multiple as a gene.

rgerkin commented 4 years ago

I think behavior may be different in different sweeps, so probably better to use multiple sweeps.

russelljjarvis commented 4 years ago

I build an observation from arbitrarily choosing one of many possible Allen data sweeps.

On writing this, I just realized that I can use all the sweeps in the data, without performing sweeps on the model. I can compare model outputs to every sweep in the observation data at first, and then on future model iterations to focus on the subset of sweeps that cause the least error. If I don't compare against all sweeps at first, then I am forced to make an arbitrary choice that will result in bias.

russelljjarvis commented 4 years ago

I have thought of a good solution: Reappropriate the Rheobase algorithm to search for any target spike number. Just make sure that the model output has the same spike count as the sweep you are optimizing against for the same recording duration.

rgerkin commented 4 years ago

That is clever. But it also assumes that spike number is the correct criterion for to match on. The alternative is for the the current injection amplitude to be the criterion to match on. If at X pA, the model has 3 spikes and the data has 8 spike, the input resistance of the model can always be lowered (through some model parameter) to get to ~8 spikes. Shouldn't the optimizer be able to find this automatically?

russelljjarvis commented 4 years ago

I agree, that was my first thought too, but I was finding using current injection values model versus experiment lead to bad matches on its own. What about using both?

russelljjarvis commented 4 years ago

The optimizer could find spike number almost automatically if I make spike count an error criterion to be optimized against. That is actually a good idea and involves less re-writing of code: Derive an error from a spike count.

rgerkin commented 4 years ago

Perfect. Make the error measure for spike count something like error = logistic(abs(sqrt(n_spikes_data) - sqrt(n_spikes_model))), which will range from 0 (best, no difference) to 1 (infinite difference). The sqrt stabilizes the variance across a range of spike counts. For example, getting 1 spike when 0 is expected is a bigger difference than 21 spikes when 20 were expected, so a simple difference isn't a good measure, but neither is a ratio (1/0 is infinity). The difference of square roots is a good balance and used in some other applications. The logistic(abs(x)) then changes the range from (-inf, inf) to (0,1).

These can later become sciunit score types but for now you can just do the whole calculation in compute_score and use a FloatScore or something like that.

russelljjarvis commented 4 years ago

Okay, will do that now.

russelljjarvis commented 4 years ago

There is also a question of how to assign mean and standard deviation to the allen sweep data observations. To prototype I set standard deviation to 1.0. I don't know if each sweep is repeated per cell, with only n=1, I am not sure what else I can do. The problem then becomes, using the AllenData this way looks like

rgerkin commented 4 years ago

If you are fitting a model for that cell in particular (and not a mean across cells), and there is only one sweep with those parameters (e.g. current injection amplitude) then you probably shouldn't use a standard deviation at all, so instead of a ZScore you can use a RatioScore or something else. If each sweep is repeated you can use the standard deviation across sweeps with ZScore. If you insist on using ZScore even with one sweep a better "placeholder" standard deviation might be the sqrt(value) for positively valued things, or sqrt(value + offset) if value can be zero, such as for spike count, where offset is something like 1.

russelljjarvis commented 4 years ago

I recently tried using RatioScore I believe it throws errors at extreme values. ZScore seems to be more robust at around extreme values, thus I keep returning to ZScore. Example stack trace

~/git/safe/neuronunit/neuronunit/optimisation/optimization_management.py in bridge_passive(package)
   2179     except:
   2180         print('Ratio score failed')
-> 2181         score = t.compute_score(t.observation, pred)
   2182 
   2183     model = new_model(dtc)

~/git/safe/neuronunit/neuronunit/tests/passive.py in compute_score(self, observation, prediction)
    278         else:
    279             score = super(CapacitanceTest, self).compute_score(observation,
--> 280                                                                prediction)
    281         #import pdb; pdb.set_trace()
    282         return score

~/git/safe/neuronunit/neuronunit/tests/passive.py in compute_score(self, observation, prediction)
    159         else:
    160             score = super(TestPulseTest, self).\
--> 161                         compute_score(observation, prediction)
    162         return score
    163 

/anaconda3/lib/python3.6/site-packages/sciunit/tests.py in compute_score(self, observation, prediction)
    212                                       % self.name)
    213         # After some processing of the observation and the prediction.
--> 214         score = self.score_type.compute(observation, prediction)
    215         return score
    216 

/anaconda3/lib/python3.6/site-packages/sciunit/scores/complete.py in compute(cls, observation, prediction, key)
    146         value = pred / obs
    147         value = utils.assert_dimensionless(value)
--> 148         return RatioScore(value)
    149 
    150     @property

/anaconda3/lib/python3.6/site-packages/sciunit/scores/base.py in __init__(self, score, related_data)
     20             related_data (dict, optional): Artifacts to store with the score.
     21         """
---> 22         self.check_score(score)
     23         if related_data is None:
     24             related_data = {}

/anaconda3/lib/python3.6/site-packages/sciunit/scores/base.py in check_score(self, score)
     67             raise InvalidScoreError(self._allowed_types_message %
     68                                     (type(score), self._allowed_types))
---> 69         self._check_score(score)
     70 
     71     def _check_score(self,score):

/anaconda3/lib/python3.6/site-packages/sciunit/scores/complete.py in _check_score(self, score)
    134             raise errors.InvalidScoreError(("RatioScore was initialized with "
    135                                             "a score of %f, but a RatioScore "
--> 136                                             "must be non-negative.") % score)
    137 
    138     @classmethod

InvalidScoreError: RatioScore was initialized with a score of -977.070398, but a RatioScore must be non-negative.

scidash / neuronunit

Use Allen Features to optimize #227