@russelljjarvis Remember that we are dealing with observations that have a mean and a standard error, and sometimes this standard error is very large. So the ratio you get might be 0.5, but the prediction might still be well within the confidence interval, because the literature reports values that vary by more than a factor of 2. So in terms of the resulting model "being realistic", it should not be a concern as long as abs(Z) < 2.
However, given that there are not so many tests to optimize on, I would think it should be possible to drive the model towards Z = 0 (and a ratio of 1) on many of them at once. So one thing to check is whether you can get something close to Z = 0 (and a ratio of 1) when you run suites containing only one test at a time. Then also check the exhaustive search and make sure that the optima you are getting when running many tests in a suite are reasonable. In other words, make sure that you cannot manually change the values in any way and improve upon the fit, driving one or more Z-scores towards 0 while not pushing any others away from 0.
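A minimal plain-Python sketch of that per-test check (not using the sciunit API; the observation summaries and predictions below are hypothetical placeholders, not values from any actual suite):

```python
# Illustration of checking each test's Z-score and prediction/observation ratio.
# All numbers here are hypothetical placeholders.
observations = {
    "InputResistanceTest":  {"mean": 1.2e8, "std": 6.0e7},   # Ohm
    "TimeConstantTest":     {"mean": 20.0,  "std": 10.0},    # ms
    "RestingPotentialTest": {"mean": -65.0, "std": 4.0},     # mV
}
predictions = {
    "InputResistanceTest":  4.5e7,
    "TimeConstantTest":     32.0,
    "RestingPotentialTest": -66.0,
}

def z_score(prediction, obs):
    """Standard score of the prediction against the observation summary."""
    return (prediction - obs["mean"]) / obs["std"]

for name, obs in observations.items():
    z = z_score(predictions[name], obs)
    ratio = predictions[name] / obs["mean"]
    # abs(Z) < 2 keeps the prediction inside a roughly 95% confidence band,
    # even when the prediction/observation ratio looks far from 1.
    print(f"{name}: ratio={ratio:.2f}, Z={z:+.2f}, within_CI={abs(z) < 2}")
```

A hand check of an optimum found over a whole suite would then perturb each parameter slightly, recompute these Z-scores, and confirm that none of them can be pushed towards 0 without pushing another away from 0.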
I can see that optimizing over 10 criteria at once could be too much of a conflicting set of priorities.
@russelljjarvis Any other thoughts on this matter? Or shall I close this?
@russelljjarvis Any questions or comments?
Is a prediction that is within an order of magnitude (a factor of 10) of the observation considered a good enough fit?
Generally all the predictions are inside the interval `[0.10*observation, 10*observation]`, but they do not seem to be close fits within that interval. Below are some examples of `sciunit` scores that I am unsure about. The tables could both be confirming either that the `sciunit` `score.sort_key` is a good indicator of agreement or that it is a bad one. It all depends on whether observations and predictions that merely share the same order of magnitude are good enough. For example, in the Input Resistance Test the observation and prediction differ by 7.494721*10^7 Ohm. This seems like a large difference to me, but perhaps being within the interval `[0.10*observation, 10*observation]` is actually good?
If so, that would also explain the Time Constant Test, where a difference of 12 ms between the time constants counts as good because the prediction falls inside `[0.10*observation, 10*observation]`.
Likewise for the 'CapacitanceTest', where the prediction also falls inside `[0.10*observation, 10*observation]`.
Generally, the Resting Potential tests resulted in very good fits/agreement, however, with predictions inside `[0.75*observation, 1.25*observation]`.
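The interval criterion being described could be written as a small helper, sketched below (the example observation and prediction values are hypothetical, not taken from the tables above):

```python
def within_factor(observation, prediction, factor=10.0):
    """True if the prediction lies in [observation/factor, observation*factor].

    Magnitudes are compared so that negative quantities (e.g. resting
    potentials in mV) can be handled the same way as positive ones.
    """
    low = abs(observation) / factor
    high = abs(observation) * factor
    return low <= abs(prediction) <= high

# Hypothetical input resistance values in Ohm, roughly 7.5e7 Ohm apart.
observation = 1.2e8
prediction = 4.5e7

print(within_factor(observation, prediction))         # order-of-magnitude check -> True
print(within_factor(observation, prediction, 1.25))   # tighter, resting-potential-style check -> False
```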
The source code for generating these results is here:
https://github.com/russelljjarvis/neuronunit/edit/dev/unit_test/data_driven_software_tests/test_tests.py
Taking the absolute difference of the logs of the prediction and observation (equivalently, the absolute log of their ratio) helps rescale errors that are within the right power of 10 (i.e. inside the interval `[0.10*observation, 10*observation]`) but are still bad fits.
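As a sketch of that log-scaled error (assuming the intent is the absolute difference of the base-10 logs, i.e. the absolute log of the prediction/observation ratio; the numbers are hypothetical):

```python
import math

def log10_error(observation, prediction):
    """Absolute base-10 log of the prediction/observation ratio.

    0.0 means a perfect fit; 1.0 means the prediction is off by a full
    factor of 10. Magnitudes are used so that negative quantities such as
    resting potentials can also be compared.
    """
    return abs(math.log10(abs(prediction) / abs(observation)))

# Hypothetical examples: both predictions are "within a power of 10" of the
# observation, but the first is a much worse fit than the second.
print(log10_error(1.2e8, 4.5e7))   # ~0.43, off by a factor of ~2.7
print(log10_error(-65.0, -66.0))   # ~0.007, very close fit
```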