scidash / sciunit

A Python framework for test-driven validation of scientific models.

Ratio Score, score type cannot handle legitimate cases where observation and prediction are of different sign (+,-). #172

Open russelljjarvis opened 3 years ago

russelljjarvis commented 3 years ago

Background:

The RatioScore score type cannot handle legitimate cases where the observation and the prediction are of different sign (+, -).

In optimization, when fitting with sweep traces, RatioScore is a more appropriate score choice than ZScore, since the number of observations is n = 1.
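
For context, here is a minimal sketch (plain NumPy arithmetic, not the sciunit ZScore API) of why a z-score degenerates when the observation is a single value:

```python
import numpy as np

# With a single observed value the sample standard deviation is zero,
# so the z-score (prediction - mean) / std is undefined.
observation = np.array([1.0])
prediction = 0.8

mean, std = observation.mean(), observation.std()
print(std)                        # 0.0
print((prediction - mean) / std)  # -inf (division-by-zero warning)
```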

@rgerkin

If the observation and the prediction are of different sign, add the absolute value of the negative one to the positive one.

Pseudo-code for the attempted solution:

```python
import numpy as np

# Case 1: negative observation, positive prediction
observation = -1
prediction = 1
new_observation = np.abs(observation) + prediction  # new_observation == 2
intended_score = 1 / 2                              # and that is what we get

# Case 2: positive observation, negative prediction
observation = 1
prediction = -1
new_prediction = observation + np.abs(prediction)   # new_prediction == 2
intended_score = 1 / 2                              # but the actual score comes out as 2
```

* This approach led to poor optimization, I think because I didn't force the score to be 1/2 instead of 2.0 in each of the two sign cases.

Also, in optimization lower scores are better, whereas in this context the sciunit score gets worse with greater distance from 1.0 in either direction. I need to make sure that is reflected somehow in a derivative sciunit score I can use with optimization.
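
One possible way to do this (my illustration, not anything currently in sciunit) is to map the ratio to a lower-is-better cost that is symmetric about 1.0, for example the absolute log-ratio:

```python
import numpy as np

def ratio_to_cost(ratio: float) -> float:
    """Map a RatioScore-style value to a lower-is-better optimization cost.

    abs(log(ratio)) is 0 when ratio == 1 and grows as the ratio moves away
    from 1 in either direction, so 2.0 and 0.5 get the same cost.
    """
    return float(abs(np.log(ratio)))

print(ratio_to_cost(1.0))  # 0.0, perfect match
print(ratio_to_cost(2.0))  # ~0.693
print(ratio_to_cost(0.5))  # ~0.693, same cost as 2.0
```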

russelljjarvis commented 3 years ago
```python
# Imports this excerpt relies on (as in sciunit's own scores module):
import numpy as np
import quantities as pq

from sciunit import errors, utils
from sciunit.scores import Score


class RatioScore(Score):
    """A ratio of two numbers.

    Usually the prediction divided by the observation.
    """

    _allowed_types = (float,)

    _description = 'The ratio between the prediction and the observation'

    _best = 1.0  # A RatioScore of 1.0 is best

    _worst = np.inf

    def _check_score(self, score):
        if score < 0.0:
            raise errors.InvalidScoreError(("RatioScore was initialized with "
                                            "a score of %f, but a RatioScore "
                                            "must be non-negative.") % score)

    @classmethod
    def compute(cls, observation: dict, prediction: dict, key=None) -> 'RatioScore':
        """Compute a ratio from an observation and a prediction.

        Returns:
            RatioScore: the ratio of the prediction to the observation.
        """
        assert isinstance(observation, (dict, float, int, pq.Quantity))
        assert isinstance(prediction, (dict, float, int, pq.Quantity))

        obs, pred = cls.extract_means_or_values(observation, prediction,
                                                key=key)

        # Introduced lines: fold the negative value into the positive one
        # when observation and prediction have opposite signs.
        if obs < 0 and pred > 0:
            obs = np.abs(obs)
            obs = obs + pred

        if obs > 0 and pred < 0:
            pred = np.abs(pred)
            pred = pred + obs

        value = pred / obs
        value = utils.assert_dimensionless(value)
        return RatioScore(value)
```
russelljjarvis commented 3 years ago

The only introduced code is as follows:

```python
        if obs < 0 and pred > 0:
            obs = np.abs(obs)
            obs = obs + pred

        if obs > 0 and pred < 0:
            pred = np.abs(pred)
            pred = pred + obs
```

This code as is leads to poor optimization results.
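
A quick worked check of the two sign cases (plain floats, no units) shows where the asymmetry comes from:

```python
import numpy as np

# Case 1: obs < 0, pred > 0 -- the observation absorbs the negative value
obs, pred = -1.0, 1.0
obs = np.abs(obs) + pred
print(pred / obs)  # 0.5, as intended

# Case 2: obs > 0, pred < 0 -- the prediction absorbs the negative value
obs, pred = 1.0, -1.0
pred = np.abs(pred) + obs
print(pred / obs)  # 2.0, not the intended 0.5
```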

Proposed change:

```python
        if obs < 0 and pred > 0:
            obs = np.abs(obs)
            obs = obs + pred
            value = pred / obs
        elif obs > 0 and pred < 0:
            pred = np.abs(pred)
            pred = pred + obs
            value = obs / pred
        else:
            # Unchanged case: observation and prediction have the same sign.
            value = pred / obs

        value = utils.assert_dimensionless(value)
        return RatioScore(value)
```
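
With that change, repeating the worked check from above gives the intended 0.5 in both sign cases:

```python
import numpy as np

obs, pred = -1.0, 1.0
obs = np.abs(obs) + pred
print(pred / obs)  # 0.5

obs, pred = 1.0, -1.0
pred = np.abs(pred) + obs
print(obs / pred)  # 0.5
```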
rgerkin commented 3 years ago

Whatever change we make, ideally it would produce scores which a) do not have discontinuities, and b) monotonically decrease as the predicted and observed values move apart from each other, for any observed value.

I propose that we add a RelativeErrorScore which divides the difference between observation and prediction by a reference constant with the same units. The reference constant can be a property of the test, and can be derived (in NeuronUnit, for example) from the test's own units (e.g. if the units are MOhms, it could be 30 MOhms).
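
To make the proposal concrete, here is a minimal sketch of the idea (illustrative only; the function name and the way the reference constant is passed are my assumptions, not a committed implementation):

```python
import numpy as np
import quantities as pq


def relative_error(observation, prediction, scale):
    """Illustrative sketch: |prediction - observation| / scale.

    `scale` is a reference constant in the test's own units (e.g. a resistance
    for an input-resistance test), so the result is dimensionless, continuous,
    and grows monotonically as the two values move apart, regardless of sign.
    """
    return float(np.abs(prediction - observation) / scale)


scale = 30.0 * pq.ohm  # reference constant carrying the test's units
print(relative_error(100.0 * pq.ohm, 130.0 * pq.ohm, scale))  # 1.0
print(relative_error(100.0 * pq.ohm, -20.0 * pq.ohm, scale))  # 4.0, no sign problem
```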

rgerkin commented 3 years ago

@russelljjarvis I just committed 4584a7d44c0007f9be059f93c681d1468a73a214 to the dev branch, which adds RelativeDifferenceScore. Try it out.
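
If it follows the same `compute(observation, prediction)` classmethod pattern as RatioScore above (an assumption on my part; check the committed code for the exact signature and any extra arguments such as a scale), usage would look roughly like:

```python
from sciunit.scores import RelativeDifferenceScore

# Assumed usage, mirroring RatioScore.compute above; verify against the
# dev-branch commit for the actual signature.
score = RelativeDifferenceScore.compute(100.0, 130.0)
print(score)
```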

russelljjarvis commented 3 years ago

RelativeDifferenceScore does not cause any syntax problems, but it leads to less effective optimization than ZScore, for reasons that I don't understand.

I can demonstrate the substandard optimization in a NeuronUnit (NU) unit test in NU's new optimization branch, which is pending a pull request, if you want.

rgerkin commented 3 years ago

@russelljjarvis Yes, I'd like to understand where RelativeDifferenceScore is failing. Can you create an example in the NeuronUnit repository (I want to keep the neuro-specific examples out of the sciunit repo)? If the example turns out to reveal a general flaw in RelativeDifferenceScore itself, then we can continue the discussion about that here.

russelljjarvis commented 3 years ago

@rgerkin Here are the tests in my feature branch.

Notebook graphical tests

Same thing without graphics

They are running on CI. I can try to get them to run on scidash's Travis by doing a PR, since the optimization pull request to scidash passed unit testing on scidash's Travis. My overall impression is that RelativeDifferenceScore can be lowered, and it works with optimization, but it is less effective than ZScore.

russelljjarvis commented 3 years ago

The notebook now shows that they both effectively work, but the ZScore optimization is faster and can get better matches with fewer generations (NGEN) and a smaller population size (MU).