nci / scores

Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
https://scores.readthedocs.io/
Apache License 2.0
48 stars 15 forks source link

Before declaring version 1, check names and locations of all functions #253

Open tennlee opened 4 months ago

tennlee commented 4 months ago

Once version 1 is released, it will not be possible to rename or move functions without a major version update. As a result, it is particularly important to be happy with the names and locations of functions.

We should review this with a reasonable number of people to deal with any questions or concerns before things are baked in.

nicholasloveday commented 2 months ago

The main thing is to consider renaming correlation to pearsons_correlation https://github.com/nci/scores/blob/develop/src/scores/continuous/standard_impl.py#L202

tennlee commented 2 months ago

With pearson's correlation, it's actually the only one I've ever used. But I always been aware there were others. A quick search indicates the others are in reasonably common use, so we should disambiguate. I could live with pcorrelation, or pearsons_correlation, or if we had a correlation subpackage scores.continuous.correlation.pearsons

nikeethr commented 2 months ago

When would this be used by a user: scores.processing.cdf.integrate_square_piecewise_linear? I tried the search function on the docs but couldn't find it mapped to a tutorial...

If it's a unique score or method, is there a more specific name for it?

nikeethr commented 2 months ago

scores.processing.cdf.observed_cdf is this actually specific to observations?

nikeethr commented 2 months ago

Under what scenarios would scores.probability.adjust_fcst_for_crps be used separate to scores.probability.crps*. (i.e. Is there a reason a user would have to call it separately, as opposed to it being an internal function prior to crps computations.)

Also does this fit in better in "processing"?

nikeethr commented 2 months ago

scores.probability.crps_for_ensemble I'm wondering if the for is required, I'm okay either way.

nikeethr commented 2 months ago

I concur with renaming to pearsons_correlation

nikeethr commented 2 months ago

scores.processing.cdf.decreasing_cdfs this one is hard to name. It's not that the cdfs are actually decreasing in the strict sense, its more that accumulated extents where the cdfs are decreasing, are greater than an allowed tolerance - that still only implies that the cdf is non-increasing.

I think non_increasing_with_tolerance or decreasing_with_tolerance (beyond maybe more specific than with) reads more accurately to me, but I'm not sure what a general user will think.

In fact I'd have preferred if the function output was flipped, and just said increasing_within_tolerance. Since its a "weakly increasing" criteria instead of "(strictly) non-decreasing" due to data noise. But I don't recommend a code change at this stage, unless you really want to.

I think the easiest option is to just keep decreasing_cdfs, but add a example with repl >>> showing a sample input and output...

nicholasloveday commented 2 months ago

When would this be used by a user: scores.processing.cdf.integrate_square_piecewise_linear? I tried the search function on the docs but couldn't find it mapped to a tutorial...

If it's a unique score or method, is there a more specific name for it?

The CDF processing tools generally don't have a tutorial associated with them. The CRPS tutorial could be improved in the future to incorporate them.

nicholasloveday commented 2 months ago

scores.processing.cdf.observed_cdf is this actually specific to observations?

In general, yes. Technically it could also be used on a single-valued forecast, but then the CRPS=MAE, and you're better off calculating MAE directly. I'm happy for this to be left as is.

nicholasloveday commented 2 months ago

Under what scenarios would scores.probability.adjust_fcst_for_crps be used separate to scores.probability.crps*. (i.e. Is there a reason a user would have to call it separately, as opposed to it being an internal function prior to crps computations.)

Also does this fit in better in "processing"?

The use case is when you produce a CDF from points on the distribution, sometimes you may not get a true CDF. See https://github.com/nci/scores/blob/develop/src/scores/processing/cdf/cdf_functions.py#L352 . This will calculate the CRPS and handle this.

I guess that it could be called by an optional arg in crps_cdf

We had some previous discussion as to if it should go in processing. From memory the consensus was no since it computes a verification score

nikeethr commented 2 months ago

Btw I'm happy as long as there's a reasonable explanation to my comments, but will suggest that we have more examples down the track (either via inclusion in tutorials or in the api docstring). Since I may not be the only one who finds how/when to use them ambiguous.

tennlee commented 2 months ago

I believe we have resolved the essential points for version 1. I have altered the milestone on this issue for version 1.2 so as not to lose the discussions. I think we don't need to be overly concerned - if things need to change, we just need to adjust the version flag accordingly, I just wanted to make sure we were making considered choices.