nci / scores

scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
https://scores.readthedocs.io/
Apache License 2.0
67 stars · 18 forks

Add KGE Score #700

Closed tennlee closed 1 month ago

tennlee commented 2 months ago

This incorporates the work done by @durgals to add the KGE score (Issue #678, PR #679) and additionally includes some changes I applied in response to the code review on PR #679.

@durgals the test coverage is not 100%. I have included a screenshot showing which lines do not have test coverage. I have also pushed this work to the 228 documentation branch (which we use to test the documentation). Please go through the checklist below and check each item off.

[screenshot: test coverage report highlighting the uncovered lines]

Also, if you are unhappy with any of the changes I made, please feel free to say how it should be changed.

durgals commented 2 months ago

Hi @tennlee , do you have an example pytest for raising an exception?

tennlee commented 2 months ago

> Hi @tennlee , do you have an example pytest for raising an exception?

Here is a link to some of the scores tests which do this: https://github.com/nci/scores/blob/8ff29eef75613bf6b27d1567898fcaec377c37c9/tests/test_utils.py#L476

Here is a link to the pytest documentation: https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest-raises

Let me know if there's anything else I can do to help, and let me know if you need any explanation of the context manager syntax as well (that's the `with ...` part).
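For example, a minimal test along these lines (the helper `positive_sqrt` is a hypothetical stand-in, not part of scores):

```python
import pytest

def positive_sqrt(value):
    # Hypothetical helper (not the scores API) that rejects negative input
    if value < 0:
        raise ValueError("value must be non-negative")
    return value ** 0.5

def test_positive_sqrt_raises():
    # pytest.raises is a context manager: the test passes only if the
    # block raises the named exception; `match` checks the message text
    with pytest.raises(ValueError, match="non-negative"):
        positive_sqrt(-1.0)
```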

Thanks :)

durgals commented 2 months ago

Thanks. I will update the test. BTW, do I need to check out this kge-staging branch?

tennlee commented 2 months ago

> Thanks. I will update the test. BTW, do I need to check out this kge-staging branch?

Yes. I'm not sure exactly how you have set up your git environment, but what I would be looking for is a pull request from your fork targeting this branch as the base. An alternative would be to use the web GUI to add the changes into the PR, but I'm personally less comfortable in the web GUI and more comfortable on the command line.

My git environment is set up with multiple remotes so that I can easily work with other people's forks. Assuming your git environment is a clone of your fork, you can use "git remote add" to add the NCI-owned repo as an additional remote. That's what I've done. However, my environment tracks NCI by default rather than my fork.

You can check out a branch from a remote as follows:

`git checkout -b my-branch-name remote/branch-name`

You can then push that up to your own fork with something like:

`git push origin` (or another named remote)

Then you can do the normal workflow, make changes etc, and push the changes up to your fork.

Then you can make a new PR in github from your fork's branch into the NCI repo branch.

I'll then merge your changes into NCI/kge-staging, and squash the work into probably two commits rather than our usual one, and then merge to develop from there. It may sound complicated, but it works quite smoothly.

If any of that seems overly complicated, so long as the code is somewhere on your fork, I can bring it into the PR from there for you using a similar process.
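The steps above can be sketched as a self-contained demo, using throwaway local bare repositories to stand in for GitHub (the paths and the branch name `678-kge-staging` are illustrative):

```shell
set -e
tmp=$(mktemp -d)

# Stand-ins for the GitHub repos: nci/scores ("upstream") and your fork
git init -q --bare "$tmp/upstream.git"
git init -q --bare "$tmp/fork.git"

# Seed upstream with a kge-staging branch
git clone -q "$tmp/upstream.git" "$tmp/seed" 2>/dev/null
git -C "$tmp/seed" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/seed" push -q origin HEAD:kge-staging

# Contributor side: a clone of your own fork, with the NCI repo
# added as a second remote via "git remote add"
git clone -q "$tmp/fork.git" "$tmp/work" 2>/dev/null
git -C "$tmp/work" remote add upstream "$tmp/upstream.git"
git -C "$tmp/work" fetch -q upstream

# Branch off the staging branch, then push the new branch to your fork;
# from there you would open a PR into the NCI repo's kge-staging branch
git -C "$tmp/work" checkout -q -b 678-kge-staging upstream/kge-staging
git -C "$tmp/work" push -q origin 678-kge-staging
```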

durgals commented 2 months ago

@tennlee , I have looked at the API documentation (https://scores.readthedocs.io/en/228-documentation-testing-branch/api.html#scores.continuous.kge). I suggest the following for consistency:

  1. Formula for rho: instead of using covariance and standard deviation, we can use the same formula used for the Pearson correlation coefficient. I think we remove the formula for rho and update the text in the "where" section: rho = Pearson's correlation coefficient between observed and forecast values, as defined in Pearson's correlation coefficient (link)
  2. Change the scaling factors to normal font (currently they are bold).
  3. Parameters: preserve_dims: the information is wrong after "As a special case...". This is also true for the Pearson correlation documentation, because for preserve_dims = 'all' it is not possible to compute the standard deviation of each point. I think we need to either raise an exception when preserve_dims = 'all' or update the text to say that it returns NaN, as I mentioned in the Notes.
  4. I think we also need to use equations for the scaling factors (instead of using s_rho etc.).
  5. return_components (bool | False): use equations for rho, alpha and beta.
  6. Note: currently this function works with only one DataArray.
  7. Notes: preserve_dims: replace the text with "When preserve_dims is set to 'all', the function returns NaN, similar to the Pearson correlation coefficient calculation for a single data point, because the standard deviation is zero for a single point."

durgals commented 2 months ago

@tennlee , I have this one (see screenshot), so do I need to check out with `git checkout -b 678-kge remote/kge-staging`? [screenshot of git branches]

tennlee commented 2 months ago

> @tennlee , I have this one (see screenshot), so do I need to check out with `git checkout -b 678-kge remote/kge-staging`?

I would suggest using a new branch on your fork rather than re-using the 678 branch, just because of how it interacts with GitHub's PR process.

So I would suggest `git checkout -b 678-kge-staging remote/kge-staging`

tennlee commented 2 months ago

> @tennlee , I have looked at the API documentation (https://scores.readthedocs.io/en/228-documentation-testing-branch/api.html#scores.continuous.kge). I suggest the following for consistency:

> 1. Formula for rho: instead of using covariance and standard deviation, we can use the same formula used for the Pearson correlation coefficient. I think we remove the formula for rho and update the text in the "where" section: rho = Pearson's correlation coefficient between observed and forecast values, as defined in Pearson's correlation coefficient (link)

That sounds good. Can you please go ahead and update that on your fork? I can pull across changes as you go if you like.
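For reference, the equivalence behind that suggestion is just the two standard forms of Pearson's correlation (writing f for forecast and o for observations):

$$\rho \;=\; \frac{\operatorname{cov}(f, o)}{\sigma_f\,\sigma_o} \;=\; \frac{\sum_i \left(f_i - \bar{f}\right)\left(o_i - \bar{o}\right)}{\sqrt{\sum_i \left(f_i - \bar{f}\right)^2}\,\sqrt{\sum_i \left(o_i - \bar{o}\right)^2}}$$

so replacing the covariance form with a link to the Pearson correlation docstring loses no information.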

> 2. Change the scaling factors to normal font (currently they are bold).

I couldn't work out how to fix this issue; the source code looks right to me. If you can come up with a fix, please feel free; otherwise I think we just have to put up with it.

> 3. Parameters: preserve_dims: the information is wrong after "As a special case...". This is also true for the Pearson correlation documentation, because for preserve_dims = 'all' it is not possible to compute the standard deviation of each point. I think we need to either raise an exception when preserve_dims = 'all' or update the text to say that it returns NaN, as I mentioned in the Notes.

That makes sense. Can you make an appropriate modification to the text in your PR, and I'll review what's going on in pearsonr separately myself.

> 4. I think we also need to use equations for the scaling factors (instead of using s_rho etc.).

Sounds good, are you happy to go ahead and make the change?

> 5. return_components (bool | False): use equations for rho, alpha and beta.

Sounds good, can you go ahead and make that change?

> 6. Note: currently this function works with only one DataArray.

I noted that, and I assumed that's what you intended. It's always nicer to make methods work with either data structure, but so long as the type hints are correct, it's acceptable when it's inherent to a method that only particular data structures can be used. It may be worth adding a descriptive note to the docstring, but it should be clear from the typing information.

> 7. Notes: preserve_dims: replace the text with "When preserve_dims is set to 'all', the function returns NaN, similar to the Pearson correlation coefficient calculation for a single data point, because the standard deviation is zero for a single point."

Agreed, please go ahead and make this change. I think that's the same as your point 3 above?

tennlee commented 1 month ago

@durgals @nicholasloveday

Thanks for that PR, Durga. I have merged it into this PR. As far as I know, everything is now ready to be merged. I have pushed the updates to https://scores.readthedocs.io/en/228-documentation-testing-branch/ .

Could you both please take a final look and let me know with a short message here? If you're both happy I will go ahead and merge this into develop.

Note - rather than squashing this into one single commit, I will probably organise it into two commits, but I will take care of that as part of the merge process; neither of you needs to do anything about it.

nicholasloveday commented 1 month ago

I think that this is fine to merge in. Thank you for this work.

A couple of minor things to mention.

tennlee commented 1 month ago

I have merged this into the develop branch now. @durgals, can you please review @nicholasloveday's final comments and consider whether you would like to raise new issues to track them? I'm fine either way in each case.

durgals commented 1 month ago

@tennlee Thank you for merging this. Regarding the comments from @nicholasloveday: