malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License
13 stars 23 forks source link

Option to annotate upper-right corner of Fst plots with Z scores. #538

Closed jonbrenas closed 3 months ago

jonbrenas commented 4 months ago

Resolves #536 .

leehart commented 3 months ago

Thanks @jonbrenas . Shall I try to resolve the conflicts on this branch?

jonbrenas commented 3 months ago

Sorry, @leehart . I had not seen that fst.py had been modified in the meanwhile. I will solve the conflict.

leehart commented 3 months ago

Hi @jonbrenas I just took a quick look at the code, and I just wanted to clarify how the option should work, if that's OK?

I expect we want to preserve the ability to annotate or not annotate, and either show annotation with standard error (se) or "Z score", depending on a given setting.

I guess the intention here is to combine the previous boolean option (to annotate or not annotate with standard error) with the option of annotating with either standard error or Z score. I suppose if any option is specified at all, then "to annotate" is implied, otherwise the default (of None) is to not annotate. Is that the idea?

The logic here looks a little odd, by the way:

            if annotate_se:
                if annotate_se is True or annotate_se == "standard error":
                    se = fst_df.iloc[index_key]["se"]
                    fig_df[col][index] = se
                elif annotate_se == "Z score":
                    zs = fst_df.iloc[index_key]["fst"] / fst_df.iloc[index_key]["se"]
                    fig_df[col][index] = zs

...because the condition annotate_se is already True in that block.

Another thing that is catching my attention is the variable name annotate_se. Since the way the option works will be different, and standard error (se) is no longer the only option, I'm wondering if we should rename the variable. I suspect that keeping it the same name is designed to preserve some backward compatibility with the previous usage, but I wonder if that might cause confusion, given that the default is now None instead of False, and it's no longer a boolean. Any thoughts?

jonbrenas commented 3 months ago

Another thing that is catching my attention is the variable name annotate_se. Since the way the option works will be different, and standard error (se) is no longer the only option, I'm wondering if we should rename the variable. I suspect that keeping it the same name is designed to preserve some backward compatibility with the previous usage, but I wonder if that might cause confusion, given that the default is now None instead of False, and it's no longer a boolean. Any thoughts?

Yes, it is for backward compatibility. annotate_se is actually a union of a boolean or one of the two strings "standard error" or "Z score".

...because the condition annotate_se is already True in that block.

Technically, no. annotate_se is not None and it is not False at this point but it can either be True or one of the two strings. If it is True (and not just defined) or "standard error", we annotate with se.

leehart commented 3 months ago

Ah, I see what you mean! Thanks @jonbrenas . Evidently this is potentially confusing though. 🤔

leehart commented 3 months ago

@jonbrenas Are we able to drop backwards compatibility in favour of clarity in this case, or is there a strong case for supporting the existing option?

If I understand correctly, rather than keeping an option named annotate_se but with new behaviour, i.e.

We could rename the option to something like annotation with functionally-equivalent behaviour but clearer semantics, i.e.

But obviously this would be a breaking change to the API.

jonbrenas commented 3 months ago

I'll let @alimanfoo confirm but I think it had been decided that backward compatibility was more important than clarity (at least, in this case).

leehart commented 3 months ago

I suppose another option might be to keep the existing annotate_se option as it is, but somehow visibly deprecate it so that we can phase it out in a later release, and also add a new annotation option for standard error and standard score (z-score), i.e. with the new behaviour but with a more appropriate name.

However, it occurs to me that if we're trying to use semantic versioning (https://semver.org/) then the deprecation of an old argument, or any API incompatibility, should already be implicit. If we increase the major version number, then we are already saying that there are changes to the API that are not backwards compatible. Perhaps this isn't always strictly followed, but I think it's a general principle we've been trying to adhere to.

That's why it seems there needs to be a strong case (which there might be) to keep onto things just for backwards compatibility, because it can stifle improvements and cause unnecessary clutter and baggage. One of the obvious costs is that existing code (e.g. in notebooks) will need to be updated accordingly, but only if that code also updates the major version of API that it's using, in theory.

jonbrenas commented 3 months ago

I guess there are several thing to take into consideration:

  1. Do we plan on doing a major release soon? If yes, it makes sense to rename the variable; if not, we can leave it as it is.
  2. The current code has a bug. The lower-left part of the heat map is not filled so I would argue that a quick but imperfect fix is better than a perfect but delayed one.
  3. This is not what the diagrams are going to look like for very long. @alimanfoo thinks (and I agree) that reading in Ls is hard and that,beecause Fst, se and Z have wildly different ranges, the colorbar can't accommodate two at the same time, so the plan is to find another way (see #542).
leehart commented 3 months ago

Thanks @jonbrenas I think that whether we do a major or minor release will depend on what changes to the API have occurred. Like I say, if there are changes that make the API incompatible with the previous version then that will probably determine a major release, rather than deciding ahead of time (before we have made changes) if we are making a major release or not.

The next release that I'm aware of will be related to providing terms-of-use data, which wouldn't break the existing API, but I don't expect that to happen this week.

The way I see it, it's just a version number, but obviously we don't want to make backwards incompatible changes if they are unnecessary and significant costs.

When you say quick fix, is there a particular urgency to getting this bug fixed? I suppose I'm arguing for more perfect code at the cost of time, but I don't know all the risks.

jonbrenas commented 3 months ago

When you say quick fix, is there a particular urgency to getting this bug fixed? I suppose I'm arguing for more perfect code at the cost of time, but I don't know all the risks.

Well, I am using Fst plots in some of the papers that I am writing (but I can "cheat" and use the corrected code in my own notebooks instead of the API) and I know that Rita is also working with Fsts on the Ugandan data currently. She was at the inception for the changes. She has not complained too loudly yet but it might be that she is too polite ;) .

leehart commented 3 months ago

Thanks @jonbrenas In principle, would you approve a different PR that replaced the annotate_se with a new annotation option as described above (i.e. accepts {None (default), "standard error", "Z score"}), otherwise identical to this PR, which would be available in the next release? Or is there something I'm missing? I'm happy to make that PR if you don't have time, just to resolve this.

jonbrenas commented 3 months ago

I am not sure why we can't do it here. It basically amounts to renaming the parameter and removing the option for a boolean.

leehart commented 3 months ago

@jonbrenas Sure, there's no reason why we couldn't do it here, apart from the commit history. I just wasn't sure if you wanted to, or if you were happy for me to do it, or if there was some reason for not doing it that I was missing.

review-notebook-app[bot] commented 3 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

leehart commented 3 months ago

Thanks @jonbrenas . I'll merge this in tomorrow, unless objections. I can also then make a new release soon after. 🚀