Closed jonbrenas closed 3 months ago
Thanks @jonbrenas . Shall I try to resolve the conflicts on this branch?
Sorry, @leehart . I had not seen that fst.py had been modified in the meanwhile. I will solve the conflict.
Hi @jonbrenas I just took a quick look at the code, and I just wanted to clarify how the option should work, if that's OK?
I expect we want to preserve the ability to annotate or not annotate, and either show annotation with standard error (se) or "Z score", depending on a given setting.
I guess the intention here is to combine the previous boolean option (to annotate or not annotate with standard error) with the option of annotating with either standard error or Z score. I suppose if any option is specified at all, then "to annotate" is implied, otherwise the default (of None) is to not annotate. Is that the idea?
The logic here looks a little odd, by the way:
if annotate_se:
if annotate_se is True or annotate_se == "standard error":
se = fst_df.iloc[index_key]["se"]
fig_df[col][index] = se
elif annotate_se == "Z score":
zs = fst_df.iloc[index_key]["fst"] / fst_df.iloc[index_key]["se"]
fig_df[col][index] = zs
...because the condition annotate_se
is already True
in that block.
Another thing that is catching my attention is the variable name annotate_se
. Since the way the option works will be different, and standard error (se) is no longer the only option, I'm wondering if we should rename the variable. I suspect that keeping it the same name is designed to preserve some backward compatibility with the previous usage, but I wonder if that might cause confusion, given that the default is now None
instead of False
, and it's no longer a boolean. Any thoughts?
Another thing that is catching my attention is the variable name annotate_se. Since the way the option works will be different, and standard error (se) is no longer the only option, I'm wondering if we should rename the variable. I suspect that keeping it the same name is designed to preserve some backward compatibility with the previous usage, but I wonder if that might cause confusion, given that the default is now None instead of False, and it's no longer a boolean. Any thoughts?
Yes, it is for backward compatibility. annotate_se is actually a union of a boolean or one of the two strings "standard error" or "Z score".
...because the condition annotate_se is already True in that block.
Technically, no. annotate_se is not None and it is not False at this point but it can either be True or one of the two strings. If it is True (and not just defined) or "standard error", we annotate with se.
Ah, I see what you mean! Thanks @jonbrenas . Evidently this is potentially confusing though. 🤔
@jonbrenas Are we able to drop backwards compatibility in favour of clarity in this case, or is there a strong case for supporting the existing option?
If I understand correctly, rather than keeping an option named annotate_se
but with new behaviour, i.e.
We could rename the option to something like annotation
with functionally-equivalent behaviour but clearer semantics, i.e.
But obviously this would be a breaking change to the API.
I'll let @alimanfoo confirm but I think it had been decided that backward compatibility was more important than clarity (at least, in this case).
I suppose another option might be to keep the existing annotate_se
option as it is, but somehow visibly deprecate it so that we can phase it out in a later release, and also add a new annotation option for standard error and standard score (z-score), i.e. with the new behaviour but with a more appropriate name.
However, it occurs to me that if we're trying to use semantic versioning (https://semver.org/) then the deprecation of an old argument, or any API incompatibility, should already be implicit. If we increase the major version number, then we are already saying that there are changes to the API that are not backwards compatible. Perhaps this isn't always strictly followed, but I think it's a general principle we've been trying to adhere to.
That's why it seems there needs to be a strong case (which there might be) to keep onto things just for backwards compatibility, because it can stifle improvements and cause unnecessary clutter and baggage. One of the obvious costs is that existing code (e.g. in notebooks) will need to be updated accordingly, but only if that code also updates the major version of API that it's using, in theory.
I guess there are several thing to take into consideration:
Thanks @jonbrenas I think that whether we do a major or minor release will depend on what changes to the API have occurred. Like I say, if there are changes that make the API incompatible with the previous version then that will probably determine a major release, rather than deciding ahead of time (before we have made changes) if we are making a major release or not.
The next release that I'm aware of will be related to providing terms-of-use data, which wouldn't break the existing API, but I don't expect that to happen this week.
The way I see it, it's just a version number, but obviously we don't want to make backwards incompatible changes if they are unnecessary and significant costs.
When you say quick fix, is there a particular urgency to getting this bug fixed? I suppose I'm arguing for more perfect code at the cost of time, but I don't know all the risks.
When you say quick fix, is there a particular urgency to getting this bug fixed? I suppose I'm arguing for more perfect code at the cost of time, but I don't know all the risks.
Well, I am using Fst plots in some of the papers that I am writing (but I can "cheat" and use the corrected code in my own notebooks instead of the API) and I know that Rita is also working with Fsts on the Ugandan data currently. She was at the inception for the changes. She has not complained too loudly yet but it might be that she is too polite ;) .
Thanks @jonbrenas
In principle, would you approve a different PR that replaced the annotate_se
with a new annotation
option as described above (i.e. accepts {None (default), "standard error", "Z score"}), otherwise identical to this PR, which would be available in the next release? Or is there something I'm missing? I'm happy to make that PR if you don't have time, just to resolve this.
I am not sure why we can't do it here. It basically amounts to renaming the parameter and removing the option for a boolean.
@jonbrenas Sure, there's no reason why we couldn't do it here, apart from the commit history. I just wasn't sure if you wanted to, or if you were happy for me to do it, or if there was some reason for not doing it that I was missing.
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Thanks @jonbrenas . I'll merge this in tomorrow, unless objections. I can also then make a new release soon after. 🚀
Resolves #536 .