Open ravwojdyla opened 3 years ago
Just noting that I started using infer_ for the ploidy methods because count_ didn't sound correct. I don't think infer_ is great either, but it's the best verb I could come up with for extracting something that's implicit from the call_genotype array.
It's also worth pointing out that in many cases in which the method doesn't have a prefix, the calculated variable has stat_ as a prefix. E.g. Tajimas_D returns variable stat_Tajimas_D.
I can see a few possibilities for a general rule that we could apply to everything:
- f"compute_{variable_name}" (or some other prefix)
There are some questions then about whether we should strip off the "group" prefix on the variable names for the compute function (like Tajima's D, as @timothymillar points out).
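To make the proposed rule concrete, here is a minimal sketch of deriving a method name from a variable name. The helper `method_name_for`, the `GROUP_PREFIXES` tuple, and the `strip_group` flag are hypothetical names for illustration only; the thread mentions only the stat_ prefix and the f"compute_{variable_name}" pattern.

```python
# Hypothetical list of "group" prefixes; only "stat_" appears in this thread.
GROUP_PREFIXES = ("stat_",)


def method_name_for(variable_name: str, verb: str = "compute", strip_group: bool = False) -> str:
    """Derive a method name from an output variable name.

    With strip_group=True the leading group prefix (e.g. "stat_") is removed
    before the verb is prepended.
    """
    name = variable_name
    if strip_group:
        for prefix in GROUP_PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
                break
    return f"{verb}_{name}"


# The two candidate spellings for Tajima's D under this rule.
with_group = method_name_for("stat_Tajimas_D")
without_group = method_name_for("stat_Tajimas_D", strip_group=True)
print(with_group)     # compute_stat_Tajimas_D
print(without_group)  # compute_Tajimas_D
```

Either spelling keeps the mapping from variable name to method name mechanical; the open question is only which of the two reads better.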
In general, I think we should try to get to a point where we can determine the name of a method to compute a variable from the variable name. It would be nice to be able to formalise the graph of variable dependencies, and this would be a good step in that direction. Do you think this makes sense @tomwhite, and is something worthwhile to aim for?
> In general, I think we should try to get to a point where we can determine the name of a method to compute a variable from the variable name.
One thing worth mentioning is that there is not a 1:1 mapping between method and output variable, since some methods create multiple output variables. E.g. Garud_H creates stat_Garud_h1, stat_Garud_h12, stat_Garud_h123, stat_Garud_h2_h1 variables. (Also some variables are created by multiple methods - I'm thinking of the IO functions here.) So this makes determining the method name a bit harder.
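A small sketch of why the mapping is not 1:1: inverting a method-to-outputs table shows which variables have one producer and which have several. The Tajimas_D and Garud_H entries come from the examples above; `read_format_a` and `read_format_b` are hypothetical stand-ins for IO functions, not real function names.

```python
from collections import defaultdict

# Method name -> output variables. The two reader entries are hypothetical
# placeholders for IO functions that create the same variable.
METHOD_OUTPUTS = {
    "Tajimas_D": ["stat_Tajimas_D"],
    "Garud_H": ["stat_Garud_h1", "stat_Garud_h12", "stat_Garud_h123", "stat_Garud_h2_h1"],
    "read_format_a": ["call_genotype"],
    "read_format_b": ["call_genotype"],
}


def producers_by_variable(method_outputs):
    """Invert method -> outputs into variable -> list of producing methods."""
    producers = defaultdict(list)
    for method, variables in method_outputs.items():
        for variable in variables:
            producers[variable].append(method)
    return dict(producers)


producers = producers_by_variable(METHOD_OUTPUTS)

# Variables with more than one producer cannot name their method uniquely.
ambiguous = sorted(v for v, methods in producers.items() if len(methods) > 1)
print(ambiguous)  # ['call_genotype']
```

Under a compute_{variable_name} rule, Garud_H would need either four method names or one name that none of its four variables determines on its own.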
> It would be nice to be able to formalise the graph of variable dependencies, and this would be a good step in that direction.
I agree that would be nice.
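As a rough illustration of what a formalised dependency graph could enable, the sketch below works out which methods to run, and in what order, to obtain a target variable. The `METHODS` table and the `plan` helper are hypothetical, and the input variables listed for each method are illustrative assumptions rather than a statement of what those methods actually require.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Method name -> (input variables, output variables).
# Illustrative only: the inputs shown here are assumptions.
METHODS = {
    "count_call_alleles": (["call_genotype"], ["call_allele_count"]),
    "count_variant_alleles": (["call_allele_count"], ["variant_allele_count"]),
    "Tajimas_D": (["variant_allele_count"], ["stat_Tajimas_D"]),
}


def plan(target, methods):
    """Return the methods needed to produce `target`, in execution order."""
    # Assumes each variable has exactly one producer in `methods`.
    producer = {out: name for name, (_, outs) in methods.items() for out in outs}
    graph = {}  # method -> set of methods that must run before it
    stack = [target]
    while stack:
        variable = stack.pop()
        method = producer.get(variable)
        if method is None or method in graph:
            continue  # variable must already exist, or method already planned
        inputs = methods[method][0]
        graph[method] = {producer[i] for i in inputs if i in producer}
        stack.extend(inputs)
    return list(TopologicalSorter(graph).static_order())


order = plan("stat_Tajimas_D", METHODS)
print(order)  # ['count_call_alleles', 'count_variant_alleles', 'Tajimas_D']
```

The single-producer assumption is exactly what breaks for variables created by several methods, so a real version would need some rule for choosing among producers.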
I wrote a quick hack to inspect the call frame for methods to find their input and output variables, then ran the unit tests to generate reasonable call coverage. It's not complete, but gives an idea of the relationships between input variables, methods, and output variables. Notice that some methods can produce (or consume) different sets of variables depending on how they are called.
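The hack itself is not shown here, so the following is a different and simpler technique aimed at the same kind of information: instead of inspecting call frames, it wraps a method and records which variables it reads and which it adds. Everything in it is hypothetical (the `RecordingDataset` dict stand-in for a dataset, the `record_variables` decorator, the toy method, and the call_alt_count and variant_alt_count variable names); only call_genotype is a name taken from the thread.

```python
import functools


class RecordingDataset(dict):
    """Dict stand-in for a dataset that remembers which keys were read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.read_keys = set()

    def __getitem__(self, key):
        self.read_keys.add(key)
        return super().__getitem__(key)


# Method name -> set of (input variables, output variables) seen across calls.
OBSERVED = {}


def record_variables(func):
    """Record the variables a method reads and the new variables it returns."""

    @functools.wraps(func)
    def wrapper(ds, *args, **kwargs):
        tracked = RecordingDataset(ds)
        new_vars = func(tracked, *args, **kwargs)
        signature = (frozenset(tracked.read_keys), frozenset(new_vars))
        OBSERVED.setdefault(func.__name__, set()).add(signature)
        return {**ds, **new_vars}  # merge new variables into a plain dict

    return wrapper


@record_variables
def toy_count_variant_alt(ds):
    """Toy method: reuse a precomputed per-call count if one is present."""
    if "call_alt_count" in ds:  # membership test is not recorded as a read
        per_call = ds["call_alt_count"]
    else:
        per_call = [[sum(call) for call in variant] for variant in ds["call_genotype"]]
    return {"variant_alt_count": [sum(variant) for variant in per_call]}


base = {"call_genotype": [[[0, 1], [1, 1]], [[0, 0], [0, 1]]]}
first = toy_count_variant_alt(base)
second = toy_count_variant_alt({**base, "call_alt_count": [[1, 2], [0, 1]]})
print(first["variant_alt_count"])  # [3, 1]
```

After the two calls, `OBSERVED` holds two different input sets for the same method, which mirrors the point that a method can consume different variables depending on how it is called.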
I'm not sure what the next step is though!
Method name -> input variables:
Method name -> output variables:
Some discussion in https://github.com/pystatgen/sgkit/pull/647#discussion_r686015457
As pointed out by @jeromekelleher, it's interesting to look at the main __init__ to see the current names. Some current options:
- count_ prefix
- infer_ prefix
Some of them make sense in different contexts. In this issue we want to discuss the convention and document it.