tskit-dev / what-is-an-arg-paper

Manuscript and code for the "What is an ARG?" paper

Questionable unsuitability of standard math formalism #439

Closed castedo closed 8 months ago

castedo commented 8 months ago

This is just one point of many, including positive points (see #438).

Feel free to close this if you think there are no actions to be taken given your audience and objectives.

I find the way "math formalisms" are described here:

https://github.com/tskit-dev/what-is-an-arg-paper/blob/55986e675fdd2db3bf1687e0ac75e107c67b318b/paper.tex#L77-L88

as either too ambiguous or unconvincing.

It almost sounds like you are arguing that the gARG math formalism should be used instead of the standard math formalisms in general and not just in the context of encoding modern datasets. If this is not what you are arguing, then I think the language is too ambiguous. Perhaps the ambiguity is due to not making clear that you are talking about math formalisms to facilitate understanding of encodings with modern datasets. Or if you are making this argument, then I find it unconvincing because I do not think the main purpose of a math formalism is to describe knowable data encoded in data structures.

To give some context to why I found this language around "math formalisms" and "outputs" confusing, consider the output of a program that calculates powers of e. The conceptual output of this program is a real (irrational) number. But the output is probably going to be encoded as a 64-bit IEEE 754 floating-point number. Or maybe as some fancy rational number approximation. Perfectly good math formalisms for real numbers include Dedekind cuts, infinite series, and converging sequences of rational numbers. Are these math formalisms the output of the program? Or is the output an IEEE 754 floating-point encoding? Are the infinite math formalisms unsuited for the realities of modern computers? Should we use math formalisms of real numbers that are like IEEE 754 encodings? I believe the answer is that it depends on the purpose of the math formalism. And sometimes the encoding of the data is very different from a totally legit math formalism for the conceptual output.
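A minimal Python sketch of this distinction, using only the standard library. The helper name `exp_partial_sum` and the choice of 20 series terms are assumptions made for illustration; nothing here comes from the paper or its code. It produces two different encodings of the same conceptual value e: a 64-bit IEEE 754 double and an exact rational partial sum of the infinite series.

```python
import math
import struct
from fractions import Fraction


def exp_partial_sum(x: Fraction, terms: int) -> Fraction:
    """Exact rational partial sum of the series sum_{k < terms} x**k / k!."""
    total = Fraction(0)
    term = Fraction(1)  # x**0 / 0!
    for k in range(terms):
        total += term
        term = term * x / (k + 1)  # next term: x**(k+1) / (k+1)!
    return total


# Encoding 1: a 64-bit IEEE 754 double, i.e. exactly 8 bytes.
e_float = math.exp(1.0)
e_bits = struct.pack(">d", e_float)

# The double is itself an exact rational number (with a power-of-two denominator).
e_float_exact = Fraction(e_float)

# Encoding 2: a rational approximation taken from the infinite-series formalism.
e_rational = exp_partial_sum(Fraction(1), 20)

# Both encodings approximate the same real number, but they are different objects.
gap = abs(e_float_exact - e_rational)

if __name__ == "__main__":
    print("IEEE 754 bytes:", e_bits.hex())
    print("double as exact fraction:", e_float_exact)
    print("series partial sum:", e_rational)
    print("gap between encodings:", float(gap))
```

Neither value is e itself. The infinite series, like a Dedekind cut, defines the number; each encoding above is a finite object chosen for a particular purpose.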

So the text "the underlying assumption that all events can be known and precisely estimated" seems odd to me: why does using a standard math formalism for ARGs carry that assumption? Is it because you're assuming that the math formalism is being used as a data encoding? But encoding data is not the only use for a math formalism. In fact, I would argue that math formalisms are most useful for mentally accounting for latent data which is unknowable. So I find it odd to claim there is some issue with a math formalism because it exhaustively details data which is unknowable. Agreed if the formalism is for encoded data, but not true for math formalisms in general.

jeromekelleher commented 8 months ago

This is a totally valid perspective @castedo, and you're right in many ways. What we're really talking about is an ARG encoding, a way to concretely describe an actual thing, be it derived from simulations or real data. I would also like to think that this encoding is useful mathematically, as it provides clarity on certain points that are obscure with the Griffiths encoding. See the section on the equivalence of the set of local trees to an ARG - how could people have been wrong about such an obvious and basic thing? They weren't using the right tools for thinking about ARGs - the encoding, or (more vaguely) mathematical formalism.

So, basically we're using "mathematical formalism" in a slightly different way to you. Most of our readers aren't mathematicians or formally trained in maths. The word "encoding" doesn't get across the right ideas to them, so we settled on "mathematical formalism" to try and get this across. If we were aiming the paper at a more mathematical audience it would use quite different language (and also be quite different!).

Does this seem reasonable, or am I missing the point?

hyanwong commented 8 months ago

Is there an alternative phrasing we could use for "mathematical formalism" that would address @castedo's concern but would still convey the meaning we want? Could we say "standard formal descriptions of an ARG" instead, for example?

castedo commented 8 months ago

I think it's reasonable to use the term "mathematical formalism" and fine to leave an ambiguity that makes me wonder what you're getting at but probably won't make most of your readers wonder.

If someone asked me, what's the difference between a gARG and a succinct tree sequence I'd probably say things like :

I don't think it's really the use of "mathematical formalism" that made me wonder what you're getting at. It's the combination of

It left me wondering whether you're arguing that in general people should use gARG as a math formalism and not math formalisms that include exhaustive details. My reaction was perhaps a bit like if someone was saying the Turing machine math formalism is bad and we should use an alternative math formalism that is more like how modern microprocessors work.

jeromekelleher commented 8 months ago

Interesting take, that hadn't occurred to me...