petrelharp / ftprime_ms

4 stars 2 forks source link

Slight terminology change: Succinct tree sequence #54

Closed jeromekelleher closed 6 years ago

jeromekelleher commented 6 years ago

Would anyone object if I changed the terminology slightly from 'tree sequence' to 'succinct tree sequence'? It's a bit vague as just a "tree sequence", which doesn't capture the fact that we're efficiently encoding a sequence of trees rather than it just being some sequence of trees.

I like succinct, because it has a technical meaning that is very close to what we actually do (from Wikipedia):

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations.

I don't think it's a big change, as we'll just say "A succinct tree sequence (or for brevity, a tree sequence) is ...".

ashander commented 6 years ago

Cool by me. Also nice to know that technical term. It's the first I'd heard of it but it's a good concept

petrelharp commented 6 years ago

also fine by me. although: most people will not recognize "succinct" as a technical term.

On Wed, Jan 10, 2018 at 12:50 PM, Jaime Ashander notifications@github.com wrote:

Cool by me. Also nice to know that technical term. It's the first I'd heard of it but it's a good concept

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/petrelharp/ftprime_ms/issues/54#issuecomment-356732527, or mute the thread https://github.com/notifications/unsubscribe-auth/AA_26cJWSmzHWoZ_rSLcPyO7NfN2UjHqks5tJSKigaJpZM4RZhPR .

molpopgen commented 6 years ago

I'm fine with it. It may be handy to give the compsci definition in ,, upon first use.

petrelharp commented 6 years ago

I'm doing it in the abstract, too...

petrelharp commented 6 years ago

Er, to be clear: what is a "tree sequence"? Right now we don't really define it, other than to use it to mean "the sequence of trees", and talk about "the tables" as encoding one. Do you mean to define "succinct tree sequence" to be the thing with nodes and edges (and sites, mutations)? That we store in the tables, but one could store in other ways? That's what I'm going with...

petrelharp commented 6 years ago

In #60 I tried moving from "tree sequence format" to "succinct tree sequence". I think we can mostly remove the word "format", referring instead to "tree sequences" or "tables" when that's what we really mean. I'll give it a try.

jeromekelleher commented 6 years ago

Er, to be clear: what is a "tree sequence"? Right now we don't really define it, other than to use it to mean "the sequence of trees", and talk about "the tables" as encoding one.

How about: "A succinct tree sequence is a tuple (N, E, S, M), where N is a set of nodes (as defined above),..." etc. The tables encode the nodes, edges etc.

Do you mean to define "succinct tree sequence" to be the thing with nodes and edges (and sites, mutations)? That we store in the tables, but one could store in other ways? That's what I'm going with...

Yes, exactly.

Also good call on getting a definition of succinct in on first usage, this is an excellent idea.

petrelharp commented 6 years ago

The way I've got it now is not quite so precise. If you are proposing to define "succinct tree seq" to be the four tables, then that would let us call the tables something other than "the tables", which would be good. I'd still want to refer to the tables, though, e.g. "we add a row to the edge table of the succinct tree sequence" rather than "we add a row to the succinct tree sequence".

However, current usage separates somewhat the idea from the particular representation in tables, which is conceptually nice.