kitchensjn / tskit_arg_visualizer

Interactive visualization method for ancestral recombination graphs
MIT License
11 stars 3 forks source link

Meaning of mutation label unclear? #103

Open hyanwong opened 4 days ago

hyanwong commented 4 days ago

I see mutation labels now default to e.g. C123T:9, and it took me some time to figure out that the "9" was the time of the mutation (rather than e.g. the genome position). This also doesn't play well with unknown mutation times, which are a common feature in tree sequences (time=tskit.UNKNOWN_TIME, which results in "ValueError: cannot convert float NaN to integer"). I see that you also round the time to an int, which might not always be appropriate.

Given the length of the labels, and the issues with NaN etc, I suspect that the best thing might be to stick with the simple INHERITED_STATEpositionDERIVED_STATE scheme, but provide example code for how to modify the labels to include time, position, or whatever? If you need to write the time units, they should be present in the ts.time_units attribute, although writing C123T @9generations is probably a bit long-winded for a label!

kitchensjn commented 4 days ago

Good point! I've reverted back to the original string format.

One thing that I've changed is I've separated the "label" text from the "content" of the popup. This allows for things like the condense mutation labelling scheme (x2, x4, etc). Both of these are generated on draw(), which limits the current customizability. We could store it in the mutations data frame (similarly to the labels for nodes in the nodes data frame). Users could then do customize this to their liking, but this strategy may not play nicely with condense_mutations=True if the user has added a lot of text to the mutation "content" and then all of this text gets merged together into a single popup. I think I'll save this added customizability to mutations for the next release (v0.0.4).

hyanwong commented 4 days ago

Nice idea about separating short labels from longer content. I agree that the popup can be much longer, and that might be useful (maybe appending the time_units isn't so bad in the popup). Perhaps you could pass a function that maps a row of the mutations data frame to a content string?