tskit-dev / pyslim

Tools for dealing with tree sequences coming to and from SLiM.
MIT License
27 stars 23 forks source link

max_slim_mutation_id function #290

Closed mufernando closed 1 year ago

mufernando commented 2 years ago

Creates a function that returns the maximum SLiM mutation ID. Useful for adding more mutations with msprime.sim_mutations().

Solves #179

petrelharp commented 2 years ago

Looks good; we should also have a test that this actually lets us mutate a SLiM-produced tree sequence more and load it back in to SLiM?

mufernando commented 2 years ago

should be good to go! thanks @petrelharp

petrelharp commented 2 years ago

Hm - thinking a bit more, I think this should be next_slim_mutation_id( ), which returns the max plus one. That would mean that (a) we can say "just pass this value in" instead of "this value plus one"; and (b) it reutrns 0 in the case of no mutations, instead of -1.

petrelharp commented 2 years ago

Ah - and, we need to clarify (and, test) what this does for tree sequences whose derived states are not SLiM-produced. In the case of binary alleles (0/1), it will return 1 (or, 2 if we change to "next"); this is unavoidable but we should say in the docstring what it actually does so that people understand. If alleles are nucleotides, it'll do

>>> int("A")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'A'

so maybe we should try.. catch ValueError and give an informative error message?

mufernando commented 2 years ago

I changed it to give us the next mutation ID and introduced a try...except block to catch problems with different types of derived state.

petrelharp commented 2 years ago

I like it. A few suggested changes, and we need a test for the error.

mufernando commented 2 years ago

this should be good to go now!

petrelharp commented 2 years ago

Whoops - testing in more situations has turned up an issue: a derived or ancestral state of "" is valid (it means "no mutations"), and this produces an error. I think that right now the main implementation of this function would wrongly throw an error in this case, in fact.

petrelharp commented 2 years ago

Huh, and actually, this method should also iterate over sites.ancestral_state.

mufernando commented 2 years ago

@petrelharp and I talked about the ancestral state column in the Sites Table today and we don't see how that could be used here? SLiM doesn't use that column yet. So this should be ready to merge.