make find_split_time more robust

mufernando / graft


make find_split_time more robust #2

Closed petrelharp closed 4 years ago

petrelharp commented 4 years ago

Right now it assumes that slim_provenances and provenances are the same length. Instead, it should probably parse the provenance directly rather than using slim_provenances (see the pyslim code for slim_provenances; maybe using json.loads()).

Also:

Use the last SLiM provenance before the two chains differ (it might not be the last non-differing entry).
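As a rough illustration of parsing a provenance record directly, the sketch below decodes a record's JSON with json.loads() and checks which software wrote it. It assumes the record follows the tskit provenance schema, with a "software" object carrying a "name" field; the helper name is_slim_record and the version strings in the sample records are made up for the example.

```python
import json


def is_slim_record(record_json):
    """Return True if a provenance record (a JSON string) names SLiM as its software.

    Assumes the tskit provenance schema: {"software": {"name": ..., ...}, ...}.
    """
    record = json.loads(record_json)
    software = record.get("software", {})
    return software.get("name") == "SLiM"


# Toy records for illustration only; real records carry many more fields.
msprime_record = json.dumps({"software": {"name": "msprime", "version": "0.7.4"}})
slim_record = json.dumps({"software": {"name": "SLiM", "version": "3.3"}})

print(is_slim_record(msprime_record))
print(is_slim_record(slim_record))
```

Checking each record this way does not rely on the SLiM-only list and the full provenance list having the same length.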

mufernando commented 4 years ago

I think it starts to get tricky if not all provenances are SLiM provenances. msprime goes backward in time, so it does not make sense to consider its provenances for the split time.

mufernando commented 4 years ago

We could make that assumption (all provenances are SLiM provenances), but check that it is true to make it more robust.

petrelharp commented 4 years ago

I think you just need to do like:

last_slim_prov = None
for j, (p1, p2) in enumerate(zip(ts1.provenances, ts2.provenances)):
    if p1 != p2:
        break
    if (p1 is a slim prov):
        last_slim_prov = pyslim.however_you_parse_a_prov(p1)
if last_slim_prov is None:
    raise ValueError("No shared SLiM provenance entries.")
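A self-contained version of the loop above might look like the following. It is a sketch under stated assumptions, not the code that ended up in the repository: it works on plain lists of provenance record strings (with tskit these could presumably be built as `[p.record for p in ts.provenances()]`), it identifies SLiM records by the "software" / "name" field of the tskit provenance schema, and the function name last_shared_slim_record and the "slim" / "generation" values in the toy data are hypothetical.

```python
import json


def last_shared_slim_record(records1, records2):
    """Return the parsed last SLiM record in the shared prefix of two provenance chains.

    records1, records2: lists of provenance records as JSON strings, oldest first.
    Raises ValueError if the shared prefix contains no SLiM record.
    """
    last_slim = None
    for r1, r2 in zip(records1, records2):
        if r1 != r2:
            # The chains diverge here; nothing later is shared.
            break
        parsed = json.loads(r1)
        # Non-SLiM entries (e.g. msprime) are skipped, not treated as an error.
        if parsed.get("software", {}).get("name") == "SLiM":
            last_slim = parsed
    if last_slim is None:
        raise ValueError("No shared SLiM provenance entries.")
    return last_slim


# Toy chains: one msprime step and one SLiM step in common, then a differing SLiM step.
shared = [
    json.dumps({"software": {"name": "msprime"}}),
    json.dumps({"software": {"name": "SLiM"}, "slim": {"generation": 100}}),
]
chain_a = shared + [json.dumps({"software": {"name": "SLiM"}, "slim": {"generation": 200}})]
chain_b = shared + [json.dumps({"software": {"name": "SLiM"}, "slim": {"generation": 300}})]

result = last_shared_slim_record(chain_a, chain_b)
print(result["slim"]["generation"])
```

Because zip stops at the shorter chain, the two chains do not need to be the same length.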

mufernando commented 4 years ago

Done. See here.

petrelharp commented 4 years ago

Looks good!