Open hyanwong opened 1 year ago
It's there, you need the alignments though, otherwise you're matching the imputed sequence.
Ah, that's great news. I guess we just need some documentation then. Shall I change the title of the issue to reflect that?
A related comment, pretty much what Jerome is saying. I was thinking specifically about this task to find a set of equally likely matching candidates. If we match against the nodes in an ARG rather than the original sequences (which may contain Ns and/or are masked post-alignment), then candidates with equal likelihood values may not truly be equally good matches. It would be useful to quickly check against the original aligned sequences, but I suppose that gets costly.
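For what it's worth, the "check against the original aligned sequences" step could be sketched roughly as below. This is plain Python and not part of sc2ts: the function names, the choice of `N` and `-` as the missing/masked characters, and the idea of ranking candidates by mismatch count are all assumptions made for illustration.

```python
# Hypothetical helper, not an sc2ts API: compare a query alignment with
# candidate alignments, skipping sites that are missing in either sequence.
MISSING = frozenset("N-")  # assumption: N and gap characters mark missing/masked sites


def count_mismatches(query, candidate, missing=MISSING):
    """Count sites that differ, ignoring any site missing in either sequence."""
    if len(query) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for q, c in zip(query, candidate)
        if q not in missing and c not in missing and q != c
    )


def best_candidates(query, candidates):
    """Return the sorted names of the candidates with the fewest mismatches."""
    scores = {name: count_mismatches(query, seq) for name, seq in candidates.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


query = "ACGTN"
candidates = {"a": "ACGTA", "b": "ACGAA", "c": "NCGTC"}
tied = best_candidates(query, candidates)
```

One caveat with a count like this: a candidate with many missing sites has fewer sites to mismatch at, so it can tie for best without being well supported. Reporting the number of compared sites alongside the mismatch count would make that visible.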
Is this a matter of feeding samples into match (https://github.com/jeromekelleher/sc2ts/blob/1443e0fa85125f7018c30eb503f003d340032b13/sc2ts/inference.py#L277)?
These other bits are much more complicated @szhan - I think that's a separate issue.
Shing's right, you need to feed the samples you're interested in to match.
There are some preconditions though:
samples are Sample instances created by pulling metadata out of the DB.
You could wrap this all up easily enough if you just had a single alignment in a FASTA - this was built to make doing things incrementally quick with a large dataset. Working with plain FASTAs was too slow, likewise pulling metadata out of a CSV.
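As a rough sketch of the "single alignment in a FASTA" starting point, the snippet below reads exactly one record from a FASTA file using only the standard library. The function name is hypothetical, and it deliberately stops short of building a Sample instance or calling match, since the signatures for those are not given in this thread.

```python
import io


def read_single_fasta(handle):
    """Read exactly one FASTA record and return (name, upper-cased sequence).

    Hypothetical helper for illustration only; not an sc2ts function.
    """
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                raise ValueError("expected a single record, found more than one")
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first token of the header line
        else:
            if name is None:
                raise ValueError("sequence data found before any header line")
            chunks.append(line.upper())
    if name is None:
        raise ValueError("no FASTA record found")
    return name, "".join(chunks)


# Usage with an in-memory file; a real file opened with open(path) works the same way.
name, alignment = read_single_fasta(io.StringIO(">s1 example\nACGT\nnnac\n"))
```

The returned string would still need to be the same length as the reference coordinate system and paired with whatever metadata the Sample instances expect, and those steps are not covered here.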
Is there any simple sc2ts function which will take one of the output ARGs (say the long ARG) and match a specific sample into it again, at a specific time (i.e. ignoring younger samples)? This might help to look at the effect of changing the HMM params, or perhaps to find a set of equally likely matching candidates, or even to look at the posterior distribution around a breakpoint.
It would be good to give external users a simple way of doing this. @szhan was asking about it too.