matsengrp / cft

Clonal family tree

Use new partis fcn add_seqs_to_line() #311

Open psathyrella opened 4 years ago

psathyrella commented 4 years ago

It's hard to believe I didn't realize how easy this would be when we talked about it before, but I wrote a function that adds new sequences (i.e. inferred intermediates from a phylo program) to an existing partis annotation without running partis. It just uses a mafft step if there's some constant region cruft. I'm pretty sure there's at least one place in cft where we want to use this rather than running partis from scratch?

https://github.com/matsengrp/cft/blob/master/SConstruct#L1136

https://github.com/psathyrella/partis/blob/dev/python/utils.py#L512

eharkins commented 4 years ago

Nice! Yes definitely preferable there, I'll double check that's the only place. Then maybe a good first issue for @jgallowa07 (for real this time)!

psathyrella commented 4 years ago

Great!

I opened basically the same issue in linearham. It seems like it'd probably make sense to do both of 'em at the same time (if I'm right that it's also applicable there).

eharkins commented 4 years ago

@psathyrella just want to confirm my understanding before @jgallowa07 starts on this:

We can replace the above linked step in the CFT pipeline which

with an existing partis script or a script that we write which does those same things using partis functions directly thanks to your recent changes?

psathyrella commented 4 years ago

I would still run the selection metrics with a partis command: use the get-selection-metrics action. The benefit of this new function is that cft no longer creates a new annotation from scratch, which could differ from the existing one (I think this has caused problems in the past) and which is quite time consuming for large families. So in cft you want to read the existing partis annotation, call the new function to add the new sequences from the phylo inference to that annotation, write the new annotation to disk, then run partis get-selection-metrics on that new annotation file.
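
To make the read / add / write steps concrete, here is a rough sketch on a toy stand-in for an annotation. Everything in it is an assumption for illustration: the function names, the JSON file format, and the reduced set of keys are hypothetical, and this is not the partis API. The real `add_seqs_to_line()` has its own signature (see the utils.py link above) and aligns with mafft where this toy version simply rejects sequences of the wrong length.

```python
import copy
import json


def add_seqs_to_toy_annotation(annotation, new_seqfos):
    """Return a copy of `annotation` with the sequences in `new_seqfos` appended.

    `annotation` is a toy dict with parallel per-sequence lists ("unique_ids",
    "seqs") and one per-family value ("naive_seq"). Hypothetical structure,
    much smaller than a real partis annotation.
    """
    updated = copy.deepcopy(annotation)  # leave the caller's annotation untouched
    existing_ids = set(updated["unique_ids"])
    for sfo in new_seqfos:
        if sfo["name"] in existing_ids:
            raise ValueError("duplicate sequence id: %s" % sfo["name"])
        # Toy simplification: require the same length as the naive sequence.
        # Per the description above, the real function handles extra
        # constant region sequence with a mafft step instead.
        if len(sfo["seq"]) != len(updated["naive_seq"]):
            raise ValueError("length mismatch for %s" % sfo["name"])
        updated["unique_ids"].append(sfo["name"])
        updated["seqs"].append(sfo["seq"])
        existing_ids.add(sfo["name"])
    return updated


def update_annotation_file(in_path, out_path, new_seqfos):
    """Read an existing toy annotation, add sequences, write the result."""
    with open(in_path) as infile:
        annotation = json.load(infile)
    updated = add_seqs_to_toy_annotation(annotation, new_seqfos)
    with open(out_path, "w") as outfile:
        json.dump(updated, outfile)
    return updated
```

The point of the sketch is that the existing entries and the per-family values pass through unchanged, so nothing about the original annotation is re-inferred. The file written at the end is what would then be handed to partis get-selection-metrics.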

eharkins commented 4 years ago

Makes sense. Thanks!

psathyrella commented 3 years ago

I added a new command-line interface, in case that makes this easier:

https://github.com/psathyrella/partis/blob/dev/bin/add-seqs-to-outputs.py