matsengrp / cft

Clonal family tree

Add process_partis.py option for a specific indel #281

Closed eharkins closed 5 years ago

eharkins commented 5 years ago

@lauradoepker would like the ability to run ecgtheow on only the subset of sequences in a particular cluster that have a given indel. (I am opening this issue on cft because ecgtheow processes partis output using cft/bin/process_partis.py.)

This option would come with other options to specify the indel of interest, including:

The name is up for debate: something like --only-with-particular-indel, --unique-indel, or --indel-filter. I'm going to call it --only-with-particular-indel for now:

Assuming this makes sense to everyone (cc @matsen), I will open separate issues:

psathyrella commented 5 years ago

Yeah, except I think I've changed my mind about how to specify the indel parameters. Maybe this is what Laura was suggesting and I was just being dense, but I think it's probably better to just say "match the indels in this sequence", i.e. specify a uid, rather than having to specify the length/pos/type of the indel.
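One way to read this suggestion is sketched below, in Python since process_partis.py is a Python script. The data layout is an assumption made purely for illustration: each sequence is a dict with a `uid` and an `indels` list of `(type, pos, length)` tuples, which is not necessarily how partis stores indel information. `indel_signature` and `filter_by_reference_indels` are hypothetical helper names, not functions from cft or partis.

```python
def indel_signature(seq):
    """Return a hashable, order-independent summary of a sequence's indels.

    Assumes (for illustration only) that seq["indels"] is a list of
    (type, pos, length) tuples.
    """
    return frozenset(tuple(indel) for indel in seq["indels"])


def filter_by_reference_indels(seqs, reference_uid):
    """Keep only the sequences whose indels match those of reference_uid."""
    by_uid = {seq["uid"]: seq for seq in seqs}
    if reference_uid not in by_uid:
        raise KeyError("uid %r not found in cluster" % reference_uid)
    target = indel_signature(by_uid[reference_uid])
    return [seq for seq in seqs if indel_signature(seq) == target]


# Toy cluster: two sequences share a deletion, one has no indel,
# and one has a different indel.
cluster = [
    {"uid": "seq-a", "indels": [("deletion", 102, 3)]},
    {"uid": "seq-b", "indels": [("deletion", 102, 3)]},
    {"uid": "seq-c", "indels": []},
    {"uid": "seq-d", "indels": [("insertion", 57, 6)]},
]

kept = filter_by_reference_indels(cluster, "seq-a")
print([seq["uid"] for seq in kept])
```

With this approach the caller only names a uid, and the length, position, and type of the indel are read from that sequence instead of being passed as separate options. Whether "match" should mean exact equality of all indels, as here, or something looser is a design choice this sketch does not settle.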

lauradoepker commented 5 years ago

@eharkins I'd like the filtered seqs outfile to be named a little more explicitly, something like indel_filtered_cluster_seqs.fa. Since all sequences in EC will be indel_rev, I'm okay with this fact not being reflected in the file name, but if we do add it (to both), it may prevent future forgetfulness on my part about indel reversal.
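For concreteness, writing the filtered sequences under the proposed name could look roughly like the sketch below. Only the file name indel_filtered_cluster_seqs.fa comes from this thread; the `write_fasta` helper, the `uid`/`seq` record layout, and the temporary output directory are assumptions for illustration, not how process_partis.py actually writes its output.

```python
import os
import tempfile


def write_fasta(seqs, path):
    """Write records (dicts with 'uid' and 'seq' keys) to path in FASTA format."""
    with open(path, "w") as handle:
        for seq in seqs:
            handle.write(">%s\n%s\n" % (seq["uid"], seq["seq"]))


# Hypothetical filtered records; real ones would come from the indel filter.
filtered = [
    {"uid": "seq-a", "seq": "ACGT"},
    {"uid": "seq-b", "seq": "ACGA"},
]

outdir = tempfile.mkdtemp()  # stand-in for the real output directory
outpath = os.path.join(outdir, "indel_filtered_cluster_seqs.fa")
write_fasta(filtered, outpath)
```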

metasoarous commented 5 years ago

A few things here:

eharkins commented 5 years ago

Thanks for the input here. It seems like we are going to spend a little more time thinking about how best to handle the particular indel-ed family Laura is currently dealing with; then we can generalize a solution like this if appropriate. @matsen, @lauradoepker, let me know how I can help determine the best way forward with that family.

lauradoepker commented 5 years ago

@eharkins it's completely up to you to decide how general to make the code at this point. I want 157.Vk settled as soon as possible, but not at the cost of you having to rewrite all your code later to make it more generalizable. This issue, then, is for you and @matsen to decide.

eharkins commented 5 years ago

https://github.com/matsengrp/cft/commit/e66cf19c3f80d7b986cdcffe29bf0c40ce1d0ca2