Closed by trvrb 5 years ago
--min-size is done!
The use of --min-size is causing an issue with Snakemake if cluster0.fasta is deleted. I think I'd modify extract_cluster_fastas.py to renumber cluster FASTAs to run contiguously from cluster1.fasta, cluster2.fasta, etc.
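For reference, a minimal sketch of what that contiguous renumbering could look like. The function name renumber_clusters and the dict-of-lists layout are hypothetical and not taken from extract_cluster_fastas.py; the sketch assumes clusters are already held in memory as a mapping from original cluster ID to strain names.

```python
def renumber_clusters(clusters, start=1):
    """Map surviving clusters to contiguous output filenames.

    `clusters` is assumed to be {original_cluster_id: [strain, ...]} after
    small clusters have been dropped. Returns {filename: [strain, ...]},
    numbered from `start` in order of the original IDs.
    """
    renumbered = {}
    for new_id, old_id in enumerate(sorted(clusters), start=start):
        renumbered[f"cluster{new_id}.fasta"] = clusters[old_id]
    return renumbered


# Clusters 3 and 7 survived filtering; clusters 0-2 and 4-6 were dropped.
surviving = {7: ["D", "E"], 3: ["B", "C"]}
print(renumber_clusters(surviving))
```

Numbering by sorted original ID keeps the output order stable between runs, so any downstream rule that expects cluster1.fasta always sees the same cluster.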
Do you mean to renumber them after the small clusters are dropped due to --min-size?
@cassiawag: Changes in 1ca1ea21bf765e7fb7560175209c5ec817a749fc have obviated the need for this renumbering. We're no longer relying on the existence of cluster0.fasta.
@miparedes: I made some changes to extract_cluster_fastas.py in the above. You may want to move your changes to be on top of the current master branch.
This is resolved via #4 and #12.
@miparedes @cassiawag: Could you add two command line options to extract_cluster_fastas.py? These are:
--min-size: Take a parameter for the minimum number of genomes to include in an output cluster. I expect this will usually be set to --min-size 2. A lot of the downstream machinery will break if there's just a single sample (i.e. I expect augur tree to break if handed a FASTA alignment with just a single element).
--filter-to-seattle: It should make life easier downstream if we only export cluster FASTAs that contain "seattle" viruses. You'll need to identify clusters that contain viruses with region: seattle by importing the metadata TSV.
I'd suggest doing these as two separate feature branches / PRs.
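A rough sketch of how the two options might fit together, in case it helps as a starting point. Everything other than the flag names is an assumption: the helper names (parse_args, load_regions, select_clusters), the in-memory cluster layout, and the metadata TSV having "strain" and "region" columns are all hypothetical and may not match the real script or metadata file.

```python
import argparse
import csv


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Export cluster FASTAs (sketch)")
    parser.add_argument("--min-size", type=int, default=1,
                        help="minimum number of genomes in an output cluster")
    parser.add_argument("--filter-to-seattle", action="store_true",
                        help="only export clusters containing a seattle virus")
    return parser.parse_args(argv)


def load_regions(handle):
    """Read a metadata TSV and return {strain: region}.

    Assumes the TSV has 'strain' and 'region' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["strain"]: row["region"] for row in reader}


def select_clusters(clusters, regions, min_size=1, filter_to_seattle=False):
    """Keep clusters that pass the size and (optional) region filters.

    `clusters` is assumed to be {cluster_id: [strain, ...]}.
    """
    kept = {}
    for cluster_id, strains in clusters.items():
        if len(strains) < min_size:
            continue  # too small for downstream tree building
        if filter_to_seattle and not any(
            regions.get(strain) == "seattle" for strain in strains
        ):
            continue  # no seattle virus in this cluster
        kept[cluster_id] = strains
    return kept
```

Keeping the two filters as independent checks in select_clusters should make it straightforward to land them as separate PRs: each flag only adds its own argument and its own `continue` branch.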