tseemann / snp-dists

Pairwise SNP distance matrix from a FASTA sequence alignment
GNU General Public License v3.0

Exclude certain FASTA sequences in pairwise assessment #46

Open matt-sd-watson opened 3 years ago

matt-sd-watson commented 3 years ago

For multi-FASTA files, it may be useful to exclude certain sequences by FASTA header ID when performing the pairwise SNP comparison. For example, the reference sequence could be excluded when processing COVID-19 sequences if comparisons to the reference are not needed. The input argument could accept either a .txt file of line-separated IDs or a bash array.

tseemann commented 3 years ago

Until this feature exists, this could work:

seqkit grep -v -f ids_to_ignore.txt < input.afa | snp-dists /dev/stdin > out.tsv
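
If seqkit is not available, the same filtering step can be approximated with plain awk. This is only a minimal sketch and not part of snp-dists: the function name `filter_fasta` is hypothetical, and it assumes that each line of the ID file holds exactly one ID and that a record's ID is the first whitespace-delimited word after `>`.

```shell
# Hypothetical helper (not part of snp-dists or seqkit).
# Reads FASTA on stdin and writes to stdout every record whose ID
# is NOT listed in the file given as the first argument.
filter_fasta() {
  ids_file=$1
  awk -v ids="$ids_file" '
    # Load the IDs to exclude, one per line, skipping empty lines.
    BEGIN { while ((getline line < ids) > 0) if (line != "") skip[line] = 1 }
    # On a header line, take the first word without ">" as the ID
    # and decide whether this record is kept.
    /^>/ { id = substr($1, 2); keep = !(id in skip) }
    # Print header and sequence lines of kept records only.
    keep
  '
}
```

It would slot into the pipeline in place of the seqkit call: `filter_fasta ids_to_ignore.txt < input.afa | snp-dists /dev/stdin > out.tsv`. Matching is exact on the ID, so a description after the ID in the header does not affect it.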