nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

Ignore excluding sequence which has missing data #359

Closed by snackens 1 year ago

snackens commented 1 year ago

Hi, I'd like to get a whole-genome alignment after removing recombinant regions. But when running Gubbins, one strain is excluded because it has a lot of missing data, as shown below.

Excluded sequence XXX because it had 27.37969901914598 percentage missing data while a maximum of 25.0 is allowed

Is there an option that lets me keep this sequence despite the missing data? Or do you recommend changing the reference sequence in Snippy? Thanks,

nickjcroucher commented 1 year ago

--filter-percentage FILTER_PERCENTAGE, -f FILTER_PERCENTAGE
    Filter out taxa with more than this percentage of gaps (default: 25.0)

But it's probably a good idea to exclude this sequence unless it's an outgroup or otherwise very important.
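
Before raising the threshold, it can help to see how much missing data each sequence in the alignment actually has. The following is a minimal sketch, not Gubbins' own code: it assumes the alignment is in FASTA format and that `-`, `N` and `n` count as missing data, which may not match exactly how Gubbins counts gaps. The function names `read_fasta`, `missing_percentages` and `excluded_taxa` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: these characters are treated as missing data.
# Gubbins' own definition of a "gap" may differ.
MISSING_CHARS = frozenset("-Nn")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def missing_percentages(lines: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of missing characters for each sequence."""
    result: Dict[str, float] = {}
    for name, seq in read_fasta(lines):
        missing = sum(1 for base in seq if base in MISSING_CHARS)
        result[name] = 100.0 * missing / len(seq) if seq else 0.0
    return result


def excluded_taxa(percentages: Dict[str, float],
                  filter_percentage: float = 25.0) -> List[str]:
    """List sequences whose missing data exceeds the given threshold."""
    return [name for name, pct in percentages.items()
            if pct > filter_percentage]
```

To use it on a real alignment, pass an open file handle, for example `missing_percentages(open("alignment.aln"))`, where `alignment.aln` is a placeholder filename. If only one important sequence sits slightly above 25%, as in the 27.4% case reported here, a `--filter-percentage` value just above that figure would retain it while still excluding sequences with far more missing data.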