Closed pgcudahy closed 8 months ago
You can change the percentage missing data used by Gubbins to filter the alignment - worth removing any low quality ingroup sequences before relaxing that criterion though.
Thanks for your reply. How would you recommend removing low quality ingroup sequences?
You can use the gubbins_alignment_checker.py script that is included in the package (https://github.com/nickjcroucher/gubbins/blob/master/python/scripts/gubbins_alignment_checker.py). It will identify isolates with high missing base counts, but won't filter them at present.
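Since the checker reports but does not filter, a small standalone script can do the filtering step. The following is only a minimal sketch, not part of Gubbins: it assumes the alignment is a FASTA file, that "missing" means N or gap characters (Gubbins' own definition may differ), and the function names read_fasta, percent_missing and split_by_missing are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

# Assumption: N (either case) and '-' count as missing data.
MISSING_CHARS = frozenset("Nn-")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # keep only the ID, drop any description
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def percent_missing(seq: str) -> float:
    """Percentage of positions in seq that are N or gap."""
    if not seq:
        return 100.0
    missing = sum(1 for base in seq if base in MISSING_CHARS)
    return 100.0 * missing / len(seq)


def split_by_missing(
    records: Dict[str, str], max_percent: float = 25.0
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Return (kept sequences, {dropped name: percent missing})."""
    kept: Dict[str, str] = {}
    dropped: Dict[str, float] = {}
    for name, seq in records.items():
        pct = percent_missing(seq)
        if pct > max_percent:
            dropped[name] = pct
        else:
            kept[name] = seq
    return kept, dropped
```

To use it on a real alignment, pass an open file handle to read_fasta and write the kept sequences back out in FASTA format. The 25.0 default mirrors the maximum quoted in the error message in this thread; if every sample exceeds it, as in the case described here, filtering alone will not help and the cause of the missing data needs to be addressed first.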
Thanks, I tried the alignment checker and it shows the same information as the error messages: all of my samples have gone from < 10% to > 40% Ns. Is it still valid to run gubbins with a 50% filter percentage? I was a bit confused by the responses to issues 275 and 359, and by whether they still apply now that ska masks repeats.
You can try increasing the k-mer size to improve the mapping to smaller repeated sequences. Do you expect this much of the genome to be repeated sequence? If you think there is a problem with repeat detection in this case, you can raise an issue at https://github.com/bacpop/ska.rust.
Hello, thanks for the great tool. I had run an analysis a while ago and now a peer reviewer has asked me to alter it. In the meantime my university's compute cluster changed so I had to rebuild my pipeline. With this, I have seen a large change in gubbins' output that I would like to understand. I have a collection of 217 M. kansasii samples along with the FDAARGOS_1615 type strain. Under gubbins v3.2.1 generate_ska_alignment.py produces an alignment that starts with
With gubbins v3.3.3 and the same list of samples it gives
I believe this change is because the flag --repeat-mask was added to ska map. However, now gubbins will fail with "Excluded sequence FDAARGOS_1615 because it had 43.94050504976698 percentage missing data while a maximum of 25.0 is allowed" and the same error for all of the samples in my collection. Do you have any advice on how to move forward?
Thanks, Patrick