Open djbradshaw2 opened 5 months ago
Do you have a specific example I could take a look at? Thanks!
Hi Dr. Croucher,
Thanks again for helping me! Please see attached the full SNPs for my dataset with just the CHROM, POS, ID, REF, ALT, Reference, example isolate (short read version of the reference), and columns to sort to see which SNPs do not match between the REF and Reference columns, and the REF and example isolate. Please let me know if you would like any other information.
Thanks for your time and help,
Sincerely,
David
Hi Dr. Croucher,
Sorry, I was curious if you have had a chance to determine if this was an expected result or not (having REF and Reference not matching in the vcf file)? Please let me know if you need any additional information. I have run into the same issue with a subsetted version of these isolates.
Thank you again for your time and help.
Sincerely,
David
Hi David,
When gubbins creates a vcf, it uses the first sample listed in the aln file as the REF column. Snippy, however, likes to put what you feed it as Reference last.
Try putting the Reference sequence first, something along the lines of
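# Clean the snippy alignment as before
snippy-clean_full_aln core.full.aln > snippy-clean.aln
# Count the sequences; snippy writes the Reference as the last record
sequence_count=$(grep -c "^>" snippy-clean.aln)
# Write the last record (the Reference) to the new alignment first
awk -v last="$sequence_count" '/^>/{n++} {if (n == last) print}' snippy-clean.aln > SX519_Chromosomal_Ref_clean.core.full.aln
# Append every other record, stopping at the final header
awk -v last="$sequence_count" '/^>/{n++; if (n == last) exit} {if (n < last) print}' snippy-clean.aln >> SX519_Chromosomal_Ref_clean.core.full.aln
unset sequence_count
# Run Gubbins on the reordered alignment
run_gubbins.py --first-tree-builder rapidnj --first-model JC -p gubbins -c 32 -v SX519_Chromosomal_Ref_clean.core.full.aln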
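Before starting the Gubbins run, it may be worth confirming that the reordering did what was intended. The sketch below compares the last header of the original alignment with the first header of the reordered one. It assumes both files are FASTA with one `>` header per sequence; the function name `check_ref_first` is made up for illustration and is not part of snippy or Gubbins.

```shell
# Hypothetical helper: succeeds (exit status 0) if the first record of the
# reordered alignment has the same header as the last record of the original.
check_ref_first() {
    original="$1"
    reordered="$2"
    # Header of the last sequence in the original alignment
    last_in_original=$(grep "^>" "$original" | tail -n 1)
    # Header of the first sequence in the reordered alignment
    first_in_reordered=$(grep "^>" "$reordered" | head -n 1)
    [ "$last_in_original" = "$first_in_reordered" ]
}
```

With the file names used above, this could be called as `check_ref_first snippy-clean.aln SX519_Chromosomal_Ref_clean.core.full.aln && echo "Reference is first"`.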
Dear Gubbins Creators,
Thanks for such a great tool! I wanted to check on an observation that I made looking through the *.summary_of_snp_distribution.vcf file. The nucleotides in the REF column and the Reference column do not always match: only 151,919 of 153,743 SNPs agree between the two columns. Is this an expected result?
This is despite the *.per_branch_statistics.csv stating that the Reference had no SNPs... gubbins.per_branch_statistics.csv
Please let me know if you'd like me to email the vcf file, it is 1.12 GB. Please let me know if you need any other information or have any other questions.
Thanks for your time and help.
Sincerely,
David
Gubbins Version: 3.3.0
Scripts: snippy-clean_full_aln core.full.aln > SX519_Chromosomal_Ref_clean.core.full.aln
run_gubbins.py --first-tree-builder rapidnj --first-model JC -p gubbins -c 32 -v SX519_Chromosomal_Ref_clean.core.full.aln
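The REF versus Reference comparison described above can also be done from the command line, which avoids opening a 1.12 GB VCF in a spreadsheet. This is a minimal sketch, not a Gubbins feature. It assumes the VCF is tab-separated, that the `#CHROM` header line names the sample columns, and that each sample cell holds the nucleotide itself, as the columns in this thread appear to. The function name `count_ref_mismatches` is hypothetical.

```shell
# Hypothetical helper: count sites where REF differs from one sample column.
# Usage: count_ref_mismatches file.vcf [sample_name]   (default: Reference)
count_ref_mismatches() {
    vcf="$1"
    sample="${2:-Reference}"
    awk -F '\t' -v sample="$sample" '
        /^##/ { next }                  # skip meta-information lines
        /^#CHROM/ {                     # header: locate the sample column
            for (i = 1; i <= NF; i++) if ($i == sample) col = i
            next
        }
        col {                           # data rows, once the column is known
            total++
            if (toupper($4) != toupper($col)) mismatch++
        }
        END { printf "%d of %d sites differ\n", mismatch, total }
    ' "$vcf"
}
```

For example, `count_ref_mismatches gubbins.summary_of_snp_distribution.vcf Reference` prints the number of sites where the two columns disagree. If the sample name is not found in the header, the sketch reports `0 of 0 sites differ`, so a zero total is a sign to check the column name.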