nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

REF and Reference do not match, is this expected? #411

Open djbradshaw2 opened 4 months ago

djbradshaw2 commented 4 months ago

Dear Gubbins Creators,

Thanks for such a great tool! I wanted to check on an observation I made while looking through the *.summary_of_snp_distribution.vcf file. The nucleotides in the REF column and the Reference column do not always match: only 151,919 of the 153,743 SNPs agree between the two columns. Is this an expected result?

This is despite the *.per_branch_statistics.csv file stating that the Reference had no SNPs.

gubbins.per_branch_statistics.csv

Please let me know if you'd like me to email the VCF file; it is 1.12 GB. Please also let me know if you need any other information or have any other questions.

Thanks for your time and help.

Sincerely,

David

Gubbins Version: 3.3.0

Scripts: snippy-clean_full_aln core.full.aln > SX519_Chromosomal_Ref_clean.core.full.aln

run_gubbins.py --first-tree-builder rapidnj --first-model JC -p gubbins -c 32 -v SX519_Chromosomal_Ref_clean.core.full.aln
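
The REF versus Reference comparison described in this post could be reproduced with something like the Python sketch below. It is only an illustration under stated assumptions: the function name `count_ref_mismatches` is hypothetical, the toy records are made up rather than taken from the real dataset, and each sample field is assumed to hold a base letter (as the comparison in this post implies) rather than a numeric genotype index. With the counts reported above, 153,743 - 151,919 = 1,824 sites would be expected in the mismatch list.

```python
def count_ref_mismatches(vcf_lines, sample="Reference"):
    """Compare the REF column with one sample column, record by record.

    Assumption: each sample field holds a base (e.g. 'A'), not a
    numeric genotype index.
    Returns (number_of_records, list_of_mismatches).
    """
    sample_idx = None
    total = 0
    mismatches = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            # Column header: locate the sample column by name
            sample_idx = fields.index(sample)
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line found before records")
        total += 1
        pos, ref, call = fields[1], fields[3], fields[sample_idx]
        if call.upper() != ref.upper():
            mismatches.append((int(pos), ref, call))
    return total, mismatches


# Toy input: made-up records for illustration only
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tReference\tisolate_1",
    "1\t10\t.\tA\tG\t.\t.\t.\tAB\tA\tG",
    "1\t25\t.\tC\tT\t.\t.\t.\tAB\tT\tT",
    "1\t40\t.\tG\tA\t.\t.\t.\tAB\tG\tA",
]
total, mismatches = count_ref_mismatches(example)
print(f"{total - len(mismatches)}/{total} sites match")
for pos, ref, call in mismatches:
    print(pos, ref, call)
```

On a real run, an open file handle for the *.summary_of_snp_distribution.vcf file can be passed in place of the `example` list, since the function only iterates over lines and so never loads the whole 1.12 GB file into memory.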

nickjcroucher commented 4 months ago

Do you have a specific example I could take a look at? Thanks!

djbradshaw2 commented 4 months ago

Hi Dr. Croucher,

Thanks again for helping me! Please see the attached file, which lists every SNP in my dataset with just the CHROM, POS, ID, REF, ALT, Reference, and example isolate (a short-read version of the reference) columns, plus columns that can be sorted to show which SNPs do not match between REF and Reference, and between REF and the example isolate. Please let me know if you would like any other information.

Thanks for your time and help,

Sincerely,

David

gubbins_all_SNPs_Refs.txt
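
A reduced table of this shape could be generated from the VCF with a sketch along the following lines. This is not necessarily how the attached file was produced: the function name `write_comparison_table`, the flag column names `REF_eq_Reference` and `REF_eq_isolate`, the sample name `isolate_1`, and the toy records are all hypothetical, and sample fields are again assumed to hold base letters.

```python
import csv
import io


def write_comparison_table(vcf_lines, out, reference="Reference", isolate="isolate_1"):
    """Write CHROM..ALT, two sample columns, and two match flags as TSV.

    Assumption: sample fields hold bases, so they can be compared
    directly with the REF column. Returns the number of rows written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "ID", "REF", "ALT", reference, isolate,
                     "REF_eq_Reference", "REF_eq_isolate"])
    ref_idx = iso_idx = None
    n_rows = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            ref_idx = fields.index(reference)
            iso_idx = fields.index(isolate)
            continue
        if ref_idx is None:
            raise ValueError("no #CHROM header line found before records")
        ref_base = fields[3].upper()
        ref_call, iso_call = fields[ref_idx], fields[iso_idx]
        # The two flag columns can be sorted or filtered to find disagreements
        writer.writerow(fields[:5] + [ref_call, iso_call,
                                      ref_call.upper() == ref_base,
                                      iso_call.upper() == ref_base])
        n_rows += 1
    return n_rows


# Toy input: made-up records for illustration only
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tReference\tisolate_1",
    "1\t10\t.\tA\tG\t.\t.\t.\tAB\tA\tG",
    "1\t25\t.\tC\tT\t.\t.\t.\tAB\tT\tT",
]
buf = io.StringIO()
n_rows = write_comparison_table(example, buf)
print(buf.getvalue(), end="")
```

Writing to any file-like object keeps the sketch easy to test with `io.StringIO` while still allowing a real output file to be passed in.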

djbradshaw2 commented 3 days ago

Hi Dr. Croucher,

Sorry to follow up, but I was curious whether you have had a chance to determine if this is an expected result (REF and Reference not matching in the VCF file). Please let me know if you need any additional information. I have run into the same issue with a subset of these isolates.

Thank you again for your time and help.

Sincerely,

David