metaGmetapop / metapop

A pipeline for the macro- and micro-diversity analyses and visualization of metagenomic-derived populations
MIT License

Help: difference between local and global SNP calls #17

Open hdore opened 1 year ago

hdore commented 1 year ago

Hi @metaGmetapop, thank you for developing MetaPop!

I have two questions regarding the MetaPop output for microdiversity:

1) The run_settings.tsv file for my MetaPop run indicates SNP Scale local. However, I still get microdiversity files labeled as both local and global (e.g. global_raw_microdiversity_data_snp_loci_only.tsv and local_raw_microdiversity_data_snp_loci_only.tsv). Is that the expected behavior of MetaPop? If so, what does the --snp_scale option actually change?

2) I do not understand the difference between local and global SNP calls. The MetaPop paper says:

For local SNP calls, the set of true positions identified in the global calls are reduced to the set of SNV positions identified in each BAM individually. SNV sites only observed in other BAM files are ignored.

The way I understand these sentences, in local SNV calls an SNV is called in a sample only if the variant makes up 1% of the base pair coverage for that sample and is supported by more than 4 reads. Is that correct?
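That reading can be written down as a small check. This is only a sketch of the interpretation stated above, not MetaPop's actual code: the function name, the parameter names, and the exact inequalities (at least 1% of coverage, strictly more than 4 reads) are assumptions taken from the wording of the question.

```python
def passes_local_threshold(variant_reads, total_coverage,
                           min_fraction=0.01, min_reads=4):
    """Return True if a variant would be called under the reading above.

    Assumed rule (hypothetical, not verified against MetaPop):
      * the variant accounts for at least `min_fraction` of the coverage, and
      * the variant is supported by more than `min_reads` reads.
    """
    if total_coverage <= 0:
        return False
    fraction = variant_reads / total_coverage
    return fraction >= min_fraction and variant_reads > min_reads


# Illustrative (made-up) numbers:
print(passes_local_threshold(5, 100))    # 5% of coverage, 5 reads
print(passes_local_threshold(4, 100))    # 4% of coverage, but only 4 reads
print(passes_local_threshold(5, 1000))   # 5 reads, but only 0.5% of coverage
```

Under these assumptions the three calls print True, False and False respectively.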

I looked in my data for cases where a sample appears in the 'global' output file but not in the 'local' file for a given SNV position. I found one: sample A appears in the 'global' output with a coverage of 402 (187 As and 215 Ts), but it is absent from the 'local' file, even though other samples with much lower coverage are present there for the same SNV position. Thus I do not understand why sample A is absent from the 'local' output, as the variant clearly passes the thresholds in that sample.
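The comparison described above can be reproduced with a short script. The sketch below uses toy in-memory tables in place of the real global and local files, and the column names (contig, position, sample) are hypothetical, since the actual MetaPop headers are not shown in this thread. It also works through the allele fractions for sample A using the counts quoted above.

```python
import csv
import io

# Hypothetical key columns; replace with the real headers of the MetaPop files.
KEY_COLUMNS = ("contig", "position", "sample")


def read_keys(handle):
    """Collect one (contig, position, sample) tuple per row of a TSV file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {tuple(row[col] for col in KEY_COLUMNS) for row in reader}


def global_only(global_handle, local_handle):
    """Return the entries present in the global table but not in the local one."""
    return sorted(read_keys(global_handle) - read_keys(local_handle))


# Toy data standing in for the two output files.
global_tsv = "contig\tposition\tsample\nc1\t100\tA\nc1\t100\tB\n"
local_tsv = "contig\tposition\tsample\nc1\t100\tB\n"

missing = global_only(io.StringIO(global_tsv), io.StringIO(local_tsv))
print(missing)  # [('c1', '100', 'A')]

# Allele fractions for sample A, from the counts quoted above.
coverage = 402
counts = {"A": 187, "T": 215}
fractions = {base: n / coverage for base, n in counts.items()}
for base, frac in fractions.items():
    print(f"{base}: {counts[base]} reads, {frac:.1%} of coverage")
```

With real data, the two io.StringIO objects would be replaced by open file handles for the global and local files. For sample A both alleles sit near 50% of the coverage (about 46.5% and 53.5%), far above a 1% cutoff and a 4-read minimum.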

I must be missing something. Thank you for helping me understand the difference between local and global.

Best, hdore

hdore commented 1 year ago

@metaGmetapop do you have any input on this? Thank you!
