voichek / kmersGWAS

A library for running k-mers based GWAS
GNU General Public License v3.0

Selecting the best read from Bowtie2 alignment for the K-mer of interest #87

Closed DuttaAnik closed 3 years ago

DuttaAnik commented 3 years ago

Hi Yoav, I hope you had a good Easter break. I have a question about choosing the K-mer location based on read alignments from Bowtie2, as you did in the paper. For example, I have 10 K-mers for which I extracted the reads from 20 individuals. I then aligned those reads, filtered out the unmapped ones, and kept only the reads with the highest mapping quality.

But when I view the alignment in IGV, I see that the highest-mapping-quality reads from different individuals are aligned to different chromosomes.
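For readers following along, the filtering described above, plus a tally of where the best reads land, might look roughly like the sketch below. It is only an illustration, not the method from the paper or part of kmersGWAS: the function name `best_hits_by_chromosome` and the toy SAM records are made up, and it assumes plain-text SAM input (for example, Bowtie2 output written as `.sam`).

```python
from collections import Counter


def best_hits_by_chromosome(sam_lines):
    """Count reference names among mapped reads sharing the highest MAPQ.

    Hypothetical helper for illustration; expects tab-separated SAM lines.
    """
    records = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # FLAG bit 0x4 marks an unmapped read; RNAME '*' means no reference.
        if flag & 0x4 or rname == "*":
            continue
        records.append((rname, mapq))
    if not records:
        return Counter()
    # Keep only the reads that reach the highest mapping quality observed.
    top = max(mapq for _, mapq in records)
    return Counter(rname for rname, mapq in records if mapq == top)


# Toy records (invented): reads containing one k-mer from one individual.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tChr1\t100\t42\t31M\t*\t0\t0\t*\t*",
    "r2\t16\tChr1\t120\t42\t31M\t*\t0\t0\t*\t*",
    "r3\t0\tChr3\t500\t42\t31M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r5\t0\tChr2\t10\t1\t31M\t*\t0\t0\t*\t*",
]
counts = best_hits_by_chromosome(toy_sam)
print(counts.most_common())
```

Running this per k-mer and per individual gives a small table of candidate chromosomes, which makes it easier to see whether the disagreement is spread evenly or driven by a few accessions.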

How do you then choose the location of the K-mer or decide which genes the significant K-mers are tagging when different individuals are showing different alignments?

Cheers Anik

voichek commented 3 years ago

Dear Anik,

Actually, in the phenotypes I have looked at so far I didn't have this scenario, but indeed different genotype-phenotype associations can take different forms. I don't think these cases will have a one-size-fits-all solution. Maybe local assembly of the reads containing a single k-mer, one accession at a time, can give you a larger genetic fragment that will help you solve this.

BTW, did you plot the QQ-plot for all the k-mers? Did it look good for your phenotype of interest?

Best, Yoav

DuttaAnik commented 3 years ago

Hi Yoav, OK, I understand. I guess in most cases I will be satisfied with the mappable K-mers. For the majority of the unmapped K-mers, the reads aligned to the same genes that were tagged by the mapped K-mers. Interestingly, the significant K-mers actually tagged some of the known genes that we expected to detect with a SNP-based GWAS. I guess this is one great example of the superiority of K-mer GWAS. I wonder why that would happen.

However, the Q-Q plot for one trait looks like Figure 1b in your paper. It starts to deviate at around 3 on the y-axis and then follows the same trend. I guess that is not too inflated, is it? I tried to upload a figure here, but could not as it is pretty big. Thanks for checking on this.

voichek commented 3 years ago

First, I am happy that it gave you reasonable and interesting results! I guess if the k-mer tagged a specific SNP you can go back and check why it was not detected by the SNP GWAS... Maybe some filtering step in the SNP calling?

Deviation at around 3 (or even 2) is what you would expect with no inflation, as it means only about 1/1000 of the variants have a p-value more surprising than expected.
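The arithmetic behind that statement can be sketched as follows, assuming the Q-Q plot axes are on the usual -log10(p) scale. The function names `tail_fraction` and `expected_neg_log10` are hypothetical, and `i / (n + 1)` is one common plotting-position convention for the expected null quantiles, not necessarily the one used in the paper.

```python
import math


def tail_fraction(neg_log10_threshold):
    """Fraction of null tests expected beyond a given -log10(p) value."""
    # -log10(p) = t  is equivalent to  p = 10 ** (-t).
    return 10 ** (-neg_log10_threshold)


def expected_neg_log10(n_tests):
    """Expected -log10(p) for the sorted p-values of n_tests null tests.

    Assumption: the i-th smallest of n uniform p-values is placed
    at i / (n + 1).
    """
    return [-math.log10(i / (n_tests + 1)) for i in range(1, n_tests + 1)]


# A deviation that begins at 3 concerns roughly 1 in 1,000 tests,
# and one that begins at 2 concerns roughly 1 in 100.
for threshold in (2, 3):
    print(threshold, tail_fraction(threshold))

# x-axis values of a Q-Q plot for a toy run of 9 tests.
print([round(x, 3) for x in expected_neg_log10(9)])
```

Plotting the sorted observed -log10(p) values against `expected_neg_log10(n)` gives the Q-Q plot itself; points following the diagonal up to about 2 or 3 before lifting off match the "no inflation" picture described above.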

Best, Yoav

DuttaAnik commented 3 years ago

Thanks a lot, Yoav, for the information. I guess the genes are so polymorphic that the SNP-based GWAS could not pick them up, whereas the K-mers could. Let's see what happens next. Have a good day.

Cheers Anik