kkdey / GSSG

Gene Set + S2G strategy annotations analyzed for disease architecture

Is this pipeline suitable for analyzing samples from East Asians? #18

Closed: bitcometz closed this issue 1 year ago

bitcometz commented 2 years ago

Hello @kkdey, I have sequenced some healthy and disease samples with scRNA-seq. Do I need to replace any files in the pipeline, such as 1000G_EUR_Phase3_plink, hapmap3_snps, 1000G_Phase3_weights_hm3_no_MHC, and all_sumstats?

It seems very complicated to me.

Can I just use all the European inputs and only change my gene sets?

Also, I found 1000G_Phase3_EAS_plinkfiles.tgz and 1000G_Phase3_EAS_weights_hm3_no_MHC.tgz at this URL: https://alkesgroup.broadinstitute.org/LDSCORE/

And where can I find the hapmap3_snps for EAS?

Thanks !!!

kkdey commented 2 years ago

You can use the European inputs with your gene sets. That way you will be able to identify cell type programs whose genetic architecture does not change drastically between EAS and EUR, which is probably the case for most gene sets. If you want a fully EAS-specific analysis, you can certainly use the EAS plink and weight files you referred to instead.
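The swap described above amounts to changing only the reference-panel inputs while the gene sets stay fixed. Below is a minimal Python sketch of that idea. The directory and file prefixes are assumptions: the EUR ones follow the names mentioned in this thread plus the usual per-chromosome LDSC prefix style, and the EAS ones are hypothetical guesses at what the .tgz archives unpack to, so check them after extracting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePanel:
    """Per-population reference inputs; gene sets are NOT part of this."""
    plink_prefix: str    # per-chromosome PLINK prefix (chromosome number is appended)
    weights_prefix: str  # per-chromosome regression-weight prefix


# ASSUMPTION: paths are illustrative. EUR names follow this thread and the
# usual LDSC layout; the EAS names are hypothetical, so check them after
# unpacking 1000G_Phase3_EAS_plinkfiles.tgz and
# 1000G_Phase3_EAS_weights_hm3_no_MHC.tgz.
PANELS = {
    "EUR": ReferencePanel(
        plink_prefix="1000G_EUR_Phase3_plink/1000G.EUR.QC.",
        weights_prefix="1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.",
    ),
    "EAS": ReferencePanel(
        plink_prefix="1000G_Phase3_EAS_plinkfiles/1000G.EAS.QC.",
        weights_prefix="1000G_Phase3_EAS_weights_hm3_no_MHC/weights.EAS.hm3_noMHC.",
    ),
}


def plink_files(pop: str, chrom: int) -> list[str]:
    """Return the .bed/.bim/.fam paths for one chromosome of a population's panel."""
    prefix = PANELS[pop].plink_prefix + str(chrom)
    return [prefix + ext for ext in (".bed", ".bim", ".fam")]


def weights_prefix(pop: str, chrom: int) -> str:
    """Return the regression-weight prefix for one chromosome."""
    return PANELS[pop].weights_prefix + str(chrom)


if __name__ == "__main__":
    gene_sets = ["my_celltype_program"]  # hypothetical gene-set name; identical for both panels
    for pop in ("EUR", "EAS"):
        print(pop, gene_sets, plink_files(pop, 1)[0], weights_prefix(pop, 1))
```

Keeping the population in one lookup table means that switching from EUR to EAS touches only this mapping, while the gene-set inputs are left alone.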

The sumstats are indeed going to be largely EUR-specific, and that may be the biggest constraint here: the EAS-specific sumstats I have seen people use are few and not as diverse as the EUR ones.

For HapMap 3, I don't think you need EAS-specific HapMap 3 SNPs. The HapMap 3 list on the portal spans all populations, and combining it with the EAS-specific plink files will automatically restrict it to the relevant SNPs.
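The automatic restriction mentioned above is, in effect, a set intersection: only HapMap 3 SNPs that are also present in the EAS reference panel end up being used. A minimal sketch, assuming a plain one-rsID-per-line SNP list and standard PLINK .bim files (SNP ID in the second column); the toy rsIDs are made up for illustration.

```python
from __future__ import annotations

import io
from typing import Iterable


def read_bim_snps(lines: Iterable[str]) -> set[str]:
    """Collect SNP IDs from a PLINK .bim file (columns: CHR SNP CM BP A1 A2)."""
    snps = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            snps.add(fields[1])
    return snps


def read_snp_list(lines: Iterable[str]) -> set[str]:
    """Read a one-ID-per-line SNP list, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def restrict_to_panel(snp_list: set[str], panel_snps: set[str]) -> list[str]:
    """Keep only list SNPs that the reference panel actually contains."""
    return sorted(snp_list & panel_snps)


if __name__ == "__main__":
    # Toy stand-ins for an EAS .bim file and the all-population HapMap 3 list.
    bim = io.StringIO(
        "1\trs100\t0\t1000\tA\tG\n"
        "1\trs200\t0\t2000\tC\tT\n"
        "1\trs300\t0\t3000\tG\tA\n"
    )
    hm3 = io.StringIO("rs100\nrs300\nrs999\n")
    print(restrict_to_panel(read_snp_list(hm3), read_bim_snps(bim)))
    # prints ['rs100', 'rs300']
```

With real data you would pass open file handles for the extracted .bim files and the HapMap 3 list instead of the in-memory toy strings; sorting the result only makes the output deterministic.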

bitcometz commented 2 years ago

Dear @kkdey, thanks for your reply.

Also, I do not know where to find the equivalent EAS files for `1000G_Phase3_frq/1000G.EUR.QC.*.frq`.

Could you help with this?

Thanks !!!