neurogenetics / GWAS-pipeline

This is a general (somewhat comprehensive) description of the LNG GWAS pipeline, which can be used to guide researchers on how to run a GWAS.

relatedness #3

Open montenegrina opened 4 years ago

montenegrina commented 4 years ago

Hello,

Is there a way to do this whole step of removing related individuals in PLINK, and what would it look like?

gcta --bfile $FILENAME --make-grm --out GRM_matrix --autosome --maf 0.05
gcta --grm-cutoff 0.125 --grm GRM_matrix --out GRM_matrix_0125 --make-grm
plink --bfile $FILENAME --keep GRM_matrix_0125.grm.id --make-bed --out ${FILENAME}_relatedness

Thanks Ana

CornelisB commented 4 years ago

Yes, you can simply skip this step, rename your files, and continue.

montenegrina commented 4 years ago

Yes, but I would like to QC for relatedness, so I was wondering how to do the whole step in PLINK because I don't have GCTA.

CornelisB commented 4 years ago

Ah OK, in that case you can first prune your data, then run --genome in PLINK, and then remove one randomly selected sample from each pair that has PI_HAT > 0.125.

montenegrina commented 4 years ago

Can you please tell me if this is what you mean? And how would I remove one randomly selected sample from each pair with PI_HAT > 0.125?

plink --bfile outputZ --indep-pairwise 100 25 0.2
plink --bfile outputZ --extract plink.prune.in --make-bed --out outputZ1
plink --bfile outputZ1 --genome --max 0.125 --make-bed --out outputZ2

a=read.table("outputZ2.genome", header=T)

head(a)

     FID1 IID1    FID2 IID2 RT EZ     Z0 Z1     Z2 PI_HAT PHE      DST    PPC  RATIO
1 fam0110 G110 fam0113 G113 UN NA 0.9733  0 0.0267 0.0267  -1 0.807353 0.3533 1.9736
2 fam0110 G110 fam0114 G114 UN NA 1.0000  0 0.0000 0.0000  -1 0.807687 0.1310 1.9226
3 fam0110 G110 fam0114 G115 UN NA 1.0000  0 0.0000 0.0000  -1 0.808148 0.0327 1.8749
4 fam0110 G110 fam0114 G116 UN NA 0.9787  0 0.0213 0.0213  -1 0.806944 0.1706 1.9344
5 fam0110 G110 fam0117 G117 UN NA 1.0000  0 0.0000 0.0000  -1 0.808925 0.1715 1.9348
6 fam0110 G110 fam0118 G118 UN NA 0.9958  0 0.0042 0.0042  -1 0.804596 0.7876 2.0573

montenegrina commented 4 years ago

Is there any difference between doing the above and this?

plink2 --bfile outputZ10 --king-cutoff 0.088
plink2 --bfile outputZ10 --remove plink2.king.cutoff.out.id --make-bed --out outputZ11

As I understand it, this would remove first- and second-degree relatives, but I don't understand how it relates to the PI_HAT value. Can you please explain the relationship between --king-cutoff 0.088 and PI_HAT?

CornelisB commented 4 years ago

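I would update it to this:

plink --bfile outputZ1 --genome --min 0.125 --make-bed --out outputZ2

It's better to use --min than --max: --min drops the lines with PI_HAT below the cutoff from the .genome output, so what remains are the related pairs you want to act on. The PI_HAT column can be interpreted as 0.5 = first degree, 0.25 = second degree, etc.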
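The earlier question about how to drop one sample from each pair with PI_HAT > 0.125 could be handled roughly as below. This is only a sketch, not something taken from the pipeline: the function name `list_related_to_remove` and the file name `related_to_remove.txt` are made up for illustration, and the choice of which pair member to drop is arbitrary (file order) rather than truly random. It assumes the standard PLINK 1.9 `.genome` layout, where FID1, IID1, FID2, IID2 are columns 1 to 4 and PI_HAT is column 10.

```shell
# Hypothetical helper: print "FID<TAB>IID" for one sample from each related pair.
# Greedy rule: a pair is skipped if one of its members is already being dropped,
# so a sample related to several others is not removed more often than needed.
list_related_to_remove() {
  genome_file="$1"
  threshold="${2:-0.125}"   # PI_HAT threshold, defaults to the value used in this thread
  awk -v thr="$threshold" '
    NR > 1 && $10 + 0 > thr + 0 {          # skip the header, keep pairs above the threshold
      a = $1 "\t" $2                        # first member of the pair
      b = $3 "\t" $4                        # second member of the pair
      if (!(a in drop) && !(b in drop)) {   # pair is not yet resolved
        drop[b] = 1                         # arbitrarily drop the second member
        print b
      }
    }' "$genome_file"
}

# Possible usage (file names follow the ones used earlier in this thread):
#   list_related_to_remove outputZ2.genome 0.125 > related_to_remove.txt
#   plink --bfile outputZ --remove related_to_remove.txt --make-bed --out outputZ_unrelated
```

Note that the relatedness estimates come from the pruned data, while the removal in the usage comment is applied to the unpruned dataset. A different rule (for example, dropping the sample with the lower call rate) may suit some studies better.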

CornelisB commented 4 years ago

Using --king-cutoff should be more or less equivalent to the GCTA or PI_HAT approach.
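On the relationship between --king-cutoff and PI_HAT: KING kinship coefficients are scaled so that duplicates sit near 0.5, first-degree relatives near 0.25, and second-degree relatives near 0.125, whereas PI_HAT for the same relationships is roughly twice that. The conventional --king-cutoff values are the geometric means between adjacent levels. The sketch below just does that arithmetic; the function name `king_cutoff_table` is made up, and the "PI_HAT is about 2 x kinship" conversion is an approximation, since the two estimators are computed differently.

```shell
# Hypothetical helper: print, tab-separated, for each relationship level:
#   label, expected KING kinship, cutoff that removes this level and closer,
#   and the approximate PI_HAT equivalent of that cutoff (2 x kinship).
king_cutoff_table() {
  awk 'BEGIN {
    n = split("duplicate first-degree second-degree third-degree", label, " ")
    k = 0.5                                # expected kinship for duplicates / MZ twins
    for (i = 1; i <= n; i++) {
      cutoff = sqrt(k * (k / 2))           # geometric mean of this level and the next
      printf "%s\t%.4f\t%.4f\t%.3f\n", label[i], k, cutoff, 2 * cutoff
      k = k / 2                            # each further degree halves the expected kinship
    }
  }'
}

king_cutoff_table
```

The second-degree row gives a cutoff of 0.0884, which is where the 0.088 in the question comes from: it removes second-degree relatives and closer, and corresponds to a PI_HAT of about 0.177. A PI_HAT threshold of 0.125 is therefore somewhat stricter, corresponding to a kinship of about 0.0625.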

montenegrina commented 4 years ago

Thanks! One more thing: I don't see any QC steps after imputation here. Do you just recommend removing SNPs with a low imputation score, or is there more to it?

CornelisB commented 4 years ago

You can follow something like this => https://github.com/neurogenetics/GWAS-pipeline#regions-file

montenegrina commented 4 years ago

Thanks! are these parameters:

MAF >= 0.001 & Rsq >= 0.30

usually recommended for data imputed with Minimac4?

I already set Rsq >= 0.30 as a parameter for my imputation. Do I need to apply it again to the imputed data?

Also I was thinking to do these steps on my imputed data, can you please let me know what you think?

plink --vcf chr1.dose.vcf.gz --biallelic-only --make-bed --double-id --out s1
plink --bfile s1 --bmerge s1 --merge-mode 6
plink --bfile s1 --exclude plink.missnp --make-bed --out s2
plink --bfile s2 --list-duplicate-vars
plink --bfile s2 --exclude plink.dupvar --make-bed --out s3
plink --bfile s3 --qual-scores chr1.info 7 1 1 --qual-threshold 0.8 --make-bed --out s4
plink --bfile s4 --maf 0.01 --hwe 1e-7 --snps-only --make-bed --out s5
plink --bfile s5 --geno 0.1 --mind 0.05 --make-bed --out FINAL_DATA_QC1

Thanks

Ana

CornelisB commented 4 years ago

Ah, if the data are already pre-filtered then you should be good to go. I don't think you need to filter for duplicates again, because variant names should now be in the format CHR:BP:REF:ALT. If you want, you can filter for R2 >= 0.8, but I would recommend running the GWAS on dosages rather than on PLINK's rounded hard calls.

montenegrina commented 4 years ago

Well, I did QC my data prior to imputation (all the steps you mention on your page), but my question is: POST imputation, do I need to do any QC steps (is there any evidence for that) aside from setting R2 >= 0.8? On your page you mention MAF >= 0.001 & Rsq >= 0.30. Why this MAF value?

CornelisB commented 4 years ago

A MAF filter is usually a good idea prior to GWAS: depending on your sample size, you likely won't have power for anything under MAF 1-5%, so including all the variants with lower MAFs just makes the analysis slower.
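As a rough illustration of the post-imputation filter discussed above (MAF and Rsq thresholds), the sketch below lists the variant IDs that pass both cutoffs in a Minimac-style `.info` file. It assumes a whitespace-delimited file whose header contains columns named `MAF` and `Rsq`, with the variant ID in column 1; the function name `list_passing_variants` and the file name `chr1.keep.txt` are made up for illustration. Newer Minimac4 releases may report these metrics elsewhere, so check what your imputation run actually produced.

```shell
# Hypothetical helper: print IDs of variants with MAF >= min_maf and Rsq >= min_rsq.
# Columns are located by header name, so their exact position does not matter.
list_passing_variants() {
  info_file="$1"
  min_maf="${2:-0.001}"   # defaults follow the thresholds quoted in this thread
  min_rsq="${3:-0.30}"
  awk -v min_maf="$min_maf" -v min_rsq="$min_rsq" '
    NR == 1 {                              # find the MAF and Rsq columns in the header
      for (i = 1; i <= NF; i++) {
        if ($i == "MAF") maf_col = i
        if ($i == "Rsq") rsq_col = i
      }
      next
    }
    # Non-numeric entries (such as "-") evaluate to 0 and fail the filter.
    maf_col && rsq_col && $maf_col + 0 >= min_maf + 0 && $rsq_col + 0 >= min_rsq + 0 {
      print $1                             # variant ID is assumed to be in column 1
    }' "$info_file"
}

# Possible usage:
#   list_passing_variants chr1.info 0.01 0.8 > chr1.keep.txt
#   plink --bfile s3 --extract chr1.keep.txt --make-bed --out s4
```

Taking the thresholds as arguments makes it easy to compare a lenient filter (0.001 and 0.30) with a stricter one (0.01 and 0.8) on the same file before deciding which to use.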

montenegrina commented 4 years ago

Thanks for that. So, in conclusion, if I did a very detailed QC prior to imputation, the only step I need to do after imputation is to remove SNPs with imputation scores below, say, 0.8?

CornelisB commented 4 years ago

You could apply another HWE filter just to be on the safe side. If you are looking for a more automated workflow, you can check this => https://github.com/GP2code/GWAS

montenegrina commented 4 years ago

Thanks! I am not looking so much for an automated workflow as for general recommendations on how to proceed with a GWAS, such as what the latest "trends" are.
