jinghuazhao / FM-pipeline

A Fine-Mapping pipeline
https://jinghuazhao.github.io/FM-pipeline/
18 stars 10 forks source link

Issues with input data #2

Open montenegrina opened 3 years ago

montenegrina commented 3 years ago

Hello,

I have result from META analysis done with METAL and my summary stats looks like this:

SNP BP CHR A1 A2 Weight Z P Direction rs11693919 127673527 2 C T 19586 -5.565 2.617e-08 --- rs10203679 127672359 2 C T 19586 -5.536 3.089e-08 --- ...

So I don't have BETA and SE. Can you please tell me how and is it possible to derive those from Z and P values?

Also I have a question about column N (samples size) needed for your sumstats.

Since I am doing this for meta analysis results from 3 cohorts, the question is this the number of all cases+ controls or just all samples in 3 cohorts (some subjects are neither cases, nor controls). These numbers are quite different for me:

19550 number of cases and controls 339622 number of all samples

I am planning to run this only for CAVIARBF, LOCUSZOOM and FINEMAP.

I installed both of them and set path:

CAVIARBF

export PATH=/projects/com_grassim/anamaria/anamaria/anamaria/METAL/generic-metal/Wenan-caviarbf-25bac331a203:$PATH

FINEMAP

export PATH=/projects/com_grassim/anamaria/anamaria/anamaria/install_finemap/finemap_v1.1_x86_64:$PATH

LOCUSZOOM

export PATH=/projects/com_grassim/anamaria/anamaria/anamaria/locus_zoom/locuszoom/bin:$PATH

I also have: module load tools/qctool-2.0.7

and Plink 1.9 is on my PATH

So aside setting this in fmp.ini

export CAVIAR=0 export CAVIARBF=1 export clumping=0 export FM_summary=0 export GCTA=0 export JAM=0 export LocusZoom=1 export fgwas=0 export finemap=1

what else do I need to set up there?

Cheers, Ana

jinghuazhao commented 3 years ago

I think z and P are aliased, i.e., difficulty to tell apart b/se. To see this we started from a nomal statistic z=-1.96, a two-sided P=0.05, we can only obtain z=b/se and in R

qnorm(0.05/2) [1] -1.959964 2* pnorm(-1.96) [1] 0.04999579

Since you used METAL it might be better to specify SCHEME STDERR and in the future it will facilitate effect-size based meta-analysis.

Also you mentioned finemap, N would be the sample size from meta-analysis rather than individual cohorts/reference panel.

I should say I will be rather more comfortable if you can take mine as template for your own implementation.

montenegrina commented 3 years ago

Hello,

thank you so much for getting back to me.

About the sample size N for METAL

for each cohort (I have 3) I have specified their number of samples (cases+ controls)

DEFAULTWEIGHT 1211 DEFAULTWEIGHT 1363 DEFAULTWEIGHT 17012

Does this DEFAULTWEIGHT has to be actually the sum of those 3 numbers or?

I my results file I have these columns:

MarkerName Allele1 Allele2 Weight Zscore P-value Direction rs2326918 a g 19586.00 0.254 0.7991 --+

Is Weight (19,586) actually the sample size N for FM-pipeline?

Also I am not sure if FM-pipeline can detect say my installation of CaviarBF if just set it up on the PATH export PATH=/projects/com_grassim/anamaria/anamaria/anamaria/METAL/generic-metal/Wenan-caviarbf-25bac331a203:$PATH

or do I also have to to insert this somewhere in fmp.ini in order for it to work?

Thank you so much for your help, Ana

On Fri, Dec 11, 2020 at 9:50 AM jinghuazhao notifications@github.com wrote:

I think z and P are aliased, i.e., difficulty to tell apart b/se. To see this we started from a nomal statistic z=-1.96, a two-sided P=0.05, we can only obtain z=b/se and in R

qnorm(0.05/2) [1] -1.959964 2* pnorm(-1.96) [1] 0.04999579

Since you used METAL it might be better to specify SCHEME STDERR and in the future it will facilitate effect-size based meta-analysis.

Also you mentioned finemap, N would be the sample size from meta-analysis rather than individual cohorts/reference panel.

I should say I will be rather more comfortable if you can take mine as template for your own implementation.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jinghuazhao/FM-pipeline/issues/2#issuecomment-743272237, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACF3RTEGCGWQZINCCIA73YTSUI5T3ANCNFSM4UVZA4PQ .

montenegrina commented 3 years ago

Also I did run again meta analysis with SCHEME STDERR enabled but then my results differ (looking at P value):

with SCHEME STDERR: MarkerName Allele1 Allele2 Effect StdErr P-value Direction rs11693919 t c -0.3396 0.0652 1.894e-07

without SCHEME STDERR:

SNP A1 A2 Weight Z P Direction rs11693919 t c 19586 -5.565 2.617e-08 ---

What is the best software to do meta analysis to be used right away with your software?

What about METASOFT?

But there I am not sure which columns exactly from the output file shown here: http://genetics.cs.ucla.edu/meta/ I should choose for BETA and SE to be used with your software?

jinghuazhao commented 3 years ago

It looks like you need to let METAL know about your sample sizes. I have my example here,

my cohort level output is

SNPID CHR POS STRAND N EFFECT_ALLELE REFERENCE_ALLELE CODE_ALL_FQ BETA SE PVAL RSQ RSQ_IMP IMP chr1:10177_A_AC 1 10177 NA 4896 AC A 0.374898 0.0450289 0.0245472 0.0666582 NA 0.730453 NA ...

My METAL commands,

SEPARATOR TAB COLUMNCOUNTING STRICT CHROMOSOMELABEL CHR POSITIONLABEL POS CUSTOMVARIABLE N LABEL N as N TRACKPOSITIONS ON AVERAGEFREQ ON MINMAXFREQ ON ADDFILTER CODE_ALL_FQ >= 0.001 ADDFILTER CODE_ALL_FQ <= 0.999 MARKERLABEL SNPID ALLELELABELS EFFECT_ALLELE REFERENCE_ALLELE EFFECTLABEL BETA PVALUELABEL PVAL WEIGHTLABEL N FREQLABEL CODE_ALL_FQ STDERRLABEL SE SCHEME STDERR GENOMICCONTROL OFF LOGPVALUE ON OUTFILE /home/jinhua/INF/METAL/IL.20- .tbl PROCESS /data/jinhua/INF/sumstats/INTERVAL/INTERVAL.IL.20.gz PROCESS /data/jinhua/INF/sumstats/BioFinder/BioFinder.IL.20.gz PROCESS /data/jinhua/INF/sumstats/EGCUT/EGCUT.IL.20.gz ... ANALYZE HETEROGENEITY CLEAR

montenegrina commented 3 years ago

Thank you so much for sharing this.

Does every variant in your GWAS result have N=4896? If not how did you run that regression?

I used Plink:

plink2 --bfile Merge --pheno pheno_covar_EDIC_fin.txt --pheno-name PHENO --glm omit-ref cols=+beta,+a1freq,+machr2 hide-covar --covar pheno_covar_EDIC_fin.txt --covar-name C1,C2,REN_INSF,mean_HBA1C,DDuration --covar-quantile-normalize --covar-variance-standardize --out edic_fin1

where sum stats look like this:

head edic_fin1.PHENO.glm.logistic.hybrid

CHROM POS ID REF ALT A1 A1_FREQ MACH_R2 FIRTH? TEST OBS_CT BETA SE Z_STAT

P ERRCODE 1 752566 rs3094315 A G G 0.178053 0.894244 N ADD 1171 -0.129154 0.188324 -0.685807 0.492835 . 1 768448 rs12562034 G A A 0.105892 1.00784 N ADD 1171 0.426252 0.196836 2.16552 0.0303478 . ...

So I have this OBS_CT is that equal to your N? If not how do I get it?

Also I thought you let METAL know about sample sizes (where I understand that as =cases+controls, is it?) using these flags in metal script: DEFAULTWEIGHT 1211 DEFAULTWEIGHT 1363 DEFAULTWEIGHT 17012

Where I Shave a constant number for sample size in each cohort

On Fri, Dec 11, 2020 at 3:07 PM jinghuazhao notifications@github.com wrote:

It looks like you need to let METAL know about your sample sizes. I have my example here,

my cohort level output is

SNPID CHR POS STRAND N EFFECT_ALLELE REFERENCE_ALLELE CODE_ALL_FQ BETA SE PVAL RSQ RSQ_IMP IMP chr1:10177_A_AC 1 10177 NA 4896 AC A 0.374898 0.0450289 0.0245472 0.0666582 NA 0.730453 NA ...

My METAL commands,

SEPARATOR TAB COLUMNCOUNTING STRICT CHROMOSOMELABEL CHR POSITIONLABEL POS CUSTOMVARIABLE N LABEL N as N TRACKPOSITIONS ON AVERAGEFREQ ON MINMAXFREQ ON ADDFILTER CODE_ALL_FQ >= 0.001 ADDFILTER CODE_ALL_FQ <= 0.999 MARKERLABEL SNPID ALLELELABELS EFFECT_ALLELE REFERENCE_ALLELE EFFECTLABEL BETA PVALUELABEL PVAL WEIGHTLABEL N FREQLABEL CODE_ALL_FQ STDERRLABEL SE SCHEME STDERR GENOMICCONTROL OFF LOGPVALUE ON OUTFILE /home/jinhua/INF/METAL/IL.20- .tbl PROCESS /data/jinhua/INF/sumstats/INTERVAL/INTERVAL.IL.20.gz PROCESS /data/jinhua/INF/sumstats/BioFinder/BioFinder.IL.20.gz PROCESS /data/jinhua/INF/sumstats/EGCUT/EGCUT.IL.20.gz ... ANALYZE HETEROGENEITY CLEAR

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/jinghuazhao/FM-pipeline/issues/2#issuecomment-743427196, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACF3RTE52FAEPLKUUNHSZXDSUKCXTANCNFSM4UVZA4PQ .

jinghuazhao commented 3 years ago

An approximate answer would be your study sample size; however if you stick with variable version you might need to run PLINK separately and merge with regression results. I had attempted to comprise with this in our consortium analysis, see https://github.com/jinghuazhao/INF/blob/master/doc/SCALLOP_INF1_analysis_plan.md, however we used SNPTEST for our own data which would have N. Another note is that finemap appears to use beta/se (in its z file) rather than z/p.

jinghuazhao commented 3 years ago

plink --freq would give those N?