statgen / METAL

Meta-analysis of genomewide association scans

Is there a file size or SNP amount limit when using METAL? #21

Open dzimmerman-amc opened 2 years ago

dzimmerman-amc commented 2 years ago

Hi there,

I am trying to meta-analyse two summary statistic files that differ greatly in size (3.2 GB and 600 MB). The large file contains ~58M SNPs, while the small file contains ~7.8M SNPs. METAL is able to process the smaller file; however, when I try processing the larger file I get the following error:

## ERROR: The command you issued could not be processed ...

I have triple-checked that the headers are set correctly, but it still does not seem to work. Is there perhaps a limit on the number of SNPs METAL can process?

Thanks in advance!

welchr commented 2 years ago

That error looks to only occur in two places in the program:

https://github.com/statgen/METAL/blob/e2253cc3901df8403a331bd725d4d9fe1edfb19f/metal/Main.cpp#L1807-L1809
https://github.com/statgen/METAL/blob/e2253cc3901df8403a331bd725d4d9fe1edfb19f/metal/Main.cpp#L2158-L2160

Seems to always have something to do with the number of separate tokens it can find on a line in the metal script. It goes through the commands that should only have 1 token (like CLEARFILTERS or ANALYZE), and if it still finds your line only contains 1 token, it returns that error. Same thing for all commands that should have 2 tokens (LOGPVALUE ON). So it could be you've either got a bad command, or maybe there's a strange delimiter causing the actual parsing of the number of tokens to be incorrect. Completely guessing though.

What does your script look like?
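To make the token-count idea concrete, here is a rough Python sketch of that kind of dispatch. It is not METAL's actual code (which is C++ in Main.cpp): the command tables are deliberately partial and assumed, the function name `handle` is made up for illustration, and upper-casing the command name is an assumption about case handling.

```python
ERROR = "## ERROR: The command you issued could not be processed ..."

# Hypothetical, partial command tables for illustration only;
# the real program recognises many more commands.
ONE_TOKEN_COMMANDS = {"CLEARFILTERS", "ANALYZE"}
TWO_TOKEN_COMMANDS = {"LOGPVALUE", "ANALYZE"}


def handle(line):
    """Classify one script line by how many whitespace-separated tokens it has."""
    tokens = line.split()
    if not tokens:
        return "skip"  # blank line, nothing to do
    name = tokens[0].upper()  # assumption: command names are case-insensitive
    if len(tokens) == 1:
        # No known one-token command matched, so the line is rejected.
        return "ok" if name in ONE_TOKEN_COMMANDS else ERROR
    if len(tokens) == 2:
        # Same fall-through logic for two-token lines.
        return "ok" if name in TWO_TOKEN_COMMANDS else ERROR
    return "not modelled"  # longer commands are outside this sketch


print(handle("CLEARFILTERS"))
print(handle("LOGPVALUE ON"))
print(handle("LOGPVALU ON"))  # unrecognised name with two tokens
```

The point of the sketch is that the error says nothing about the input data: any line whose first word is not recognised for its token count ends up at the same generic message.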

dzimmerman-amc commented 2 years ago

Thanks for the reply! My script looks as below:

metal

scheme STDERR
marker SNPID
allele REF ALT
effect log(OR)
pvalue P
stderrlabel LOG(OR)_SE
WEIGHTLABEL OBS_CT
process SCALifeLinesMETALnoSNPFinal.txt

marker SNP
allele A2 A1
effect log(OR)
pvalue P
stderrlabel SE
WEIGHTLABEL N

process AGNES_2738_N_with_header
process GEVAMI_1350_N_with_header
process PREDESTINATION_1256_N_with_header

marker SNPID
allele ALLELE0 ALLELE1
effect BETA
pvalue pvalue
stderrlabel SE
WEIGHTLABEL N

proccess UKBB_SCA_EUR_GWAS__VF_STRICT__sumstats_14032022_MAC3_METALfinal.tsv

analyze heterogeneity

This error occurs only on the final file. To give you an idea of what the file looks like, here are a few lines:

GENPOS CHROM SNPID ID ALLELE0 ALLELE1 N BETA SE pvalue
828 17 17:828:C:T rs62053745 C T 457346 0.0916884 0.0621816 0.140340170558253
834 17 17:834:A:G 17:834_G_A A G 457346 0.126658 0.0720196 0.0786339333382236
1389 17 17:1389:A:G rs62053747 A G 457346 0.0500072 0.100815 0.61987339955502
1665 17 17:1665:C:T rs34151105 C T 457346 0.0574858 0.0798451 0.47154592920749
1880 17 17:1880:T:C rs77383171 T C 457346 -0.0343875 0.0774773 0.657158303831746

I still can't seem to resolve the issue or see anything wrong with my script. I could, however, be missing something incredibly obvious! Thanks for helping out!

welchr commented 2 years ago

If you change proccess -> process for the final file, does it work?
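A misspelled command like this is easy to miss by eye, so a small pre-flight check can help. The Python sketch below is not part of METAL: `KNOWN_COMMANDS` is a partial list assembled only from the commands mentioned in this thread, `find_unknown_commands` is a hypothetical helper name, and it assumes one command per line, case-insensitive command names, and `#` as a comment marker.

```python
import difflib

# Partial list, taken only from commands that appear in this thread.
KNOWN_COMMANDS = [
    "SCHEME", "MARKER", "ALLELE", "EFFECT", "PVALUE", "STDERRLABEL",
    "WEIGHTLABEL", "PROCESS", "ANALYZE", "CLEARFILTERS", "LOGPVALUE",
]


def find_unknown_commands(script_text):
    """Return (line number, command as written, closest known command or None)."""
    problems = []
    for line_no, line in enumerate(script_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        name = tokens[0].upper()
        if name in KNOWN_COMMANDS:
            continue
        # Suggest the most similar known command, if any is close enough.
        guess = difflib.get_close_matches(name, KNOWN_COMMANDS, n=1, cutoff=0.8)
        problems.append((line_no, tokens[0], guess[0] if guess else None))
    return problems


# Hypothetical four-line script; study3.tsv is a placeholder file name.
script = "marker SNPID\nallele ALLELE0 ALLELE1\nproccess study3.tsv\nanalyze heterogeneity\n"
for line_no, written, suggestion in find_unknown_commands(script):
    print(f"line {line_no}: unknown command {written!r}, did you mean {suggestion}?")
```

Because the command list is incomplete, a valid command that is not in it would also be flagged, so treat the output as a hint rather than a verdict.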