statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

Make parser more comprehensive #93

Closed. pjvandehaar closed this 3 years ago

pjvandehaar commented 7 years ago

Parsers to look at:

It'd be great to make this a stand-alone tool, invoked something like: parse-assoc --num-samples=100 --chr=CONTIG --pos=BP ... <assoc_file>
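A minimal sketch of what such a command-line interface could look like, using Python's standard argparse module. Only the tool name and the `--num-samples`, `--chr`, and `--pos` flags come from the proposal above; the function name `build_parser`, the destination names, and the help strings are assumptions made for illustration, and no such tool exists in pheweb as far as this thread shows.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the hypothetical `parse-assoc` tool."""
    parser = argparse.ArgumentParser(
        prog="parse-assoc",
        description="Normalize a GWAS association file (illustrative sketch).",
    )
    # Positional argument: the association file to parse.
    parser.add_argument("assoc_file", help="path to the association results file")
    # Study-level metadata supplied by the user.
    parser.add_argument("--num-samples", type=int,
                        help="number of samples in the study")
    # Column-name overrides, for when auto-detection would guess wrong.
    parser.add_argument("--chr", dest="chrom_col", metavar="COLUMN",
                        help="name of the chromosome column in the file")
    parser.add_argument("--pos", dest="pos_col", metavar="COLUMN",
                        help="name of the position column in the file")
    return parser
```

Calling `build_parser().parse_args(["--num-samples=100", "--chr=CONTIG", "--pos=BP", "results.tsv"])` would yield a namespace with `num_samples=100`, `chrom_col="CONTIG"`, `pos_col="BP"`, and `assoc_file="results.tsv"`.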

Steps:

  1. Figure out which columns are present, but treat ref/alt as just two allele columns for now.
  2. If both allele columns are present, check them against hg18, hg19, and hg38. If one allele consistently matches the reference, make it ref. If neither is consistent, then what? Keep ref, alt, and risk_allele? That would make PheWAS a pain; it'd be nicer to just invert OR/beta right away so effects are ref-relative. If the file is on a build we don't like, liftover to whatever the standard is. Watch out for negative-strand SNPs and indels!
    • If there's a strand-ambiguous allele, options:
      • drop the allele
      • drop the effect size (but keep pval, &c)
        • tag it as "strand-ambiguous" (how? where does this data go? "notes"/"warning" column?)
  3. If (rsid, chr, pos, ref, alt) all exist, just drop the rsid and re-derive it? If the rsid exists but something else is missing, use dbSNP to fill in whatever's missing and check that the rest matches? But against which build? This seems like a pain...
  4. Sort by (chrom, pos, ref, alt).
  5. Parse the other fields, checking their types, etc. Maybe auto-compute MAF (from AC + num_samples, genoct, or AF) and beta (from OR)?
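
Parts of steps 2, 4, and 5 can be sketched in a few small functions. This is only an illustration under stated assumptions, not pheweb's implementation: all function names are hypothetical, the input is assumed to carry an effect allele and an other allele with beta measured on the effect allele, the reference base is assumed to have been looked up already (no genome lookup, liftover, negative-strand, or indel handling is shown), and AC is assumed to count alleles in diploid samples.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """True for A/T and C/G SNPs, where a strand flip just swaps the alleles."""
    a1, a2 = a1.upper(), a2.upper()
    return len(a1) == 1 and len(a2) == 1 and COMPLEMENT.get(a1) == a2


def orient_to_ref(ref_base, effect_allele, other_allele, beta):
    """Return (ref, alt, beta) with beta relative to alt.

    Returns None when neither allele matches the reference base, leaving the
    caller to decide what to do (drop, tag, try the other strand, ...).
    """
    if other_allele == ref_base:
        return ref_base, effect_allele, beta
    if effect_allele == ref_base:
        # Effect was measured on the ref allele: swap and invert the sign.
        return ref_base, other_allele, -beta
    return None


def beta_from_or(odds_ratio: float) -> float:
    """Beta on the log-odds scale is the natural log of the odds ratio."""
    return math.log(odds_ratio)


def maf_from_ac(ac: int, num_samples: int) -> float:
    """Minor allele frequency from an allele count, assuming diploid samples."""
    af = ac / (2 * num_samples)
    return min(af, 1 - af)


# Assumed contig order: autosomes, then X, Y, MT; unknown contigs sort last.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def variant_sort_key(variant):
    """Sort key for a (chrom, pos, ref, alt) tuple."""
    chrom, pos, ref, alt = variant
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (CHROM_ORDER.get(name, len(CHROM_ORDER) + 1), pos, ref, alt)
```

For example, `orient_to_ref("A", "A", "G", 0.5)` returns `("A", "G", -0.5)`, and `sorted(variants, key=variant_sort_key)` places chromosome 2 before chromosome 10, which a plain string sort would not.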
pjvandehaar commented 7 years ago

(see https://github.com/statgen/pheweb/issues/77)