pjvandehaar closed this issue 3 years ago
(see https://github.com/statgen/pheweb/issues/77)
[ ] Make pheweb parse-one -c parse-config.json <assoc-files...>
  - parse-config.json specifies everything needed, e.g.:
{
format: {type:'csv', delimiter: '\t'},
header: {ignore_leading: '#'},
ignore_lines: {starting_with: '#'},
build: 'hg19',
check_against_reference: true,
conversions: [
{in: 'CHR', convert: 'chrom', out: 'chrom'},
{in: 'BP', convert: 'int', out: 'pos'},
{in: 'MARKER_ID', regex: '^([0-9XYMT]+)_([1-9][0-9]*)_([ACGT]+)_([ACGT]+)', out: [{out:'chrom', convert:'chrom'}, {out:'pos', convert:'int'}, {out:'ref'}, {out:'alt'}]},
{in: 'P.VALUE', convert: 'float', sigfigs: 2, out: 'pval', skip_if: ['NA', '.', '']},
{in: 'freq', convert: 'float', sigfigs: 2, out: 'af'},
...
],
constraints: {
pos: {ge: 0},
pval: {ge: 0, le: 1},
af: {gt: 0, lt: 1},
or: {gt: 0}
}
}
  - If ref vs alt aren't known, use a1 and a2.
  - {strand: '+'}, which allows converting from [a1, a2] -> [ref, alt]. It may also have to take the reciprocal of the odds ratio and negate beta. Rather than assuming, we should encode what happens when a1, a2, neither, or we-don't-know-which matches ref.
  - {convert_rsid_to: ['chrom', 'pos', 'ref', 'alt']} (allow any subset), which downloads dbSNP. (see https://github.com/statgen/pheweb/issues/66)
  - If the same output field (out) is produced multiple times, assert that the values agree.
  - {reader: 'pandas'}?
  - {sort_variants: true}, which will force sorting. If false or missing, sort order will still be checked, and unsorted input will throw PheWebUnsortedAssocFile. (see https://github.com/statgen/pheweb/issues/71)
[ ] Make tests for pheweb parse-one
[ ] Make pheweb guess-format-one -f fields-config.json <assoc-files...>, which produces a parse-config.json.
  - fields-config.json is somewhat like conf.parse currently is.
  - num_samples.
  - Ideally parse-config.json would allow comments; maybe switch to json5.
[ ] Once pheweb guess-format-one works well, use it to make an interactive (either through the terminal, a text file, or a web browser) pheweb quickstart.
  - categories.xlsx-style files.
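As a rough illustration of how the conversions and constraints in a parse-config.json might be applied to a single row, here is a minimal Python sketch. The function names (parse_row, convert_value, round_sigfigs), the behavior of the 'chrom' converter, and the choice to drop the whole row on a skip_if match are all assumptions made for illustration, not PheWeb's actual implementation.

```python
import operator
import re

# Hypothetical converters; the real 'chrom' conversion may do more than strip "chr".
CONVERTERS = {
    "int": int,
    "float": float,
    "chrom": lambda s: s[3:] if s.lower().startswith("chr") else s,
}
COMPARATORS = {"ge": operator.ge, "le": operator.le, "gt": operator.gt, "lt": operator.lt}

EXAMPLE_CONFIG = {
    "conversions": [
        {"in": "CHR", "convert": "chrom", "out": "chrom"},
        {"in": "BP", "convert": "int", "out": "pos"},
        {"in": "MARKER_ID", "regex": r"^([0-9XYMT]+)_([1-9][0-9]*)_([ACGT]+)_([ACGT]+)",
         "out": [{"out": "chrom", "convert": "chrom"}, {"out": "pos", "convert": "int"},
                 {"out": "ref"}, {"out": "alt"}]},
        {"in": "P.VALUE", "convert": "float", "sigfigs": 2, "out": "pval",
         "skip_if": ["NA", ".", ""]},
        {"in": "freq", "convert": "float", "sigfigs": 2, "out": "af"},
    ],
    "constraints": {"pos": {"ge": 0}, "pval": {"ge": 0, "le": 1},
                    "af": {"gt": 0, "lt": 1}, "or": {"gt": 0}},
}


def round_sigfigs(x, sigfigs):
    """Round a float to the given number of significant figures."""
    return float(f"{x:.{sigfigs - 1}e}")


def convert_value(raw, spec):
    """Apply the optional 'convert' and 'sigfigs' keys of a spec to a raw string."""
    value = CONVERTERS[spec["convert"]](raw) if "convert" in spec else raw
    if "sigfigs" in spec and isinstance(value, float):
        value = round_sigfigs(value, spec["sigfigs"])
    return value


def parse_row(row, config):
    """Convert one row (dict of strings). Returns None if a skip_if value is hit."""
    out = {}

    def emit(name, value):
        # If the same output field is produced twice, the values must agree.
        if name in out and out[name] != value:
            raise ValueError(f"conflicting values for {name!r}: {out[name]!r} vs {value!r}")
        out[name] = value

    for conv in config["conversions"]:
        if conv["in"] not in row:
            continue
        raw = row[conv["in"]]
        if raw in conv.get("skip_if", []):
            return None  # assumption: skip the whole variant
        if "regex" in conv:
            match = re.match(conv["regex"], raw)
            if match is None:
                raise ValueError(f"{conv['in']}={raw!r} does not match {conv['regex']!r}")
            for group, target in zip(match.groups(), conv["out"]):
                emit(target["out"], convert_value(group, target))
        else:
            emit(conv["out"], convert_value(raw, conv))

    for field, bounds in config.get("constraints", {}).items():
        if field not in out:
            continue  # constraints only apply to fields that were produced
        for op_name, bound in bounds.items():
            if not COMPARATORS[op_name](out[field], bound):
                raise ValueError(f"{field}={out[field]!r} violates {op_name} {bound}")
    return out
```

In this sketch a row whose CHR/BP columns disagree with its MARKER_ID raises an error rather than silently preferring one source, which matches the "assert that they agree" idea in the checklist.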
Parsers to look at:
It'd be great to make this a stand-alone tool, e.g.:
parse-assoc --num-samples=100 --chr=CONTIG --pos=BP ... <assoc_file>
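One way such a stand-alone command line could map onto the config format is sketched below with Python's argparse. Only --num-samples, --chr, --pos and the positional assoc file come from the example invocation; the helper names (build_arg_parser, args_to_config) and the idea of translating flags into a conversions list are assumptions for illustration.

```python
import argparse


def build_arg_parser():
    """Build a parser for the flags shown in the example invocation."""
    parser = argparse.ArgumentParser(prog="parse-assoc")
    parser.add_argument("--num-samples", type=int, help="number of samples in the study")
    parser.add_argument("--chr", dest="chrom_col", metavar="COLUMN",
                        help="name of the input column holding the chromosome")
    parser.add_argument("--pos", dest="pos_col", metavar="COLUMN",
                        help="name of the input column holding the position")
    parser.add_argument("assoc_file", help="association file to parse")
    return parser


def args_to_config(args):
    """Translate parsed flags into a (hypothetical) parse-config dict."""
    conversions = []
    if args.chrom_col is not None:
        conversions.append({"in": args.chrom_col, "convert": "chrom", "out": "chrom"})
    if args.pos_col is not None:
        conversions.append({"in": args.pos_col, "convert": "int", "out": "pos"})
    config = {"conversions": conversions}
    if args.num_samples is not None:
        config["num_samples"] = args.num_samples
    return config


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_arg_parser().parse_args(
    ["--num-samples=100", "--chr=CONTIG", "--pos=BP", "assoc.tsv"]
)
config = args_to_config(args)
```

Generating the same config structure from flags would let the command-line tool and the config-file path share one parser.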
Steps:
- ref, alt, risk_allele? That'll make PheWAS a pain; it'd be nicer to just invert OR/beta right away to be ref-relative. If it's on a build we don't like, lift over to whatever the standard is. Watch out for negative-strand SNPs and indels!
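The "invert OR/beta right away" idea can be sketched as follows. This is only an illustration under stated assumptions: a1 is taken to be the effect allele, output effects are expressed for alt relative to ref, and the function name orient_to_ref is hypothetical. It does not handle liftover, indel normalization, or strand-ambiguous (A/T, C/G) SNPs whose strand is unknown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(allele):
    """Return the allele as read from the opposite strand."""
    return allele.translate(COMPLEMENT)[::-1]


def orient_to_ref(a1, a2, ref, strand="+", beta=None, odds_ratio=None):
    """Turn (a1, a2) into (ref, alt) and make the effect alt-vs-ref.

    Assumes a1 is the effect allele. If a1 turns out to be the reference
    allele, the effect is flipped: beta is negated and the odds ratio inverted.
    """
    if strand == "-":
        a1, a2 = reverse_complement(a1), reverse_complement(a2)
    elif strand != "+":
        raise ValueError(f"unknown strand {strand!r}")

    if a1 == a2:
        raise ValueError("a1 and a2 are identical")
    if a2 == ref:
        alt, flipped = a1, False  # effect allele is already alt
    elif a1 == ref:
        alt, flipped = a2, True   # effect allele is ref, so flip the effect
    else:
        raise ValueError(f"neither a1={a1!r} nor a2={a2!r} matches ref={ref!r}")

    if flipped:
        beta = -beta if beta is not None else None
        odds_ratio = 1 / odds_ratio if odds_ratio is not None else None
    return {"ref": ref, "alt": alt, "beta": beta, "or": odds_ratio, "flipped": flipped}
```

Raising an error when neither allele matches ref keeps the "neither" case explicit instead of guessing; a real tool might prefer to log and skip such variants.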