Core features of pheweb: ref/alt, add_rsid, hg17/38

jielab commented 1 year ago

Hi, guys:

I really like pheweb. After running just a few short commands, such as pip3 install pheweb and pheweb phenolist glob --star-is-phenocode, everything works magically on my own laptop, and I can explore those cool tables and figures even while I am on a flight.

However, I do have a few feature requests and questions. I hope you agree that addressing some of them would be useful to the broader pheweb community.

1. The GitHub documentation says: _It needs a column for the reference allele (which must always match the bases on the reference genome that you specified with hg_build_number) and a column for the alternate allele_. I don't know why this is a must, since nobody uses pheweb to run GWAS meta-analysis or two-sample MR analyses, where allele alignment is needed. As we know, GWAS files downloaded from different sources these days usually have their own ways of specifying effect/non-effect (or reference/alternative, or A1/A2) alleles. If we really must align the alleles in GWAS files to the reference genome, how can we do it correctly and efficiently without going through some complicated GATK procedure? The documentation also says: If you have a MARKER_ID column like 1:234_C/G, that's okay too. If I have such a MARKER_ID column in my GWAS files, do I still need to split it into separate chr, pos, ref, and alt columns because those are required columns?
2. I really like the fact that pheweb does NOT require rsid for input GWAS. Instead, it can generate new GWAS files with rsid appended, stored in generated-by-pheweb/pheno_gz/. How can I run this add_rsid module as a standalone script/command? The log shows that https://resources.pheweb.org/rsids-v154-hg38.tsv.gz is downloaded to my computer when I run pheweb. Is there a way to prevent this file from being downloaded again each time I run pheweb? Or can I specify the path to this file in config.py instead of keeping it at the default location?
3. Let's say that I have 100 GWAS that I would like to process and display in pheweb. The positions in some of them are based on hg18 while others are based on hg38. Is there a way to specify the hg_build_number = option twice, once for those with hg18 positions and once for those with hg38 positions? If not, and I have to liftOver all GWAS to the same build first, is there an easy way to do it? I know that I could use the liftOver tool, but these days each GWAS file usually has over 10 million rows...

That's all I got.

Thank you very much & best regards, Jie
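For readers with the same MARKER_ID question, here is a minimal Python sketch of splitting such a column into chr, pos, ref, and alt. It assumes every ID follows the 1:234_C/G layout quoted above. The function name split_marker_id and the handling of an optional "chr" prefix are illustrative assumptions, not part of pheweb.

```python
import re

# Assumed layout, taken from the example in the question: CHROM:POS_REF/ALT.
# Stripping an optional "chr" prefix is an extra assumption for convenience.
MARKER_ID_RE = re.compile(
    r"^(?:chr)?([^:]+):(\d+)_([ACGT]+)/([ACGT]+)$", re.IGNORECASE
)


def split_marker_id(marker_id):
    """Split a MARKER_ID such as '1:234_C/G' into (chrom, pos, ref, alt)."""
    match = MARKER_ID_RE.match(marker_id.strip())
    if match is None:
        raise ValueError(f"unrecognized MARKER_ID: {marker_id!r}")
    chrom, pos, ref, alt = match.groups()
    return chrom, int(pos), ref.upper(), alt.upper()


if __name__ == "__main__":
    print(split_marker_id("1:234_C/G"))
```

Applied row by row while streaming a GWAS file, this avoids loading 10 million rows into memory at once. Whether pheweb needs the split at all is not settled in this thread.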

pjvandehaar commented 1 year ago

1. I don't remember how MARKER_ID works. You'll have to look at the code I guess. Somewhere in the issues on this repo I suggested how to swap a1/a2 to get ref/alt using detect_ref.py. I recommend starting there.
2. Yeah, I recommend adding conf.symlink_to_cache_dir to make a symlink at https://github.com/statgen/pheweb/blob/76f0d0e32ae72e51bc4b259ce4b16edfd653601a/pheweb/load/download_rsids.py#L25 instead of copying the file. Maybe send a PR?
3. No, one dataset = one build.
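The symlink idea could look roughly like the Python sketch below. This is not the code in download_rsids.py. The helper name place_cached_file and its symlink flag are hypothetical stand-ins for whatever a conf.symlink_to_cache_dir option would control if someone sent that PR.

```python
import os
import shutil


def place_cached_file(cache_path, dest_path, symlink=True):
    """Make dest_path available from cache_path, by symlink or by copy.

    `symlink` stands in for the proposed (not yet existing) config option.
    """
    # Remove any stale file or dangling link left at the destination.
    if os.path.lexists(dest_path):
        os.remove(dest_path)
    if symlink:
        # Link to the absolute path so the link works from any directory.
        os.symlink(os.path.abspath(cache_path), dest_path)
    else:
        shutil.copy(cache_path, dest_path)
    return dest_path
```

With a symlink, the large rsids file exists once on disk and each dataset directory only points to it.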

jielab commented 1 year ago

Thanks! Could you please let me know where I can find detect_ref.py?

pjvandehaar commented 1 year ago

It's in the pheweb repo under pheweb/load (https://github.com/statgen/pheweb/tree/master/pheweb/load): https://github.com/statgen/pheweb/tree/master/pheweb/load/detect_ref.py. Or run detect-ref at the command line.
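To illustrate the general idea of swapping a1/a2 into ref/alt, here is a minimal Python sketch. It is not the logic of detect_ref.py, which I have not reproduced here. It assumes the caller supplies the reference-genome sequence at the variant position (ref_base, e.g. read from a FASTA file) and that beta is the effect of a2 relative to a1. The function name align_to_reference is hypothetical.

```python
def align_to_reference(a1, a2, beta, ref_base):
    """Return (ref, alt, beta) with beta expressed for the alt allele.

    Assumptions (not taken from pheweb): `beta` is the effect of a2
    relative to a1, and `ref_base` is the reference sequence at this
    position, supplied by the caller. Returns None when neither allele
    matches the reference (e.g. a strand flip or a wrong genome build).
    """
    a1, a2, ref_base = a1.upper(), a2.upper(), ref_base.upper()
    if a1 == ref_base:
        # Already oriented as ref/alt; keep the effect as is.
        return a1, a2, beta
    if a2 == ref_base:
        # Alleles are swapped; flip the sign so beta refers to the new alt.
        return a2, a1, -beta
    return None
```

Rows that return None are worth inspecting separately, since a high rate of mismatches usually means the file is on a different build than expected.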

statgen / pheweb

Core features of pheweb: ref/alt, add_rsid, hg17/38 #208