statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License
154 stars 65 forks source link

outdated instruction "pheweb phenolist glob-files" #217

Open jielab opened 4 months ago

jielab commented 4 months ago

Hi, there:

I noticed pheweb phenolist glob-files on this instruction page https://github.com/statgen/pheweb/blob/master/etc/detailed-loading-instructions.md#configuration-options. However, it won't work. It should be just "glob", not "glob-files".

The first time when I run pheweb process, I think pheweb automatically download the huge dbsnp reference files. If I already have these huge dbsnp files downloaded for another software, how could I speficy its location in the config.py file so that it won't need to be re-downloaded?

Previously, I have pointed out that many GWAS simply report A1/A2, not the strict REF/ALT. Once the reference genomes get bigger and bigger, the REF/ALT specification would become really indefinite and inconsistent. Therefore, I strongly suggest that pheweb would accommodate the use of A1/A2 and be able to map rsIDs with a better flexibility. Otherwise, most SNPs in a GWAS would have no mapped rsIDs after running pheweb. You guys only need to change one line (line 140), for add_rsids.py. Beforersids = [rsid['rsid'] for rsid in rsid_group if cpra['ref'] == rsid['ref'] and are_match(cpra['alt'], rsid['alt'])] Afterrsids = [rsid['rsid'] for rsid in rsid_group if (cpra['ref'] == rsid['ref'] and are_match(cpra['alt'], rsid['alt'])) or (cpra['ref'] == rsid['alt'] and are_match(cpra['alt'], rsid['ref']))]

Please kindly consider this.

Thank you & best regards, Jie

pjvandehaar commented 4 months ago

Thanks for pointing out “glob” vs “glob-files”! I just fixed it.

For rsids, I think you’re right. Anyways, it’s a harmless change, right? If you make a PR with that change I’ll merge it. Or did you already make one? I can’t remember anymore. Actually, I think it would be even better to just match on chr:pos and ignore ref/alt.

On Wed, Feb 14, 2024 at 9:34 PM Jie Huang @.***> wrote:

Hi, there:

I noticed pheweb phenolist glob-files on this instruction page https://github.com/statgen/pheweb/blob/master/etc/detailed-loading-instructions.md#configuration-options. However, it won't work. It should be just "glob", not "glob-files".

The first time when I run pheweb process, I think pheweb automatically download the huge dbsnp reference files. If I already have these huge dbsnp files downloaded for another software, how could I speficy its location in the config.py file so that it won't need to be re-downloaded?

Previously, I have pointed out that many GWAS simply report A1/A2, not the strict REF/ALT. Once the reference genomes get bigger and bigger, the REF/ALT specification would become really indefinite and inconsistent. Therefore, I strongly suggest that pheweb would accommodate the use of A1/A2 and be able to map rsIDs with a better flexibility. Otherwise, most SNPs in a GWAS would have no mapped rsIDs after running pheweb. You guys only need to change one line (line 140), for add_rsids.py. Before: rsids = [rsid['rsid'] for rsid in rsid_group if cpra['ref'] == rsid['ref'] and are_match(cpra['alt'], rsid['alt'])] After: rsids = [rsid['rsid'] for rsid in rsid_group if (cpra['ref'] == rsid['ref'] and are_match(cpra['alt'], rsid['alt'])) or (cpra['ref'] == rsid['alt'] and are_match(cpra['alt'], rsid['ref']))]

Please kindly consider this.

Thank you & best regards, Jie

— Reply to this email directly, view it on GitHub https://github.com/statgen/pheweb/issues/217, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGSPCJMLLTFKU6FZBELGITYTVX4HAVCNFSM6AAAAABDJPBWHWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGEZTKNJSGQ4TGNQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jielab commented 4 months ago

Dear Peter:

Thanks!

I am not sure what “make a PR with that change” refers to.

Previously we discussed about this, at https://github.com/statgen/pheweb/issues/173#issuecomment-1581798702

It is critical that we match by CHR:POS:A1:A2, not just CHR:POS.

I simply propose that we match by alleles, not demanding that the REF allele must mean the reference allele from the wild monkeys.

One day when all monkeys and all humans are sequenced, we might have difficulty to tell who is “reference” vs. “alternative”.

So as long as we are talking about the same two alleles, we should allow users to use different notations such as “effect” vs. “non-effect”.

As you see from the code that I proposed, it is just one line. I fully tested it and it worked perfectly.

Best regards,

Jie

发件人: Peter VandeHaar @.> 发送时间: 2024年2月16日 0:20 收件人: statgen/pheweb @.> 抄送: Jie Huang @.>; Author @.> 主题: Re: [statgen/pheweb] outdated instruction "pheweb phenolist glob-files" (Issue #217)

Thanks for pointing out “glob” vs “glob-files”! I just fixed it.

For rsids, I think you’re right. Anyways, it’s a harmless change, right? If you make a PR with that change I’ll merge it. Or did you already make one? I can’t remember anymore. Actually, I think it would be even better to just match on chr:pos and ignore ref/alt.

On Wed, Feb 14, 2024 at 9:34 PM Jie Huang @. <mailto:@.> > wrote:

Hi, there:

I noticed pheweb phenolist glob-files on this instruction page https://github.com/statgen/pheweb/blob/master/etc/detailed-loading-instructions.md#configuration-options. However, it won't work. It should be just "glob", not "glob-files".

The first time when I run pheweb process, I think pheweb automatically download the huge dbsnp reference files. If I already have these huge dbsnp files downloaded for another software, how could I speficy its location in the config.py file so that it won't need to be re-downloaded?

Previously, I have pointed out that many GWAS simply report A1/A2, not the strict REF/ALT. Once the reference genomes get bigger and bigger, the REF/ALT specification would become really indefinite and inconsistent. Therefore, I strongly suggest that pheweb would accommodate the use of A1/A2 and be able to map rsIDs with a better flexibility. Otherwise, most SNPs in a GWAS would have no mapped rsIDs after running pheweb. You guys only need to change one line (line 140), for add_rsids.py. Before: rsids = [rsid['rsid'] for rsid in rsid_group if cpra['ref'] == rsid['ref'] and are_match(cpra['alt'], rsid['alt'])] After: rsids = [rsid['rsid'] for rsid in rsid_group if (cpra['ref'] == rsid['ref'] and are_match(cpra['alt'], rsid['alt'])) or (cpra['ref'] == rsid['alt'] and are_match(cpra['alt'], rsid['ref']))]

Please kindly consider this.

Thank you & best regards, Jie

— Reply to this email directly, view it on GitHub https://github.com/statgen/pheweb/issues/217, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGSPCJMLLTFKU6FZBELGITYTVX4HAVCNFSM6AAAAABDJPBWHWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGEZTKNJSGQ4TGNQ . You are receiving this because you are subscribed to this thread.Message ID: @. <mailto:@.> >

— Reply to this email directly, view it on GitHub https://github.com/statgen/pheweb/issues/217#issuecomment-1946466315 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AGNS6774HZLPCQIW2UO7Y7TYTYYS3AVCNFSM6AAAAABDJPBWHWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBWGQ3DMMZRGU . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AGNS677HD6UKMZH7R7SN3I3YTYYS3A5CNFSM6AAAAABDJPBWHWWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTTUAS4AW.gif Message ID: @. @.> >