jielab closed this 5 months ago
Do you mean that internally pheweb should store everything in tabixed, bgzipped GWAS-VCF files instead of the current tabixed, bgzipped TSV files? Why? How would that make queries more efficient?
Or do you just want to use GWAS-VCF as input to create a pheweb? It should be easy to write a script that converts GWAS-VCF into the input format pheweb requires. Do you have one file per phenotype, or many phenotypes in a single file?
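A converter of the kind suggested here could look roughly like the Python sketch below. It is a minimal sketch, not an existing pheweb tool. It assumes the FORMAT keys from the MRCIEU GWAS-VCF specification (`ES` effect size, `SE` standard error, `LP` as -log10 p-value, `AF` allele frequency), and it assumes the PheWeb input columns are named `chrom`, `pos`, `ref`, `alt`, `pval`, `beta`, `sebeta`, `af`. The function names `gwasvcf_rows` and `convert` are hypothetical. Because GWAS-VCF stores one trait per sample column, `sample_index` picks the phenotype, which covers both the one-file-per-phenotype and the many-phenotypes-per-file cases.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, TextIO

# Assumed PheWeb input column names (chrom, pos, ref, alt, pval treated as required).
PHEWEB_COLUMNS = ["chrom", "pos", "ref", "alt", "pval", "beta", "sebeta", "af"]

# Assumed GWAS-VCF FORMAT keys (MRCIEU spec) mapped to optional PheWeb columns.
# LP (-log10 p-value) is handled separately because it must be converted.
FORMAT_TO_PHEWEB = {"ES": "beta", "SE": "sebeta", "AF": "af"}


def open_text(path: str) -> TextIO:
    """Open a plain or bgzipped VCF as text (bgzip output is gzip-compatible)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def gwasvcf_rows(lines: Iterable[str], sample_index: int = 0) -> Iterator[Dict[str, str]]:
    """Yield one PheWeb-style row per biallelic record for a single trait.

    Each GWAS-VCF sample column holds one trait; `sample_index` selects it.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _vid, ref, alt = fields[:5]
        if "," in alt:
            continue  # this sketch skips multi-allelic records
        # FORMAT keys (column 9) paired with this trait's values; trailing
        # fields may be omitted in VCF, and zip simply stops early then.
        fmt = dict(zip(fields[8].split(":"), fields[9 + sample_index].split(":")))
        lp = fmt.get("LP", ".")
        if lp in (".", ""):
            continue  # pval is required, so records without LP are dropped
        row = {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt,
               "pval": f"{10 ** -float(lp):.6g}"}  # p = 10^(-LP)
        for key, column in FORMAT_TO_PHEWEB.items():
            value = fmt.get(key, ".")
            row[column] = "" if value == "." else value  # missing -> empty field
        yield row


def convert(vcf_path: str, out_path: str, sample_index: int = 0) -> None:
    """Write a tab-delimited file with PheWeb-style headers for one trait."""
    with open_text(vcf_path) as src, open(out_path, "w") as dst:
        dst.write("\t".join(PHEWEB_COLUMNS) + "\n")
        for row in gwasvcf_rows(src, sample_index):
            dst.write("\t".join(row[c] for c in PHEWEB_COLUMNS) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python gwasvcf_to_pheweb.py input.vcf.gz output.tsv
    convert(sys.argv[1], sys.argv[2])
```

For a multi-trait GWAS-VCF, calling `convert` once per `sample_index` would give one input file per phenotype. Leaving missing optional values as empty fields is an assumption; PheWeb's own loader may prefer a different missing-value marker.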
Dear Peter:
I mean the latter: pheweb using GWAS-VCF as input. As you know, these GWAS files, with millions of rows, are huge. It is very confusing, and a headache, that each piece of software needs different columns and column names. I think we should use VCF's capacity for fast queries, which comes with a vcf.tbi index file.
I hope you have a few minutes to read this paper, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02248-0, and that you will agree that supporting the VCF format is a good idea.
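The fast region queries that a `.tbi` file enables come from the index rather than from the VCF layout itself: in a file sorted by chromosome and position, an index of byte offsets lets a query seek close to the target region instead of reading every row. The toy Python sketch below illustrates only that idea. `build_sparse_index`, `query_region`, and `demo_file` are hypothetical names; the file is assumed uncompressed, sorted, and with each chromosome contiguous. Real tabix indexes use BGZF virtual offsets and a binning scheme, which this sketch does not reproduce.

```python
import bisect
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

# Hypothetical index layout: chrom -> (sampled positions, byte offsets).
SparseIndex = Dict[bytes, Tuple[List[int], List[int]]]


def build_sparse_index(fh: BinaryIO, every: int = 1000) -> SparseIndex:
    """Single pass over a file sorted by (chrom, pos), recording the byte
    offset of each chromosome's first line and of every `every`-th line."""
    index: SparseIndex = {}
    current = None
    seen = 0
    fh.seek(0)
    while True:
        offset = fh.tell()
        line = fh.readline()
        if not line:
            break
        if line.startswith(b"#"):
            continue  # header line
        chrom, pos = line.split(b"\t", 2)[:2]
        if chrom != current:  # assumes each chromosome is contiguous
            current, seen = chrom, 0
            index[chrom] = ([], [])
        if seen % every == 0:
            index[chrom][0].append(int(pos))
            index[chrom][1].append(offset)
        seen += 1
    return index


def query_region(fh: BinaryIO, index: SparseIndex, chrom: bytes,
                 start: int, end: int) -> Iterator[bytes]:
    """Yield lines on `chrom` with start <= pos <= end by seeking to the
    nearest sampled offset instead of scanning from the top of the file."""
    if chrom not in index:
        return
    positions, offsets = index[chrom]
    # Last sampled line with pos < start: every matching line lies after it.
    i = max(bisect.bisect_left(positions, start) - 1, 0)
    fh.seek(offsets[i])
    for line in iter(fh.readline, b""):
        c, p = line.split(b"\t", 2)[:2]
        if c != chrom or int(p) > end:
            break  # left the chromosome or passed the region
        if int(p) >= start:
            yield line


def demo_file(n_per_chrom: int = 100) -> BinaryIO:
    """Tiny in-memory stand-in for a sorted summary-statistics file."""
    rows = [b"#chrom\tpos\tref\talt\tpval\n"]
    for chrom in (b"1", b"2"):
        for k in range(1, n_per_chrom + 1):
            rows.append(b"%s\t%d\tA\tG\t0.5\n" % (chrom, k * 100))
    return io.BytesIO(b"".join(rows))


if __name__ == "__main__":
    fh = demo_file()
    idx = build_sparse_index(fh, every=10)
    for hit in query_region(fh, idx, b"2", 450, 800):
        print(hit.decode().rstrip())
```

Building the index costs one full pass, but afterwards each query skips at most about `every` lines before reaching the region, so its cost tracks the size of the region rather than the size of the file. The same idea applies to any sorted, tab-delimited file, not only to VCF.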
Best regards, Jie
Just wanted to chime in that MungeSumstats might be helpful here.
Thank you very much!
Best regards, Jie
Hi guys,
These days, GWAS files can have up to 20 million rows, which makes them very inefficient to query and process when they are stored as plain text files.
I think the VCF format is a really good idea, as explained here: https://github.com/MRCIEU/gwas2vcf
I don't know whether there is a way for Pheweb to support the VCF format.
Best regards, Jie