ireneisdoomed closed this issue 1 year ago.
FYI, I fixed the bug that caused the pipeline to crash on invalid input ('\ufeff79254949') from a small number of studies. (https://github.com/opentargets/genetics-v2d-data/commit/d58d0d4c6c7e6822fe566ee129890ba82045f2b7) I informed the GWAS Catalog about the studies with the data problem.
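The invalid value above starts with U+FEFF, the Unicode byte-order mark (BOM). Python's int() does not treat it as whitespace, so it raises ValueError. Below is a minimal sketch of defensive parsing. It is not necessarily how the linked commit fixes the problem, and parse_position and parse_position_or_none are hypothetical helper names.

```python
BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a byte-order mark


def parse_position(raw: str) -> int:
    """Parse an integer field such as a chromosome position.

    Removes any stray byte-order mark and surrounding whitespace first,
    because int() rejects a value that starts with U+FEFF.
    """
    cleaned = raw.replace(BOM, "").strip()
    return int(cleaned)


def parse_position_or_none(raw: str):
    """Return None instead of raising for values that are still not integers."""
    try:
        return parse_position(raw)
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_position("\ufeff79254949"))
    print(parse_position_or_none("NR"))
```

If the BOM only ever appears at the very start of a file, opening the file with encoding="utf-8-sig" strips it during decoding. A BOM embedded inside individual fields, as seems to happen for these studies, still needs per-field cleaning like the above.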
Re: the fine-mapping location, this has also been updated. (In general, it needs to be updated for each release.)
Regarding the LD table, I think the best enhancement would be to get rid of PICS altogether. Then you don't need any LD. There is very little benefit to the PICS method, since it assumes a single causal variant. You might as well just do standard WTCCC-style approximate Bayes factor fine-mapping, which doesn't require an LD panel and so would be much more robust and much faster.
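Here is a minimal sketch of the approximate Bayes factor approach suggested above. It computes Wakefield's approximate Bayes factor from each variant's beta and standard error alone, normalises the factors into posterior probabilities, and builds a 95% credible set. Like PICS, it assumes a single causal variant per locus. It also assumes equal prior probability for every variant. The function names, the prior_sd default of 0.2 (a common choice for log-odds effects in case-control studies) and the toy inputs are illustrative assumptions, not part of the pipeline.

```python
import math
from typing import Dict, List, Tuple


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Wakefield's log approximate Bayes factor (association vs. null)."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # assumed prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def finemap_single_causal(
    variants: List[Tuple[str, float, float]],
    prior_sd: float = 0.2,
    coverage: float = 0.95,
) -> Tuple[Dict[str, float], List[str]]:
    """Posterior probabilities and a credible set for one locus.

    variants: (variant_id, beta, se) taken from summary statistics.
    No LD information is used.
    """
    labfs = {vid: log_abf(b, s, prior_sd) for vid, b, s in variants}
    # With equal priors, PP_i = ABF_i / sum_j ABF_j; use log-sum-exp for stability.
    m = max(labfs.values())
    denom = sum(math.exp(x - m) for x in labfs.values())
    post = {vid: math.exp(x - m) / denom for vid, x in labfs.items()}

    # Add variants in decreasing posterior order until the coverage is reached.
    credible: List[str] = []
    total = 0.0
    for vid, p in sorted(post.items(), key=lambda kv: kv[1], reverse=True):
        credible.append(vid)
        total += p
        if total >= coverage:
            break
    return post, credible


if __name__ == "__main__":
    locus = [
        ("1:100:A:G", 0.30, 0.05),  # z = 6
        ("1:200:C:T", 0.10, 0.05),  # z = 2
        ("1:300:G:A", 0.02, 0.05),  # z = 0.4
    ]
    post, credible = finemap_single_causal(locus)
    print(post)
    print(credible)
```

Each variant is scored independently, so the work is embarrassingly parallel across loci. This is where the speed and robustness advantage over an LD-panel-based method comes from.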
@DSuveges shall we close this one?
Not all of v2d is implemented yet, but we are so far outside the scope of this ticket that we should close it.
This is an attempt to collect all the steps of the Genetics Variant To Disease Pipeline, to give an overall description of how it works and to identify points to work on to improve the pipeline as a whole.
2 overall notes:
Workflow DAG
Graph of dependencies between the different scripts called in the pipeline.
Workflow description
The whole pipeline consists of the creation of four files: the study, fine-mapping, LD and top loci tables.
Top loci table
GWASCat:
ValueError: invalid literal for int() with base 10: '\ufeff79254949'
~ Fixed by @Jeremy37
Summary statistics:
'gs://genetics-portal-dev-staging/finemapping/merged_210515/top_loci.json.gz', is this the most up-to-date data? Yes, as noted by @Jeremy37, the config is updated per release.
Study table - TBC
GWASCat:
Fine mapping table
'gs://genetics-portal-dev-staging/finemapping/merged_210515/credset/_SUCCESS', is this the most up-to-date data? Yes, as noted by @Jeremy37, the config is updated per release.
download_credible_set_directory has some odd logic for downloading the credible set; apparently the problem is that the credible set is a directory. This should be easy to handle.
LD table
chrom:pos:ref:alt form, implement this step in calculate_r_using_plink and use Spark.
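The LD step is only partly described here. Judging by its name, calculate_r_using_plink produces a correlation coefficient r between variants. The hedged sketch below shows the per-pair calculation that a Spark re-implementation would need to reproduce. It parses chrom:pos:ref:alt identifiers and computes Pearson r between per-sample allele dosage vectors for variant pairs on the same chromosome within a distance window. The sketch assumes dosages (for example 0/1/2 alt-allele counts) are already loaded into memory. The function names, the 500 kb window and the data layout are illustrative assumptions, and plink's exact estimator and options may differ.

```python
import math
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class VariantId(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(vid: str) -> VariantId:
    """Split an identifier in chrom:pos:ref:alt form."""
    chrom, pos, ref, alt = vid.split(":")
    return VariantId(chrom, int(pos), ref, alt)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two per-sample allele dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic variant: r is undefined
    return sxy / math.sqrt(sxx * syy)


def ld_within_window(
    dosages: Dict[str, List[float]], window_bp: int = 500_000
) -> List[Tuple[str, str, float]]:
    """r for every pair of variants on the same chromosome within window_bp."""
    parsed = {vid: parse_variant_id(vid) for vid in dosages}
    ordered = sorted(dosages, key=lambda v: (parsed[v].chrom, parsed[v].pos))
    out = []
    for a, b in combinations(ordered, 2):
        va, vb = parsed[a], parsed[b]
        if va.chrom == vb.chrom and abs(va.pos - vb.pos) <= window_bp:
            out.append((a, b, pearson_r(dosages[a], dosages[b])))
    return out


if __name__ == "__main__":
    toy = {
        "1:100:A:G": [0, 1, 2, 1],
        "1:150:C:T": [0, 1, 2, 1],
        "1:900000:G:A": [2, 1, 0, 1],
        "2:100:T:C": [0, 1, 2, 1],
    }
    for a, b, r in ld_within_window(toy):
        print(a, b, r)
```

The quadratic pair loop is only for illustration. In Spark, one option is to bucket variants by chromosome and genomic window so that candidate pairs are generated per partition, and then apply a per-pair function like pearson_r across the cluster.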