We want to create a version of the staging data that complies with the following rules for a gentropy pipeline run.
This is a data-patching exercise intended to produce a clean run of the pipeline and to identify further actions needed in the implementation of the logic.
The following are all the staging inputs:
Requirements:
- `StudyLocusId` needs to be in `String` format.
- `gwas_catalog_PICSed_curated_associations` needs to have `StudyLocusQualityCheck.TOP_HIT` in the `qualityControls` column.
- `gwas_catalog_susie_summary_statistics`, `eqtl_catalogue_susie`, `finngen_r11_susie` and `ukb_ppp_eur_susie` need to have the columns `locusStart` and `locusEnd` populated, derived from the `region` column.
- `gwas_catalog_susie_summary_statistics` and `ukb_ppp_eur_susie` need a new flag to be added: `StudyLocusQualityCheck.OUT_OF_SAMPLE_LD`.
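
To make the requirements above concrete, here is a rough per-row sketch of the patching logic in plain Python. It is only an illustration, not the actual patch job, which would presumably run on Spark. Several things are assumptions: the `region` string is taken to look like `chr1:1000-2000` (or `1:1000-2000`), the ID column is taken to be `studyLocusId`, the flags are represented by the placeholder strings `"TOP_HIT"` and `"OUT_OF_SAMPLE_LD"` rather than the real `StudyLocusQualityCheck` enum values, and `parse_region`, `add_flag` and `patch_row` are hypothetical helper names.

```python
import re

# ASSUMPTION: region strings look like "<chromosome>:<start>-<end>",
# optionally prefixed with "chr". Adjust the pattern to the real format.
REGION_PATTERN = re.compile(r"^(?:chr)?(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")

# Datasets named in the requirements list.
NEEDS_TOP_HIT = {"gwas_catalog_PICSed_curated_associations"}
NEEDS_LOCUS_BOUNDS = {
    "gwas_catalog_susie_summary_statistics",
    "eqtl_catalogue_susie",
    "finngen_r11_susie",
    "ukb_ppp_eur_susie",
}
NEEDS_OUT_OF_SAMPLE_LD = {
    "gwas_catalog_susie_summary_statistics",
    "ukb_ppp_eur_susie",
}

# ASSUMPTION: placeholder strings standing in for the
# StudyLocusQualityCheck enum values.
TOP_HIT = "TOP_HIT"
OUT_OF_SAMPLE_LD = "OUT_OF_SAMPLE_LD"


def parse_region(region: str) -> tuple[int, int]:
    """Return (locusStart, locusEnd) parsed from a region string."""
    match = REGION_PATTERN.match(region.strip())
    if match is None:
        raise ValueError(f"Unrecognised region string: {region!r}")
    return int(match["start"]), int(match["end"])


def add_flag(flags, flag):
    """Return a copy of the flag list containing `flag` exactly once."""
    flags = list(flags or [])  # a missing/None column becomes an empty list
    if flag not in flags:
        flags.append(flag)
    return flags


def patch_row(row: dict, dataset: str) -> dict:
    """Apply the patching rules for `dataset` to a single row."""
    patched = dict(row)  # do not mutate the caller's row
    # ASSUMPTION: the ID column is named "studyLocusId".
    patched["studyLocusId"] = str(patched["studyLocusId"])
    if dataset in NEEDS_TOP_HIT:
        patched["qualityControls"] = add_flag(patched.get("qualityControls"), TOP_HIT)
    if dataset in NEEDS_LOCUS_BOUNDS:
        patched["locusStart"], patched["locusEnd"] = parse_region(patched["region"])
    if dataset in NEEDS_OUT_OF_SAMPLE_LD:
        patched["qualityControls"] = add_flag(
            patched.get("qualityControls"), OUT_OF_SAMPLE_LD
        )
    return patched
```

For example, `patch_row({"studyLocusId": 123, "region": "chr1:1000-2000", "qualityControls": None}, "ukb_ppp_eur_susie")` would stringify the ID, set `locusStart` and `locusEnd` from the region, and add the out-of-sample LD flag. Adding flags idempotently means the patch can be rerun on already-patched data without duplicating entries in `qualityControls`.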
@DSuveges, @addramir I don't think I'm missing anything, but please have a quick read through.
The moment this data is ready we will want to rerun the DAG (@project-defiant)