Open DSuveges opened 2 weeks ago
The code to generate the new dataset is merged and is now part of the L2G prediction step. The dataset:
gs://ot_orchestration/releases/24.10_freeze6/locus_to_gene_predictions_w_features
The annotation went fine: the scores and the number of credible sets are identical between the annotated and non-annotated datasets, and no rows have null features. The dataset has roughly doubled in size, from 600 MB to 1.3 GB.
The locus-to-gene prediction dataset is loaded to the Platform, where it populates the credible set widget and the locus-to-gene table. At this point the table contains only the study locus id, the gene id, and the locus-to-gene score, so it needs to be enriched with the feature matrix. The new column is expected to be a map type, so that the list of features can grow without requiring a schema change.
Schema:
Example:
The column needs to be added at the L2G prediction step.
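
To illustrate why a map-type column avoids schema changes, here is a minimal plain-Python sketch, not the actual pipeline code. The identifier column names (`studyLocusId`, `geneId`, `score`), the column name `features`, and the feature names (`featureA`, `featureB`, `featureC`) are all hypothetical placeholders; the real dataset may name them differently.

```python
from typing import Any, Dict, List

# Hypothetical identifier columns; the real column names may differ.
ID_COLUMNS = ("studyLocusId", "geneId", "score")


def fold_features(row: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the identifier columns and fold every other column into one map."""
    out: Dict[str, Any] = {col: row[col] for col in ID_COLUMNS}
    # All remaining columns are treated as numeric features (an assumption).
    out["features"] = {
        name: float(value)
        for name, value in row.items()
        if name not in ID_COLUMNS
    }
    return out


# Made-up rows for illustration only; the second row carries an extra feature.
rows: List[Dict[str, Any]] = [
    {"studyLocusId": "sl1", "geneId": "g1", "score": 0.82,
     "featureA": 0.91, "featureB": 0.40},
    {"studyLocusId": "sl2", "geneId": "g2", "score": 0.15,
     "featureA": 0.10, "featureB": 0.0, "featureC": 1.0},
]

annotated = [fold_features(row) for row in rows]
```

Both output rows expose the same four top-level fields even though the second one has an additional feature, which is the property that lets new features be added without touching the table schema.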