opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

PIS transformation extraction: `homologues` step #3482

Open javfg opened 3 days ago

javfg commented 3 days ago

Description

The `homologues` step writes files into two subfolders:

`target-inputs/homologue/gene-dictionary` and `target-inputs/homologue/homologies`

The files in the gene-dictionary folder are JSON files fetched from:

https://ftp.ensembl.org/pub/release-${ensembl_version}/json/${species}/${species}.json

where `${species}` is one of: caenorhabditis_elegans, canis_lupus_familiaris, cavia_porcellus, danio_rerio, drosophila_melanogaster, macaca_mulatta, mus_musculus, oryctolagus_cuniculus, pan_troglodytes, rattus_norvegicus, sus_scrofa, xenopus_tropicalis, homo_sapiens.
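
For reference, the URL template and species list above can be expanded as in the following minimal Python sketch. The names `ENSEMBL_JSON_URL`, `SPECIES` and `gene_dictionary_urls` are hypothetical and do not come from PIS; the Ensembl release number is left as a parameter because the issue does not state which one is used.

```python
# Hypothetical helper: expands the URL template from the description
# for every species listed in the issue.
ENSEMBL_JSON_URL = (
    "https://ftp.ensembl.org/pub/release-{version}/json/{species}/{species}.json"
)

SPECIES = [
    "caenorhabditis_elegans",
    "canis_lupus_familiaris",
    "cavia_porcellus",
    "danio_rerio",
    "drosophila_melanogaster",
    "macaca_mulatta",
    "mus_musculus",
    "oryctolagus_cuniculus",
    "pan_troglodytes",
    "rattus_norvegicus",
    "sus_scrofa",
    "xenopus_tropicalis",
    "homo_sapiens",
]


def gene_dictionary_urls(ensembl_version):
    """Return a {species: url} mapping for one Ensembl release."""
    return {
        species: ENSEMBL_JSON_URL.format(version=ensembl_version, species=species)
        for species in SPECIES
    }
```

Calling `gene_dictionary_urls(version)` with the configured release gives one download URL per species, which is enough to drive a plain download step with no transformation attached.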

Transformations PIS was doing

PIS was running:

jq -r '.genes[] | [.id, .name] | @tsv'

on each of those files, extracting the `id` and `name` of every gene and writing them as one TSV row per gene.
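
To make the behaviour of that filter explicit, here is a rough Python equivalent. It is only a sketch for comparison, not the PIS or ETL implementation: the function names are hypothetical, it assumes `id` and `name` are strings or missing, and it loads the whole JSON file into memory, which may not be practical for the larger Ensembl files.

```python
import json


def tsv_escape(value):
    """Render one field like jq's @tsv does for strings and nulls."""
    if value is None:
        # jq prints null as an empty field
        return ""
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def genes_to_tsv_lines(document):
    """Yield one 'id<TAB>name' line per entry in document['genes']."""
    for gene in document["genes"]:
        yield "\t".join(tsv_escape(gene.get(key)) for key in ("id", "name"))


def convert(src_path, dst_path):
    """Read an Ensembl species JSON file and write the two-column TSV."""
    with open(src_path, encoding="utf-8") as src:
        document = json.load(src)
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in genes_to_tsv_lines(document):
            dst.write(line + "\n")
```

A gene without a `name` produces a row with an empty second column, which matches how jq renders `null` inside `@tsv`.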

Tasks

d0choa commented 3 days ago

This is the function to apply the transformation: https://github.com/opentargets/platform-etl-backend/blob/11a1f67ce194d079603543f5f96b76c9963e35e8/src/main/scala/io/opentargets/etl/backend/target/Target.scala#L312