opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Implement ETL logic to export cooccurrences for EuroPMC #3111

Closed DSuveges closed 3 months ago

DSuveges commented 9 months ago

As part of the partnership between OT and EuroPMC, we feed normalized cooccurrences back to EuroPMC as a step in the release process. At a meeting on 28th September we agreed on a schema (see notes: here).

I have implemented a first prototype in PySpark, which was used to generate a draft version of the data. This draft was then sent to EuroPMC, who confirmed that the data looked OK and is ready to be ingested.

The PySpark implementation of the logic is here: gist

Important: the above implementation doesn't account for the required maximum number of rows (10k) in each of the resulting partitions.
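To make the row limit concrete, here is a minimal plain-Python sketch of the constraint. It is not the gist's code and not the eventual ETL implementation. The names `chunk_rows`, `partitions_needed` and `MAX_ROWS_PER_PARTITION` are hypothetical; only the 10k figure comes from this issue.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Limit stated in this issue; the constant name itself is hypothetical.
MAX_ROWS_PER_PARTITION = 10_000


def chunk_rows(rows: Iterable[T], max_rows: int = MAX_ROWS_PER_PARTITION) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most `max_rows` rows each."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, max_rows))
        if not chunk:
            return
        yield chunk


def partitions_needed(row_count: int, max_rows: int = MAX_ROWS_PER_PARTITION) -> int:
    """Smallest partition count that keeps every partition within `max_rows`,
    assuming rows are spread evenly across partitions."""
    return -(-row_count // max_rows)  # ceiling division without floats


if __name__ == "__main__":
    sizes = [len(chunk) for chunk in chunk_rows(range(25_000))]
    print(sizes, partitions_needed(25_000))
```

In Spark itself, one candidate for enforcing such a cap at write time is the `maxRecordsPerFile` writer option. Whether that fits the output format agreed with EuroPMC is an assumption that would need checking.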

Tasks

mbdebian commented 3 months ago

@remo87 checking on this one with @DSuveges Thanks!

remo87 commented 3 months ago

I'm closing this item because it's no longer needed due to changes being made on the EPMC side.