Closed Juanmaria-rr closed 7 months ago
Inputs:
Transformations:
Output: The same as the current with the "score" field changed - no changes to the API should be required.
@Juanmaria-rr I've made the ETL changes, run them locally and uploaded the parquet to this bucket: gs://open-targets-pre-data-releases/jhpis/output/etl/parquet/targetPrioritisation for verification. I renamed the field from hasMouseKO to MouseKOScore; this will require a small FE change to pick that up.
I kept all your logic with one change: I added a descending sort of the scores before running the harmonic sum. It may not have been necessary on your side, either because of the way pyspark/python handles the effect scores or because of the way the data are organised. Based on this, I'm fairly sure the sort is required, but I wanted to check with you in case I'm introducing something wrong. Cheers!
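To illustrate why the sort matters, here is a minimal sketch of a harmonic sum in plain Python. It assumes the usual form where the i-th largest score is weighted by 1/i^2; the exact weighting and any normalisation used in the Target Engine code are not shown in this thread, and the example values are made up.

```python
def harmonic_sum(scores):
    """Harmonic sum: the i-th largest score is weighted by 1 / i**2."""
    ranked = sorted(scores, reverse=True)  # descending: largest score gets weight 1
    return sum(score / rank**2 for rank, score in enumerate(ranked, start=1))


def harmonic_sum_unsorted(scores):
    """Same weights applied in input order, to show that the order matters."""
    return sum(score / rank**2 for rank, score in enumerate(scores, start=1))


example = [0.5, 1.0]  # made-up scores for one target
sorted_total = harmonic_sum(example)             # 1.0/1 + 0.5/4
unsorted_total = harmonic_sum_unsorted(example)  # 0.5/1 + 1.0/4
```

Without the sort, the result depends on the order in which the scores happen to arrive, so the same target could get different totals from different runs or engines.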
leaving ticket open until data has been reviewed
Other relevant tasks for this issue (@carcruz):
- Move the Mouse KO column from doability to safety in the target prioritisation view before public release
- The Mouse KO column will be renamed to Mouse models
@Juanmaria-rr please let me know if there is any change we should make in the column documentation (e.g. new scores)
Hi. I compared @jdhayhurst's results with mine, and we got the same numbers, except for "0". @jdhayhurst, I think there is a typo in the targets with a score of 0, because they appear as "-0". Could you please fix it?
Thanks @Juanmaria-rr, well spotted. I will fix this in the ETL.
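One plausible cause of the "-0" values, assuming the flip to negative is a plain floating-point negation (the thread does not say how the ETL does it), is IEEE 754 negative zero. A small Python sketch of the effect and one way to normalise it; flip_sign is a hypothetical helper, not a function from the ETL:

```python
def flip_sign(score):
    """Negate a score; adding 0.0 turns IEEE 754 negative zero into plain zero."""
    return -score + 0.0


naive = -1 * 0.0           # plain negation of a zero score gives negative zero
fixed = flip_sign(0.0)     # normalised back to positive zero

print(str(naive), str(fixed), flip_sign(0.5))
```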
FE testing of new column is in progress
Below you can find the score distribution (shown as positive values, before the flip to negative):
Because of how the mouse phenotype scores are distributed, a substantial number of the values fall in the deep-red range.
We would like to transform the values so that scores in the lowest 25% (that is, below 0.60) appear as 0, while the rest are linearly transformed to the range from 0 to -1.
There is already code implemented in the BE for some columns that could be reused. For instance, the following code could be used (taken from the mouse ortholog column):
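# assumes: from pyspark.sql import functions as F
.withColumn(
    "mouseScores",
    # Scores below the 0.60 cut-off become 0; the rest are rescaled linearly to [0, 1].
    F.when(F.col("mousePhenotypeScore") < 0.60, F.lit(0))
    .when(
        F.col("mousePhenotypeScore") >= 0.60,
        (F.col("mousePhenotypeScore") - 0.60) / 0.40,
    )
)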
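As a quick check of the arithmetic, the same rescaling can be sketched in plain Python. This assumes a maximum score of 1.0, which the 0.40 divisor implies; the names THRESHOLD, MAX_SCORE and rescale are illustrative only, and the flip to negative is left out.

```python
THRESHOLD = 0.60  # cut-off quoted in the thread
MAX_SCORE = 1.0   # assumption: implied by the 0.40 divisor


def rescale(score):
    """Map scores below the cut-off to 0 and the rest linearly onto [0, 1]."""
    if score < THRESHOLD:
        return 0.0
    return (score - THRESHOLD) / (MAX_SCORE - THRESHOLD)


# Made-up scores: below the cut-off, at the cut-off, midway, and at the maximum.
rescaled = [rescale(s) for s in (0.3, 0.6, 0.8, 1.0)]
```

A score of 0.8 lands roughly halfway along the new scale, and the cut-off itself maps to 0.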
Changes to ETL done and run locally. @Juanmaria-rr I've updated the files in the google bucket gs://open-targets-pre-data-releases/jhpis/output/etl/parquet/targetPrioritisation - please let me know if they look correct. Thanks!
I checked the values of the re-scaled score and they look good. This is the new distribution of the mouse phenotype scores:
Background
We need to add a new column to the Target Prioritisation view to inform safety using mouse KO models. For that, we have taken the mouse phenotypes reported for every KO model and classified them by severity using the high-level classification of Phenotype Classes. The scores are aggregated per target and a mouse Phenotype Score is built using the Harmonic Sum.
This dataset covers more than 12,000 targets and shows some predictive ability for human Safety Liabilities.
Code and data availability:
The code for the new column is available in the Target Engine repo: src/data_flow/target_properties_wb.py and the high-level scores are in /src/data_flow/phenotypeScores/20230825_mousePheScores.csv .
Tasks