opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

pharmacovigilance data: Crohn’s vs Crohn^s #778

Closed deniseOme closed 4 years ago

deniseOme commented 4 years ago

The new pharmacovigilance data has some minor issues with disease names, e.g. “Crohn^s disease” appears alongside “Crohn’s disease”.

https://qa.targetvalidation.org/summary?drug=CHEMBL1201580

This means that we report the two spellings as two different adverse events, which affects the computed significance (the log-likelihood ratio and the reported critical value threshold). For drugs in phase IV, the problem does not seem to extend to other diseases with an apostrophe in their name, e.g. “Hansen’s”, “Hodgkin’s”, and “Raynaud’s”.


mkarmona commented 4 years ago

@deniseOme this should fix this case and any other occurrence of the same problem. If you catch another typo, please reopen the issue and highlight the character that causes it.

@ val df = List("crohn's", "crohn^s").toDF("name") 
df: DataFrame = [name: string]

@ df.selectExpr("translate(name,'^','\\'')").show() 
+---------------------+
|translate(name, ^, ')|
+---------------------+
|              crohn's|
|              crohn's|
+---------------------+
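
For illustration only, here is a minimal plain-Scala sketch (no Spark) of why the replacement matters for the counts: once `^` is mapped to `'`, rows that were reported as two adverse events collapse into one and their report counts add up. The `NormalizeEvents` object, its function names, and the report counts are hypothetical and are not taken from the actual pipeline.

```scala
object NormalizeEvents {
  // Replace the stray caret with an apostrophe, mirroring translate(name, '^', '\'').
  def normalize(name: String): String = name.replace('^', '\'')

  // Merge rows whose names become identical after normalization, summing their counts.
  def mergeCounts(rows: Seq[(String, Long)]): Map[String, Long] =
    rows
      .groupBy { case (name, _) => normalize(name) }
      .map { case (name, group) => name -> group.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    // Made-up report counts, for illustration only.
    val rows = Seq(
      ("crohn's disease", 7L),
      ("crohn^s disease", 3L),
      ("raynaud's", 2L)
    )
    // The two Crohn spellings end up as a single row with count 10.
    mergeCounts(rows).toSeq.sortBy(_._1).foreach {
      case (name, n) => println(s"$name\t$n")
    }
  }
}
```

In the Spark job itself, the equivalent would presumably be to apply the `translate` expression before any grouping by event name, so that the significance is computed on the merged counts.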