opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Text highlighting issue on evidence page for some of the EPMC evidence (character-encoding issue) #158

Closed by gkos-bio 5 years ago

gkos-bio commented 6 years ago

This is most probably an upstream issue with the data submission. Example: https://www.targetvalidation.org/evidence/ENSG00000133703/EFO_0000096 Title of the abstract: "Circulating tumour DNA sequence analysis as an alternative to multiple myeloma bone marrow aspirates". Sentence not highlighted: "We report here a hybrid-capture-based Liquid Biopsy Sequencing (LB-Seq) method used to sequence all protein-coding exons of KRAS, NRAS, BRAF, EGFR and PIK3CA in 64 cfDNA specimens from 53 myeloma patients to >20,000 × median coverage."

Please note the '×' returned by the Open Targets API. In the original EPMC article this is encoded as a multiplication sign (Unicode ×, U+00D7).

The most likely explanation lies in the way EPMC extracts the information to produce the evidence strings: the text may not be Unicode-encoded at that stage. The other explanation would be that the data are correctly encoded in the original submission but the data pipeline doesn't handle the encoding properly.
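To illustrate how this kind of encoding mismatch could break highlighting, here is a minimal Python sketch. It assumes the highlighter locates the evidence sentence by exact substring match against the abstract, and that the garbling comes from UTF-8 bytes being read as Windows-1252; both are assumptions, not confirmed behaviour of EPMC or the Open Targets pipeline. The names `mis_decode`, `repair` and `find_span` are hypothetical helpers written for this example only.

```python
import unicodedata

TIMES = "\u00d7"  # MULTIPLICATION SIGN, the character in the abstract


def mis_decode(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate UTF-8 bytes being read with the wrong single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


def repair(text: str, wrong_codec: str = "cp1252") -> str:
    """Try to undo the mis-decoding; return the input unchanged on failure."""
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def find_span(abstract: str, sentence: str):
    """Return (start, end) of sentence in abstract, or None if absent."""
    a = unicodedata.normalize("NFC", abstract)
    s = unicodedata.normalize("NFC", sentence)
    start = a.find(s)
    return None if start < 0 else (start, start + len(s))


# Shortened stand-ins for the abstract and the evidence sentence.
abstract = f"Specimens were sequenced to >20,000 {TIMES} median coverage. More text."
sentence = f"sequenced to >20,000 {TIMES} median coverage."
garbled = mis_decode(sentence)  # the sign becomes "\u00c3\u2014"

print(find_span(abstract, sentence))         # a span: sentence can be highlighted
print(find_span(abstract, garbled))          # None: nothing to highlight
print(find_span(abstract, repair(garbled)))  # a span again after repair
```

If the real data show the same pattern, comparing the code points of the sentence returned by the API with those in the EPMC abstract should reveal at which stage the characters diverge.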

afaulconbridge commented 6 years ago

I've already updated the pipeline to handle UTF-8 characters better (see opentargets/data_pipeline#322). We should check whether this is still a problem with the 18.10 release.