ohdsi-studies / PioneerWatchfulWaiting

This study is part of the joint PIONEER - EHDEN - OHDSI studyathon in March 2021, and aims to advance understanding of clinical management and outcomes of watchful waiting in prostate cancer.
Apache License 2.0

Strange character Â appearing in cohort diagnostics cohort.csv file #17

Closed keesvanbochove closed 3 years ago

keesvanbochove commented 3 years ago

For some of the diagnostics results we got back after running the package, the name of a concept set defined in the package's JSON cohort specification as "name": "[PIONEER]  Biopsy" is altered to "name": "[PIONEER] Â Biopsy" in the results (see e.g. here: https://github.com/ohdsi-studies/PioneerWatchfulWaiting/blob/fff6fd9ba627e5aa027eb8079adcafbb90a6de2e/inst/cohorts/101.json#L28).

Peter Prinsen: "There seems to be a NO-BREAK SPACE at that location (C2 A0 in UTF-8 bytes). It's also in the file when I download the package directly from GitHub as a zip file. Could it be that the issue is in the original file but that for some reason it shows up for some and not others? Maybe on some systems this is converted to a regular space?"

Tim Hulsen: "Probably a character encoding issue, you need to set it to UTF-8. https://www.princexml.com/forum/topic/1819/non-breaking-spaces-are-treated-like-a-with-circumflex"
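To illustrate the mechanism described in the two quotes above, here is a minimal Python sketch. It assumes the concept set name contains a NO-BREAK SPACE (U+00A0) and that the affected sites read the UTF-8 file with a single-byte codec such as Latin-1. The exact position of the character in the name and the codec used at each site are assumptions, not something confirmed in this thread.

```python
# U+00A0 NO-BREAK SPACE is encoded as the two bytes C2 A0 in UTF-8.
# The position of the character inside the name is assumed for illustration.
name = "[PIONEER]\u00a0Biopsy"
utf8_bytes = name.encode("utf-8")

# Reading those bytes with a single-byte codec (Latin-1 here, as an assumption)
# maps C2 to "Â" and A0 to a no-break space, so one extra character appears.
garbled = utf8_bytes.decode("latin-1")

# Reading them as UTF-8 recovers the original name unchanged.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(" "))
print(repr(garbled))
print(repr(restored))
```

This would also explain why the character shows up on some systems and not others: only readers that do not decode the file as UTF-8 would see the extra "Â".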

MaximMoinat commented 3 years ago

Very likely fixed with #18

keesvanbochove commented 3 years ago

Fixed by updating the source concept definitions in ATLAS (removed the space), but it would be good to have confirmation from affected data sources.
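As a complement to fixing the definitions in ATLAS, a check like the following could flag such names before a package is distributed. This is only a sketch and not part of the study package: the helper names `find_nbsp_names` and `normalize_spaces` and the sample JSON layout are hypothetical.

```python
import json

NBSP = "\u00a0"  # NO-BREAK SPACE


def find_nbsp_names(cohort_json):
    """Return every "name" value in the JSON text that contains a no-break space."""
    hits = []

    def walk(node):
        # Recurse through nested dicts and lists, checking each "name" string.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "name" and isinstance(value, str) and NBSP in value:
                    hits.append(value)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(cohort_json))
    return hits


def normalize_spaces(text):
    """Replace each no-break space with a regular space."""
    return text.replace(NBSP, " ")


# Hypothetical sample; a real cohort specification has many more fields.
sample = json.dumps(
    {"ConceptSets": [{"id": 0, "name": "[PIONEER]\u00a0Biopsy"}]},
    ensure_ascii=False,
)

for found in find_nbsp_names(sample):
    print(repr(found), "->", repr(normalize_spaces(found)))
```

Note that the fix reported in this thread removed the character at the source in ATLAS; replacing it with a regular space, as `normalize_spaces` does, is a different choice made here only for illustration.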

keesvanbochove commented 3 years ago

This problem is now confirmed fixed; we got correct results from NCR.