ropensci / coder

Classification of Cases into Deterministic Categories
https://docs.ropensci.org/coder/

?! doesn't work #153

Open samlipworth opened 2 years ago

samlipworth commented 2 years ago

I'm trying to code a custom scheme (diabetes complication severity) and want to use ?! as "not followed by". The visualise command recognises this correctly, but I then get an error when I try to run it. Any ideas?

Classification based on: icd10
Error in grepl(paste(cc[[attr(cc, "regexpr")]], collapse = "|"), codified$code) :
  invalid regular expression '^(E08(31|32|33|36|37|38|39)|E09(31|32|33|36|37|38|39)|E10(31|32|33|36|37|38|39)|E11(31|32|33|36|37|38|39)|E13(31|32|33|36|37|38|39))|^(H350)|^(H3535)|^(H35(6|8|9))|^(H33)|^(E08(34|35)|E09(34|35)|E10(34|35)|E11(34|35)|E13(34|35))|^(H54)|^(H431)|^(E08(21|22|29)|E09(21|22|29)|E10(21|22|29)|E11(21|22|29)|E13(21|22|29))|^(N00)|^(N04)|^(N03)|^(N05)|^(N18(1|2|3|9))|^(N18(4|5|6))|^(N19)|^(E08(4)|E09(4)|E10(4)|E11(4)|E13(4))|^(G90(09|8|9)|G99)|^(G56)|^(G57)|^(G609)|^(G733)|^(G9001)|^(H49)|^(I951)|^(K3184)|^(K591)|^(N319)|^(M316)|^(S04)|^(G45)|^(I6(1|3|5|6)|I6781)|^(I2(4|0|5(?!2)))', reason 'Invalid regexp'
In addition: Warning message:
In grepl(paste(cc[[attr(cc, "regexpr")]], collapse = "|"), codified$code) :
  TRE pattern compilation error 'Invalid regexp'

samlipworth commented 2 years ago

e.g.

group,icd10,dcsi
CV_IHD,I2(4|0|5(?!2)),1

samlipworth commented 2 years ago

Looks like this is a wider problem: R's default regex engine (TRE) doesn't support lookaheads. Could an option be added to turn on perl? The workaround for now is to filter out any excluded codes before running coder.
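For anyone hitting the same error, here is a minimal base R sketch of the behaviour described above. It does not use coder itself. The `codes` vector is made-up example data, and the lookahead-free rewrite of the pattern is a suggestion, not something the package provides.

```r
# Made-up example codes (an assumption, not data from the issue)
codes <- c("I20", "I24", "I25", "I251", "I252", "I259")

# Pattern from the issue; (?!2) is a negative lookahead
lookahead <- "^(I2(4|0|5(?!2)))"

# With perl = TRUE, base R's grepl() accepts the lookahead
perl_hits <- grepl(lookahead, codes, perl = TRUE)

# Lookahead-free rewrite that the default engine can compile:
# "5 followed by anything except 2, or by end of string"
rewritten <- "^(I2(4|0|5([^2]|$)))"
tre_hits <- grepl(rewritten, codes)

# Workaround from the comment above: drop excluded codes first,
# then match with a pattern that needs no lookahead
kept <- codes[!grepl("^I252", codes)]
kept_hits <- grepl("^(I2(4|0|5))", kept)

print(data.frame(codes, perl_hits, tre_hits))
```

The rewrite only works because the excluded code differs in the character right after the prefix. More complex exclusions may still need the pre-filtering step.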

eribul commented 2 years ago

Thank you very much for your report! You are right that perl is turned off (as stated in the details section of ?classcodes). This is unfortunate! I remember that this was not the case initially, and that many of the included regexes used \\d, \\w, etc. For some reason this didn't work. I remember that I tried to fix it but didn't succeed. Unfortunately, I did not document this. In the end, I just decided to turn off perl as a quick and dirty solution that worked. I am currently on parental leave with limited possibilities to investigate this further. But please feel free to submit a pull request if you have the time!