Closed ROpdebee closed 3 years ago
I had not looked into non-Latin scripts so far because I knew the solution would be a somewhat ugly regex, but it looks like substituting \W by [^\p{L}\d] would do the job (matching characters which are neither letters in any script nor digits).
This would require ES2018 support, but since I have not put compatibility above functionality and clean code so far, I guess I won't care about old browser versions this time either.
In any case, I think this is the right time to finally set up some tests for the regexes before I introduce this potentially breaking change (e.g. simply \P{L} would also match digits, which \W did not).
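To make the difference between the three character classes concrete, here is a minimal sketch of the kind of check such tests could contain. The script's actual regexes are not shown in this thread, so the names asciiNonWord, nonLetterNonDigit, nonLetter and classify are hypothetical. Note that \p{L} and \P{L} are only recognised when the regex carries the u flag, and that \d still matches only ASCII digits.

```javascript
// Hypothetical names for illustration; these are not the script's real regexes.
const asciiNonWord = /\W/;               // anything outside [A-Za-z0-9_]
const nonLetterNonDigit = /[^\p{L}\d]/u; // neither a letter (any script) nor an ASCII digit
const nonLetter = /\P{L}/u;              // anything that is not a letter, digits included

// Report which of the three classes a single character falls into.
function classify(ch) {
  return {
    W: asciiNonWord.test(ch),
    notLetterOrDigit: nonLetterNonDigit.test(ch),
    notLetter: nonLetter.test(ch),
  };
}

console.log(classify('α')); // { W: true, notLetterOrDigit: false, notLetter: false }
console.log(classify('7')); // { W: false, notLetterOrDigit: false, notLetter: true }
console.log(classify('_')); // { W: false, notLetterOrDigit: true, notLetter: true }
console.log(classify('’')); // { W: true, notLetterOrDigit: true, notLetter: true }
```

The first row is the reported bug (a Greek letter counts as a non-word character under \W), the second is the digit caveat mentioned above, and the third shows one more behaviour change worth covering in tests: the underscore is a word character for \W but not for [^\p{L}\d].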
That seems to have done the trick: I just came across another Greek release and the apostrophes were converted correctly. Thanks!
I've added a new medium to the test.MB release to illustrate. I haven't looked into the regexes, but I guess it's not recognising Greek script as word characters, and therefore matches a ‘’ pair instead of two separate ’s.
Not sure whether this is straightforward to fix. If it's not, then feel free to just close this with a wontfix, as this is pretty rare. A quick replica DB query gives maybe 15 releases with Greek script whose title might be affected by this bug.