kellnerd / musicbrainz-scripts

Bookmarklets and Userscripts for MusicBrainz.org
MIT License

Punctuation: Incorrect substitutions at end of words on non-Latin titles #8

Closed ROpdebee closed 3 years ago

ROpdebee commented 3 years ago

I've added a new medium to the test.MB release to illustrate. I haven't looked into the regexes, but I guess it's not recognising Greek letters as word characters, and therefore matches a ‘’ pair instead of two separate ’ characters.

Not sure whether this is straightforward to fix. If it's not, then feel free to just close this with a wontfix, as this is pretty rare. A quick replica DB query gives maybe 15 releases with Greek script whose title might be affected by this bug.

kellnerd commented 3 years ago

I had not looked into non-Latin scripts so far because I knew the solution would be a somewhat ugly regex, but it looks like replacing \W with [^\p{L}\d] would do the job (matching characters which are neither letters in any script nor digits). This would require ES2018 support, but since I have not put compatibility above functionality and clean code so far, I guess I won't care about old browser versions this time either. In any case, I think this is the right time to finally set up some tests for the regexes before I introduce this potentially breaking change (e.g. simply \P{L} would also match digits, which \W did not).
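
For readers following along, here is a minimal sketch of the difference between the character classes discussed above. The rule names and the `curlApostrophes` helper are hypothetical and only illustrate the idea; they are not the regexes the userscript actually uses. Note that `\p{L}` is only recognised when the regex carries the `u` flag.

```javascript
// Illustration only: these patterns are NOT the userscript's actual regexes.
const asciiNonWord = /\W/;            // anything outside [A-Za-z0-9_]
const unicodeNonWord = /[^\p{L}\d]/u; // neither a letter (any script) nor an ASCII digit
const nonLetter = /\P{L}/u;           // any non-letter, digits included

// Hypothetical rule: curl a straight apostrophe that directly follows a word character.
const legacyRule = /(\w)'/g;
const unicodeRule = /([\p{L}\d])'/gu;

function curlApostrophes(text, rule) {
  // \u2019 is the right single quotation mark (typographic apostrophe).
  return text.replace(rule, '$1\u2019');
}

console.log(asciiNonWord.test('π'));   // true: \W treats Greek letters as non-word characters
console.log(unicodeNonWord.test('π')); // false: \p{L} covers letters in any script
console.log(asciiNonWord.test('7'));   // false
console.log(nonLetter.test('7'));      // true: plain \P{L} would start matching digits

console.log(curlApostrophes("απ' όλα", legacyRule));  // unchanged: απ' όλα
console.log(curlApostrophes("απ' όλα", unicodeRule)); // απ’ όλα
```

One behavioural difference a regex test suite would probably want to pin down: the underscore counts as a word character for `\W`, but it is neither a letter nor a digit, so `[^\p{L}\d]` matches it.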

ROpdebee commented 3 years ago

That seems to have done the trick. I just came across another Greek release and the apostrophes were converted correctly. Thanks!