savonet / liquidsoap

Liquidsoap is a statically typed, general-purpose scripting language with dedicated operators and backends for all things media: streaming, file generation, automation, HTTP backends, and more.
http://liquidsoap.info
GNU General Public License v2.0

Incorrect handling of non-ascii characters in regex #3824

Open vitoyucepi opened 5 months ago

vitoyucepi commented 5 months ago

Describe the bug

A regex that uses Unicode character properties can split a string in the middle of a multi-byte character.

To Reproduce

r/\PL+/.split("revolución")

Expected behavior

The word should not be split, because it contains only letters. Splitting it produces an invalid Unicode sequence.

Version details

Install method: savonet/liquidsoap:v2.2.4

Common issues: https://github.com/savonet/liquidsoap/discussions/3816#discussioncomment-8900996

toots commented 5 months ago

Unfortunately, the backend we use for regexp does not (yet?) support Unicode regular expressions: https://github.com/ocaml/ocaml-re/issues/24
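
For readers wondering how a byte-oriented engine could end up cutting "revolución" in two, here is a rough Python sketch of one possible mechanism. It is not Liquidsoap or ocaml-re code. It assumes the engine classifies each byte as if it were a Latin-1 character, which this thread does not confirm. The helper names `is_letter`, `letter_runs` and `letter_runs_bytewise` are made up for the illustration.

```python
import unicodedata
from itertools import groupby


def is_letter(ch):
    # True when the character is in a Unicode "L" (letter) category.
    return unicodedata.category(ch).startswith("L")


def letter_runs(text):
    # Code-point-aware view: return the maximal runs of letters, i.e. the
    # non-empty pieces left after cutting at every run of non-letters.
    return ["".join(group) for letter, group in groupby(text, key=is_letter) if letter]


def letter_runs_bytewise(text):
    # ASSUMPTION: a byte-oriented engine sees the UTF-8 bytes one at a time
    # and classifies each byte as the Latin-1 character with the same value.
    as_latin1 = text.encode("utf-8").decode("latin-1")
    return [run.encode("latin-1") for run in letter_runs(as_latin1)]


word = "revolución"
print(letter_runs(word))           # the word stays in one piece
print(letter_runs_bytewise(word))  # the two bytes of "ó" end up on opposite sides of a cut
```

Under that assumption, the first byte of "ó" (0xC3) counts as a letter and the second (0xB3) does not, so the first fragment ends with a lone lead byte and is no longer valid UTF-8. That would match the "wrong Unicode sequence" described in the report.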