opencats / OpenCATS

Applicant Tracking System (maintained code base)
http://www.opencats.org

Use mb_substr() for correct abbreviation of non-ASCII characters #651

Closed (xalt7x closed 2 months ago)

xalt7x commented 6 months ago

When substr() or another byte-based method cuts a string down to, or shortens it by, a single byte, multi-byte UTF-8 characters are split and displayed as �. Switching to mb_substr() fixes this.

xalt7x commented 6 months ago

The problem is easy to reproduce with Cyrillic/Ukrainian characters (e.g., "Джон Дое" as the User/Owner name, or the string "Навички обслуговування клієнтів" for "Key Skills").
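
The difference can also be reproduced outside OpenCATS. The following is a minimal sketch, not the code changed in this pull request: the variable names are hypothetical, and it assumes the mbstring extension is enabled.

```php
<?php
// Illustrative reproduction; not taken from the OpenCATS source.
// Assumes the mbstring extension is loaded.
$name = "Джон Дое";

// substr() counts bytes. "Д" occupies 2 bytes in UTF-8, so taking
// 1 byte yields an incomplete sequence that browsers render as U+FFFD.
$byteInitial = substr($name, 0, 1);

// mb_substr() counts characters when told the string's encoding.
$charInitial = mb_substr($name, 0, 1, 'UTF-8');

var_dump(mb_check_encoding($byteInitial, 'UTF-8')); // bool(false)
var_dump($charInitial === "Д");                     // bool(true)
```

Passing the encoding explicitly avoids relying on the server's default internal encoding setting.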

[Screenshot: fix_cyrillic_abbreviation]

Additional information:

If you're working with UTF-8 encoded strings, you may lose characters when you extract part of them with PHP's substr() function. This happens because UTF-8 characters are not restricted to one byte: each character takes between 1 and 4 bytes, depending on its Unicode code point.
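
To make the variable byte length concrete, here is a short sketch comparing byte counts with character counts. The sample characters and variable names are chosen for illustration only, and the mbstring extension is assumed to be available.

```php
<?php
// Byte length versus character count for UTF-8 strings.
// Sample characters are illustrative; assumes mbstring is loaded.
$samples = ["A", "Д", "€", "😀"];
foreach ($samples as $s) {
    // strlen() reports bytes, mb_strlen() reports characters.
    printf("%s: %d byte(s), %d character(s)\n", $s, strlen($s), mb_strlen($s, 'UTF-8'));
}

$skills = "Навички обслуговування клієнтів";

// Truncating by bytes can stop in the middle of a character:
// each Cyrillic letter here is 2 bytes, so 9 bytes splits the fifth letter.
$byBytes = substr($skills, 0, 9);

// Truncating by characters always ends on a character boundary.
$byChars = mb_substr($skills, 0, 9, 'UTF-8');

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($byChars, 'UTF-8')); // bool(true)
```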

RussH commented 2 months ago

Thanks @xalt7x !