swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

Unicode Regex fails to match using case-insensitive option #59953

Closed samkrishna closed 2 years ago

samkrishna commented 2 years ago

Describe the bug

Case-insensitive Unicode regexes fail to match Unicode strings that differ only in case.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Unzip the attached UnicodeRegexes.zip iOS framework project
  2. Open the UnicodeRegexesTests.swift test case file
  3. Navigate to the testRegexCaselessBugExample test case
  4. Run the tests using “Command-U”
  5. Wait for the test failure on lines 49 and 50

Expected behavior

I expect the case-insensitive regexes to match both "𝛣𝛲𝛒𝛢𝛰" (Greek upper-case) and "𝛽𝜌𝛂𝜐𝜊" (Greek lower-case), which spell out the Greek equivalent of the word "bravo" (the assertions on lines 49 and 50 of the test file).

Screenshots N/A

Environment (please fill out the following information)

Additional context

This is related to Apple FB5706701, the Objective-C version of this bug in NSRegularExpression. I have included the original Objective-C test code in commented form in the test case. UnicodeRegexes.zip

paiv commented 2 years ago

"𝛣𝛲𝛒𝛢𝛰" (Greek upper-case)

These are MATHEMATICAL ITALIC CAPITAL letters, and Unicode intentionally does not provide case folding for them.
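The missing case mappings can be checked from Swift's standard library. What follows is only a minimal sketch using `Unicode.Scalar.Properties`; the two scalars (U+0392 and U+1D6E3) are picked here for illustration and the helper name `hasCaseMapping` is hypothetical, not something from the attached project.

```swift
// Compare an ordinary Greek capital with its mathematical look-alike.
// Both scalars are illustrative choices, not taken from the bug report's test file.
let greekBeta: Unicode.Scalar = "\u{0392}"   // GREEK CAPITAL LETTER BETA
let mathBeta: Unicode.Scalar = "\u{1D6E3}"   // MATHEMATICAL ITALIC CAPITAL BETA

// Hypothetical helper: true when lowercasing or case folding would change the scalar.
func hasCaseMapping(_ scalar: Unicode.Scalar) -> Bool {
    let properties = scalar.properties
    return properties.changesWhenLowercased || properties.changesWhenCaseFolded
}

// Print each scalar's Unicode name next to whether it has a case mapping.
for scalar in [greekBeta, mathBeta] {
    print(scalar.properties.name ?? "(unnamed)", hasCaseMapping(scalar))
}
```

The ordinary Greek letter reports a case mapping and the mathematical symbol does not, which is why a case-insensitive regex has nothing to fold the mathematical letters to.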

You can match the whole Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF), or specific subranges of it.

Or use plain Greek letters, such as ΒΡΑΥΟ and βραυο.
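Both suggestions can be tried with a short sketch. It uses Foundation's `NSRegularExpression` with the `.caseInsensitive` option, which is an assumption about the setup (the attached project may use a different regex API), and the helper `matchesIgnoringCase` plus the sample strings are hypothetical names introduced here.

```swift
import Foundation

// Hypothetical helper: does `pattern` match the whole of `text`, ignoring case?
func matchesIgnoringCase(_ pattern: String, _ text: String) -> Bool {
    // Anchor the pattern so that only a match of the entire string counts.
    let anchored = "\\A(?:\(pattern))\\z"
    guard let regex = try? NSRegularExpression(pattern: anchored,
                                               options: [.caseInsensitive]) else {
        return false
    }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: whole) != nil
}

// Illustrative strings: mathematical italic capitals and small letters for "bravo".
let mathCapitals = "\u{1D6E3}\u{1D6F2}\u{1D6E2}\u{1D6F6}\u{1D6F0}"
let mathSmall = "\u{1D6FD}\u{1D70C}\u{1D6FC}\u{1D710}\u{1D70A}"

// Plain Greek letters have case mappings, so the caseless match succeeds.
print(matchesIgnoringCase("βραυο", "ΒΡΑΥΟ"))

// Mathematical letters have none, so capitals do not match small letters.
print(matchesIgnoringCase(mathCapitals, mathSmall))

// Matching the whole block by range accepts either form.
let mathBlock = "[\u{1D400}-\u{1D7FF}]+"
print(matchesIgnoringCase(mathBlock, mathCapitals))
print(matchesIgnoringCase(mathBlock, mathSmall))
```

The range-based pattern sidesteps case entirely, so it fits when the input really does contain mathematical symbols; switching to plain Greek letters is the better choice when the text is meant to be Greek.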

samkrishna commented 2 years ago

@paiv I just ran the tests with your suggestion and of course you're correct.

Thank you for the Unicode lesson. This was a bug report with an incorrect premise and I am closing it now.