w3c / qt3tests

Tests for XPath and XQuery

fn/matches.re.xml: re00984 unicode-version #6

Open zadean opened 5 years ago

zadean commented 5 years ago

Test re00984 checks a large number of code points against the \w multi-character escape. The characters ⌈ (U+2308) and ⌉ (U+2309) are in this list. These code points were moved from \p{S} to \p{P} in Unicode 6.3, and because \w excludes everything in \p{P}, they no longer match \w.
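
A quick way to see the reclassification locally is sketched below in Python. This is only an illustration, not part of the test suite: the helper name `is_xsd_word_char` is made up here, and it assumes the XSD definition of \w as every character outside \p{P}, \p{Z} and \p{C}. The categories reported depend on the Unicode version bundled with the Python build, which the last line prints.

```python
import unicodedata


def is_xsd_word_char(ch: str) -> bool:
    # Hypothetical helper: XSD-style \w, i.e. any character whose general
    # category is not punctuation (P*), separator (Z*) or other (C*).
    return unicodedata.category(ch)[0] not in "PZC"


# The two characters from re00984 that were reclassified.
for ch in "\u2308\u2309":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_xsd_word_char(ch))

# The result is only meaningful relative to this version.
print("Unicode data version:", unicodedata.unidata_version)
```

With Unicode data at 6.3 or later, the two characters should be reported under a punctuation category and therefore fall outside \w; with older data they would still count as symbols.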

Perhaps the test should include the "unicode-version" dependency flag for version "6.2"?

michaelhkay commented 5 years ago

It would be a shame to put that dependency on the whole test - better to move the relevant part into a separate test with a dependency.

Michael Kay

On 19 Aug 2019, at 18:20, Zachary Dean notifications@github.com wrote:

Test re00984 tests a large number of code-points for the \w character sequence. Characters ⌈ and ⌉ are in this list. These codepoints were moved from \p{S} to \p{P} in unicode version 6.3, and therefore out of the \w character sequence.

Perhaps the test should include the "unicode-version" dependency flag for version "6.2"?


zadean commented 5 years ago

@michaelhkay You make a very good point, and a separate test for the reclassified characters is definitely the better answer.

I took a quick look through the notes for the Unicode updates since 6.3 and found only a few more category changes, none of which seem to break anything in the test suite as it stands.

Just a side note: it may also be worth "modernizing" a bit by adding some of the newer emoji/emoticon code points to the \p{So} tests (re00169 & re00207). I imagine they are showing up in real data, so adding them would add value to the test cases. This suite is not a Unicode test suite, but a few of them would show some level of compliance for the newer characters. That is something for a different issue, though.
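
As a rough sketch of how candidates for such a test could be vetted, the Python snippet below keeps only the characters whose general category is So. The candidate list and the function name `other_symbols` are hypothetical and chosen purely for illustration; none of them come from re00169 or re00207, and the outcome again depends on the Unicode data the Python build ships with.

```python
import unicodedata

# Hypothetical candidates, not taken from the test suite:
# U+1F600, U+1F680 and U+1F4A9.
CANDIDATES = ["\U0001F600", "\U0001F680", "\U0001F4A9"]


def other_symbols(chars):
    # Keep only characters whose general category is So (Symbol, other).
    return [c for c in chars if unicodedata.category(c) == "So"]


for ch in other_symbols(CANDIDATES):
    # name() falls back to a placeholder if the code point is unassigned
    # in the bundled Unicode data.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```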