Closed hsivonen closed 4 years ago
Is there any performance/memory benchmark that compares regex against irregexp or other popular regexp engines?
I think there are two potential reasons for developing icu4x regex:
Now, I don't know whether the Rust regex crate already offers this. If so, we could just fall back to it without developing our own, if licencing is not a problem.
if licencing is not a problem.
According to the latest I saw, it shouldn't be! https://github.com/rust-lang/regex#license
It seems it supports some level of the Unicode algorithms: https://github.com/rust-lang/regex/blob/master/UNICODE.md
Which brings up another question: what to do with Unicode properties? Can they be shared across crates?
I just looked over the description on https://github.com/rust-lang/regex/blob/master/UNICODE.md. The support is, in general, quite good.
The main area where it falls down is in support of more Unicode properties. A second question I have is how good Rust is about updating to the newest version of Unicode, and whether there is an API in Rust to detect the version of Unicode supported.
Mark
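On the version-detection question, the Rust standard library does expose the Unicode version behind its own `char` and `str` methods as the constant `char::UNICODE_VERSION`. A minimal sketch follows. It reports the standard library's tables only; the `regex` crate ships its own Unicode tables, and this sketch assumes nothing about whether that crate exposes an equivalent. The helper name `std_unicode_version` is made up for illustration.

```rust
/// Hypothetical helper: formats the Unicode version used by the
/// standard library's `char` and `str` methods.
fn std_unicode_version() -> String {
    // `char::UNICODE_VERSION` is a (major, minor, update) tuple of `u8`.
    let (major, minor, update) = char::UNICODE_VERSION;
    format!("{}.{}.{}", major, minor, update)
}

fn main() {
    // This says nothing about the tables bundled with third-party crates.
    println!("std Unicode version: {}", std_unicode_version());
}
```

Because the value is tied to the toolchain, two binaries built with different compiler releases can report different versions.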
I'll take this issue and add a note about this to ecosystem.md. I plan to send a PR to that doc with a new column saying to what degree we want to pull in existing code from each crate.
I added this to #41 and am documenting that we don't intend to take action on regex support in ICU4X at this time.
ecosystem.md mentions `icu::Regex`. The Rust `regex` crate already exists and is very performant (in part due to not supporting some Perl-popularized features that aren't actually regular and hinder performance).
It might be useful to signal intent in this area at some point.
Does the project seek to provide regular expressions that operate on UTF-8 for Rust apps? If so, what would be the elevator pitch relative to the `regex` crate?
Does the project seek to provide regular expressions that operate on UTF-16 and Latin1 and conform to ECMAScript regular expressions for use in JavaScript engines? If so, what would be the elevator pitch relative to what SpiderMonkey and V8 already have?
Does the project seek to provide regular expressions that Dart or Go programs would use? If so, what would be the elevator pitch relative to what the standard libraries of these languages provide?
Does the project seek to provide regular expressions that C or C++ apps would use via FFI? If so, would this just FFI around the `regex` crate (i.e. UTF-8), something new, or for UTF-16?
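To make the encoding distinction in these questions concrete, here is a small sketch using only the Rust standard library. It shows the same text as UTF-8 bytes and as UTF-16 code units, plus a UTF-16 to UTF-8 conversion of the kind an FFI wrapper around a UTF-8-only engine would presumably need. The function name `utf16_to_utf8` is hypothetical, and nothing here reflects an actual ICU4X or `regex` API.

```rust
/// Hypothetical helper: converts UTF-16 code units to a UTF-8 `String`,
/// returning `None` if the input contains an unpaired surrogate.
fn utf16_to_utf8(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    let text = "a€😀";

    // UTF-8: 1 + 3 + 4 bytes.
    let utf8: &[u8] = text.as_bytes();
    // UTF-16: 1 + 1 + 2 code units (the emoji needs a surrogate pair).
    let utf16: Vec<u16> = text.encode_utf16().collect();
    println!("UTF-8 bytes: {}, UTF-16 units: {}", utf8.len(), utf16.len());

    // Offsets differ between encodings, so match positions reported by a
    // UTF-8 engine cannot be handed back to a UTF-16 caller unchanged.
    let byte_offset = text.find('😀').unwrap();
    let unit_offset = text[..byte_offset].encode_utf16().count();
    println!("emoji at byte {} but UTF-16 unit {}", byte_offset, unit_offset);

    // Well-formed UTF-16 round-trips to the original text.
    assert_eq!(utf16_to_utf8(&utf16).as_deref(), Some(text));
    // A lone high surrogate can occur in a JavaScript string,
    // but it has no UTF-8 representation.
    assert_eq!(utf16_to_utf8(&[0xD83D]), None);
}
```

The last assertion is the awkward case for the ECMAScript question: input that a JavaScript engine must accept cannot always be converted to UTF-8 first.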