rust-lang / regex

An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
https://docs.rs/regex
Apache License 2.0
3.42k stars 423 forks source link

FST transducer support #1087

Open eharcevs opened 10 months ago

eharcevs commented 10 months ago

Describe your feature request

I'm trying to upgrade from regex-automata 0.1.9 and have noticed that newer versions don't have a transducer feature anymore, which allowed using a regex DFA as an FST Automaton. Is there a plan to bing this feature back? If not what could be the workaround for using regex for fst? Thanks in advance

BurntSushi commented 10 months ago

Yes, I just haven't gotten around to it yet. Note that you can always write the trait impls yourself for a wrapper type, so you can unblock yourself if necessary. It could be some time before I get to it.

I will probably add the impls in a new crate or as an optional feature on fst itself. I haven't decided yet.

eharcevs commented 10 months ago

Got it, thanks!

BurntSushi commented 10 months ago

We can leave this open to track when it's done. Thanks!

tesioai commented 3 months ago

please Mr Sushi, deliver us from version 0.1.10

tisonkun commented 3 months ago

FYI I implement it in the wrapper pattern at - https://github.com/GreptimeTeam/greptimedb/pull/3575

I feel that the implementation is possible to contribute back upstream so that we get it maintained and evolved with the upstream, as well as porting it back can possibly discover some issues that should be fixed.

I have some time to make a patch, but here first to ask if it's the way to go.

BurntSushi commented 3 months ago

Sorry, but I'm not going to have time to review this any time soon unfortunately. Submitting a patch probably isn't fruitful at this time, because it's going to require some guidance from me probably in how the crates should be structured. For example, I don't think I want to take a dependency on fst inside of regex-automata. I think I want to flip that around so that fst has an optional dependency on regex-automata. And the same for aho-corasick: https://github.com/BurntSushi/aho-corasick/blob/56256dca1bcd2365fd1dc987c1c06195429a2e2c/Cargo.toml#L32-L48

BurntSushi commented 3 months ago

Note that for aho-corasick, I actually wrote out the implementation for it with tests, but it's not currently included in the crate: https://github.com/BurntSushi/aho-corasick/blob/56256dca1bcd2365fd1dc987c1c06195429a2e2c/src/transducer.rs

The unanchored/anchored API layer (and perhaps something more sophisticated) will be needed for regex-automata I think. But I haven't given it a ton of thought yet.

tisonkun commented 3 months ago

Thanks for your feedback! I won't over commit it so I'll stop here. If I happen to have time make the change on the fst crate, I'll comment there.

Then it seems we can close this issue as won't do in this crate and perhaps create a mirror in fst?

BurntSushi commented 3 months ago

I think I'm fine leaving this issue open. If you want to open one on fst too, that's fine.