moodmosaic / Fare

Port of Java dk.brics.automaton and xeger, mostly used for generating strings that match a specific regular expression.
http://www.brics.dk/automaton/
MIT License

System.InvalidOperationException: state #44

Open rawwool opened 6 years ago

rawwool commented 6 years ago

Xeger throws "System.InvalidOperationException: state" when asked to generate a string for this email regular expression: ^(?=.{6,50}$)([\w-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([\w-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$

moodmosaic commented 6 years ago

Thank you for reporting this!

Project Fare turns regular expressions into automata by applying the algorithms of dk.brics.automaton and xeger.

Unfortunately, I don't have an answer to your question, as Project Fare is really a port of the above Java projects. We'd have to try

^(?=.{6,50}$)([\w-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([\w-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$

in Java and compare the results.

You may use a different pattern, or a different engine to convert the regular expression into an automaton. For example, you can use the Rex engine.

gukoff commented 3 months ago

The problem is this part: [\w-.].

Xeger interprets the hyphen in [\w-.] as a range separator between \w and ., just as in [A-Za-z].

Change it to [\w\-.] or [\w.-], and it will work.
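The two rewrites suggested above can be checked in any engine that follows the same hyphen convention. The sketch below uses Python's standard re module as a stand-in, not Fare or Xeger, so it only illustrates the character-class spelling. The sample string is a hypothetical email local part.

```python
import re

# Both suggested rewrites make the hyphen a literal character.
ESCAPED = re.compile(r"[\w\-.]+")   # hyphen escaped
TRAILING = re.compile(r"[\w.-]+")   # hyphen moved to the end of the group

local_part = "first-name.last_1"    # hypothetical sample input

escaped_ok = ESCAPED.fullmatch(local_part) is not None
trailing_ok = TRAILING.fullmatch(local_part) is not None

# The original spelling puts the hyphen between \w and '.', where it reads
# as a range separator. Python's re refuses to compile that form.
try:
    re.compile(r"[\w-.]+")
    ambiguous_rejected = False
except re.error:
    ambiguous_rejected = True

print(escaped_ok, trailing_ok, ambiguous_rejected)  # prints: True True True
```

Python reports the ambiguous form as a "bad character range" at compile time. Whether Fare should reject it the same way or with a different message is a separate question.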

moodmosaic commented 3 months ago

@gukoff, thank you. PRs more than welcome. (In this case, I think a possible PR would be a test-case demonstrating this, but still, it can be valuable.)

gukoff commented 3 months ago

The current behaviour is correct; see the docs:

Because a positive character group can include both a set of characters and a character range, a hyphen character (-) is always interpreted as the range separator unless it is the first or last character of the group.
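A small sketch of the positional rule quoted above, again using Python's re module as an assumed stand-in that follows the same convention for these simple cases. The helper chars_matched is hypothetical and exists only for this illustration.

```python
import re

def chars_matched(char_class: str, candidates: str = "abc-") -> str:
    """Return the candidate characters that the character class matches."""
    pattern = re.compile(char_class)
    return "".join(c for c in candidates if pattern.fullmatch(c))

middle = chars_matched("[a-c]")  # hyphen between two characters: range a..c
first = chars_matched("[-ac]")   # hyphen first in the group: literal
last = chars_matched("[ac-]")    # hyphen last in the group: literal

print(middle, first, last)  # prints: abc ac- ac-
```

Only the middle position turns the hyphen into a range separator, which is why b matches [a-c] while the hyphen itself does not.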

Let me check if I could improve the error message.