TRegex: Dumped automata transition labels unclear in some cases

nirvdrum commented 2 years ago

Describe GraalVM and your environment:

GraalVM version or commit id if built from source: 22.1.0
CE or EE: CE
JDK version: 17
OS and OS Version: macOS 12.3.1
Architecture: amd64 via Rosetta

Describe the issue

TRegex can dump the automata corresponding to a regex. This is an amazingly useful way to gain insight into what the engine is doing, and it can help with debugging. However, I find the transition labels difficult to read in some cases. For example, with the Ruby regex /a?a?aa/, some of the DFA transitions are labeled [x00-x60b-x7f].
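To see what that label actually covers, the ranges can be expanded by hand. This is a minimal Ruby sketch, not TRegex code. It assumes the label is read as two ranges, x00 to x60 and b to x7f, and that only ASCII (x00 to x7f) matters. It lists the ASCII code points that fall outside both ranges:

```ruby
# Assumed reading of the dumped label [x00-x60b-x7f]:
# the range x00..x60 followed by the range 'b'..x7f.
label_ranges = [0x00..0x60, 'b'.ord..0x7f]

# ASCII code points not covered by any range in the label.
excluded = (0x00..0x7f).reject { |cp| label_ranges.any? { |r| r.cover?(cp) } }

p excluded            # => [97]
p excluded.map(&:chr) # => ["a"]
```

The only gap is x61 (a), which is consistent with /a?a?aa/, whose only literal character is a.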

The primary issues I have reading it are:

- The code point ranges aren't separated by a delimiter, so they blend together.
- Some range bounds are displayed in hexadecimal while others are displayed as the literal character.

The transition labeled [x00-x60b-x7f] would be much clearer to me if it were presented as [x00-x60,x62-x7f]. I appreciate there may be some difficulty in using a delimiter if literal comma characters can also appear in the set. But since "b" is a hexadecimal digit and code points can need more than two hex digits, I was confused by x60b, which reads like a single hex value. Additionally, since three of the four range bounds are presented in hexadecimal, showing a literal b is not ideal. If it were presented as x62, it would be immediately clear that the label covers every ASCII character except x61 (a). When it is presented as b, I need to consult an ASCII table to understand what the range is.
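The proposed notation is easy to produce. The following is a minimal Ruby sketch of it. `format_ranges` is a hypothetical helper, not part of TRegex or TruffleRuby. It assumes the set is already available as a list of inclusive code point ranges, prints every bound in lowercase hex with an `x` prefix, and joins ranges with commas:

```ruby
# Hypothetical helper (not a TRegex API): render inclusive code point
# ranges with every bound in hex and a comma between ranges.
def format_ranges(ranges)
  parts = ranges.map do |r|
    lo = format("x%02x", r.begin)
    hi = format("x%02x", r.end)
    # A single code point is printed without a range.
    r.begin == r.end ? lo : "#{lo}-#{hi}"
  end
  "[#{parts.join(',')}]"
end

puts format_ranges([0x00..0x60, 0x62..0x7f]) # prints [x00-x60,x62-x7f]
puts format_ranges([0x61..0x61])             # prints [x61]
```

Because every code point is printed in hex, a literal comma in the set would appear as x2c, so it cannot be confused with the delimiter.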

Code snippet or code repository that reproduces the issue

jt ruby -e 'p 100_000.times { /a?a?aa/.match?("aaa") }'

Unfortunately, to actually get the output, you'll need to modify TruffleRuby to set the TRegex DumpAutomata option. I'm going to add a new TruffleRuby option so this can be done without recompiling TruffleRuby.

nirvdrum commented 2 years ago

/cc @djoooooe

oubidar-Abderrahim commented 2 years ago

Hi, thank you for reporting this. We will take a look into it and get back to you.

oubidar-Abderrahim commented 2 years ago

Tracked internally on GR-38769

oracle/graal issue #4569