oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org

TRegex: Dumped automata transition labels unclear in some cases #4569

Open nirvdrum opened 2 years ago

nirvdrum commented 2 years ago

Describe GraalVM and your environment:

Describe the issue

TRegex has the ability to dump the automata corresponding to a regex. This is an amazingly useful way to gain insight into what the engine is doing and can help in debugging. However, I find the transition labels to be rather difficult to read in some cases. For example, working with the Ruby regex /a?a?aa/, some of the DFA transitions are labeled as [x00-x60b-x7f].

The primary issues I have reading it are:

1. The transition labeled [x00-x60b-x7f] would be a lot clearer to me if it were presented as [x00-x60,x62-x7f]. I appreciate there may be some difficulty in using a delimiter if you also print literal comma characters as part of the set. But since "b" is a hexadecimal digit and code points can span multiple bytes, I was rather confused by x60b.
2. Since three of the four range bounds are presented in hexadecimal, showing a literal b character is not ideal. If it were presented as x62, it'd be immediately clear that this range represents every ASCII character except for x61 (a). When presented as b, I need to consult an ASCII table separately to really understand what the range is.
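
To make the suggested label format concrete, here is a minimal Ruby sketch of a formatter that prints every range bound as a hex escape and separates ranges with a comma. It is not TRegex code: `format_ranges` and `hex_escape` are hypothetical names made up for illustration, the input is assumed to be a list of inclusive code point pairs, and the `x` prefix simply mirrors the labels as quoted in this issue.

```ruby
# Hypothetical helper, not part of TRegex: renders one code point as a
# two-digit (or longer) lowercase hex escape such as "x62".
def hex_escape(code_point)
  format('x%02x', code_point)
end

# Hypothetical helper, not part of TRegex: renders a list of inclusive
# [low, high] code point ranges with hex bounds and a comma delimiter.
def format_ranges(ranges)
  parts = ranges.map do |low, high|
    low == high ? hex_escape(low) : "#{hex_escape(low)}-#{hex_escape(high)}"
  end
  "[#{parts.join(',')}]"
end

# Every ASCII code point except "a" (0x61).
puts format_ranges([[0x00, 0x60], [0x62, 0x7f]])
# prints [x00-x60,x62-x7f]
```

With every bound in hex and an explicit delimiter, the gap at x61 is visible without consulting an ASCII table. A real implementation would still need to decide how to present a literal comma inside the set, for instance by printing it as x2c.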

Code snippet or code repository that reproduces the issue

jt ruby -e 'p 100_000.times { /a?a?aa/.match?("aaa") }'

Unfortunately, to actually get the output you'll need to modify TruffleRuby to add the TRegex DumpAutomata option. I'm going to add a new option to TruffleRuby to handle that without having to recompile TruffleRuby.

nirvdrum commented 2 years ago

/cc @djoooooe

oubidar-Abderrahim commented 2 years ago

Hi, thank you for reporting this. We will look into it and get back to you.

oubidar-Abderrahim commented 2 years ago

Tracked internally on GR-38769