teragrep / pth_10

Data Processing Language (DPL) translator for Apache Spark
GNU Affero General Public License v3.0

rex4j command does not allow underscores in capture group names #211

Open eemhu opened 7 months ago

eemhu commented 7 months ago

Describe the bug If a capture group name contains an underscore (_), the command fails with the following error message:

java.lang.IllegalArgumentException: Error in rex4j command, regexp-string missing mandatory match groups.
    at com.teragrep.pth10.steps.rex4j.Rex4jStep.get(Rex4jStep.java:102)

Expected behavior Underscore should be an allowed character.

How to reproduce Enter a regex string containing a capture group with an underscore in its name, for example:

| makeresults | eval _raw = "hello, world, city, town, village" | rex4j "(?<under_score>h.*o)\,\s\world.*"

Software version pth_10 4.17.0

Additional context Might affect other special characters as well.

eemhu commented 6 months ago

Java's regex engine (java.util.regex.Pattern) does not allow underscores in capture group names. From the Pattern documentation:

Group name

A capturing group can also be assigned a "name", a named-capturing group, and then be back-referenced later by the "name". Group names are composed of the following characters. The first character must be a letter.

    The uppercase letters 'A' through 'Z' ('\u0041' through '\u005a'),
    The lowercase letters 'a' through 'z' ('\u0061' through '\u007a'),
    The digits '0' through '9' ('\u0030' through '\u0039'), 

https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
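
The restriction quoted above can be checked directly against java.util.regex. The following is a minimal sketch, not pth_10 code: the class GroupNameCheck and its helpers compiles and isValidGroupName are hypothetical names introduced for illustration, and the pattern is the one from the report written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of pth_10.
class GroupNameCheck {
    // Returns true if java.util.regex accepts the given pattern.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Mirrors the documented rule: first character a letter,
    // remaining characters ASCII letters or digits.
    public static boolean isValidGroupName(String name) {
        return name.matches("[A-Za-z][A-Za-z0-9]*");
    }

    public static void main(String[] args) {
        String raw = "hello, world, city, town, village";

        // Group name with an underscore: rejected at compile time.
        System.out.println(compiles("(?<under_score>h.*o)\\,\\s\\world.*")); // false

        // Same pattern with the underscore removed from the group name.
        String ok = "(?<underscore>h.*o)\\,\\s\\world.*";
        System.out.println(compiles(ok)); // true

        Matcher m = Pattern.compile(ok).matcher(raw);
        if (m.matches()) {
            System.out.println(m.group("underscore")); // hello
        }

        System.out.println(isValidGroupName("under_score")); // false
        System.out.println(isValidGroupName("underscore"));  // true
    }
}
```

Renaming the group so that it uses only letters and digits (for example underscore or underScore) keeps the pattern within what Pattern accepts. Other special characters in group names are rejected by the same rule.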