Open kmerz opened 5 years ago
Here is a diff you can apply that tests the missing underscore support:
diff --git a/src/main/resources/patterns/patterns b/src/main/resources/patterns/patterns
index 52235d8..2eb7233 100644
--- a/src/main/resources/patterns/patterns
+++ b/src/main/resources/patterns/patterns
@@ -106,3 +106,6 @@ COMMONAPACHELOG_DATATYPED %{IPORHOST:clientip} %{USER:ident;boolean} %{USER:auth
# Log Levels
LOGLEVEL ([A|a]lert|ALERT|[T|t]race|TRACE|[D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL|[S|s]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
+
+# NamedGroup with underscore
+NAMEDGROUPWITHUNDERSCORE (?<test_field>test)
diff --git a/src/test/java/io/krakens/grok/api/GrokTest.java b/src/test/java/io/krakens/grok/api/GrokTest.java
index de4d714..3bff5a8 100644
--- a/src/test/java/io/krakens/grok/api/GrokTest.java
+++ b/src/test/java/io/krakens/grok/api/GrokTest.java
@@ -672,4 +672,13 @@ public class GrokTest {
instant = (Instant) grok.match(dateWithTimeZone).capture().get("timestamp");
assertEquals(ZonedDateTime.parse(dateWithTimeZone, dtf.withZone(ZoneOffset.ofHours(8))).toInstant(), instant);
}
+
+ @Test
+ public void testNamedGroupWithUnderscore() {
+ String grokPatternName = "NAMEDGROUPWITHUNDERSCORE";
+ String testString = "test";
+ Grok grok = compiler.compile("%{" + grokPatternName + "}");
+ String result = (String) grok.match(testString).capture().get(grokPatternName);
+ assertEquals("test", result);
+ }
}
The fact that this broke _ support was known at the time of the change and referenced in #53, so this was on purpose and accepted as far as I can tell. I don't think you have to convince anyone this is an issue.
What you would have to convince people of is that the original PR (2016?) was not worth dropping this support for.
For us at Graylog, it is simply that we have users whose Grok patterns contain named groups with underscores. Dropping the support means a lot of work for those users, who would have to rewrite their Grok patterns manually.
So, from what I can tell, that support was lost because the library that provides it is for Java 5/6, so it can't simply be 'added back'. Maybe you can submit a PR? But there might be other reasons...
@anthonycorbacho ?
I could have a PR ready whenever you want. But as I read in #53, performance with named-regexp was an issue?
So we really need @anthonycorbacho to comment. I was not involved at that time, and I am not a committer now, although I have submitted a couple of PRs. Switching the regex engine, as I believe this change did, was and is a big deal, and as I said, they knew they were making this breaking change and accepted it.
I don't think there is any going back.
I would imagine that if you wrote something that, say, mapped between _ and some allowable character on the fly (so that the caller can use _ but internally it is replaced), that would be the way to go.
But @anthonycorbacho leads the project.
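To make the on-the-fly mapping idea above concrete, here is a minimal sketch in plain Java. It is not java-grok code: the class UnderscoreGroupNames, its methods, and the alias scheme (name0, name1, ...) are hypothetical names chosen for illustration. The sketch rewrites each named group to an alias that java.util.regex accepts, remembers which alias stands for which original name, and reports captures under the original names. It assumes no escaped parentheses such as \(?<x> appear in the pattern, and it does not rewrite \k<name> backreferences.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of java-grok.
class UnderscoreGroupNames {
    // Matches the opening of a named group such as "(?<test_field>".
    // Lookbehinds "(?<=" and "(?<!" are skipped because a name must start with a letter.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9_]*)>");

    // Replaces every group name with an engine-safe alias and records alias -> original name.
    static String rewrite(String regex, Map<String, String> aliasToOriginal) {
        Matcher m = NAMED_GROUP.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String alias = "name" + aliasToOriginal.size();
            aliasToOriginal.put(alias, m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement("(?<" + alias + ">"));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Compiles the rewritten regex and returns captures keyed by the caller's original names.
    static Map<String, String> capture(String regex, String input) {
        Map<String, String> aliases = new LinkedHashMap<>();
        Matcher m = Pattern.compile(rewrite(regex, aliases)).matcher(input);
        Map<String, String> result = new LinkedHashMap<>();
        if (m.find()) {
            for (Map.Entry<String, String> e : aliases.entrySet()) {
                result.put(e.getValue(), m.group(e.getKey()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {test_field=test}
        System.out.println(capture("(?<test_field>test)", "test"));
    }
}
```

With this approach the stored patterns stay untouched and only the compiled form changes, at the cost of one extra rewrite pass per pattern.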
With #53 we lost the ability to have named group captures with underscores, like (?<test_field>test).
java-grok had this support as long as we used com.google.code.regexp.Pattern. Now, with java.util.regex.Pattern, we use the Java regex engine, which does not support underscores in group names: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#groupname
This broke backward compatibility with already stored patterns.
The preferable fix would be to bring back com.google.code.regexp.Pattern.
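The engine limitation described in this issue can be reproduced without java-grok. The following is a small sketch using only java.util.regex; the class name GroupNameCheck and the helper compiles are hypothetical, introduced here for illustration. Per the group-name rules in the linked Javadoc, names may contain only ASCII letters and digits, so the second pattern should be rejected with a PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of java-grok.
class GroupNameCheck {
    // Returns true if java.util.regex accepts the pattern, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Group name made of letters only: accepted.
        System.out.println(compiles("(?<testfield>test)"));
        // Group name containing an underscore: rejected by the documented naming rules.
        System.out.println(compiles("(?<test_field>test)"));
    }
}
```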