Closed cgivre closed 7 years ago
Hi,
There is already a ticket for this on the Drill side (https://issues.apache.org/jira/browse/DRILL-3423). As far as I remember, something in the integration of the two was very hard to do in Drill. Please check that ticket and have a look at the discussion that took place there.
Niels Basjes
In addition: please realize that this parser is pluggable, so the set of fields it can extract is not a 'static thing'. Also, when extracting cookies and query string parameters, for example, a 'column' can be defined for each possible parameter name you can think of. As a consequence: there is no such list of fields.
Hi Niels, Jim Scott was working on it, but got stuck and then didn’t have time to continue. He got it 95% of the way there. I’ve been working on it and the issue that the Drill committers were having is that the field names (HTTP.USERAGENT.user-agent) were not “drill-friendly”. So I’ve made some changes from the Drill side whereby Drill removes the data-type from the user’s view. However, it still needs to map that back to your parser. I hope that makes sense. In other words, we wanted it so that a user could just type:
SELECT request_user-agent
FROM
instead of:
SELECT HTTP_USERAGENT:request_user-agent
FROM
Likewise for the results that are returned. Removing the data-type is obviously trivial, but adding it back for the parser requires a mapping of some sort. That’s why I was asking. Does that make sense? Thanks, Charles
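The mapping described above could be sketched roughly as follows. This is only an illustration, not code from Drill or from the parser: it assumes the typed field names are available as plain strings in the `TYPE:name` form used in the query above, and `build_name_mapping` is a hypothetical helper name.

```python
def build_name_mapping(typed_fields):
    """Map each type-stripped name back to its full 'TYPE:name' form.

    Assumption: every field is written as 'TYPE:name', as in the
    example query in this thread.
    """
    mapping = {}
    for typed in typed_fields:
        # Split on the first ':' only; the name part may contain more.
        _type, _sep, name = typed.partition(":")
        if name in mapping and mapping[name] != typed:
            # Two types sharing one name cannot be told apart once the
            # type has been stripped, so fail loudly instead of guessing.
            raise ValueError(f"ambiguous field name: {name!r}")
        mapping[name] = typed
    return mapping


# Hypothetical field list, for illustration only.
fields = [
    "HTTP_USERAGENT:request_user-agent",
    "IP:connection_client_host",
]
mapping = build_name_mapping(fields)
```

With this in place, a name the user types (`request_user-agent`) can be looked up in `mapping` to recover the typed name the parser expects. Because the parser is pluggable, such a table would have to be built from whatever fields are requested at query time rather than from a fixed list.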
So what if my parser gets an additional option that allows the field names to be encoded (for example URL-encoded, or something special) to ensure a readable variant that does not have any 'funny' characters?
So my question to you is: What characters are considered to be 'normal' for column names in Drill? I assume at least [a-zA-Z0-9_]
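One possible shape for such an encoding, as a minimal sketch: it assumes the safe set really is `[a-zA-Z0-9_]` (the guess made above, not something confirmed for Drill), and the function names and the `_XX` escape scheme are made up for illustration. Every byte outside `[a-zA-Z0-9]` is written as `_` followed by two hex digits, so the underscore itself is escaped too and the encoding stays reversible.

```python
import re


def encode_field_name(name):
    """Encode a field name using only [a-zA-Z0-9_] (hypothetical scheme)."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # '_' is the escape marker, so a literal '_' is escaped as well.
            out.append("_%02X" % byte)
    return "".join(out)


def decode_field_name(encoded):
    """Reverse encode_field_name."""
    raw = re.sub(
        rb"_([0-9A-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        encoded.encode("ascii"),
    )
    return raw.decode("utf-8")


encoded = encode_field_name("request_user-agent")
# encoded == "request_5Fuser_2Dagent"
```

The result is less readable than the original name, which is the trade-off of any scheme that has to round-trip without a lookup table.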
Is there something regarding this ticket I can do for you at this time?
BTW, the parser works great with Drill!
Great to hear. Don't forget I released v3.0 with support for Nginx.
Hi Niels, I'm working on adapting your parser for Apache Drill, and I was wondering if there is a list somewhere of the fields that the parser supports and their data types? Thanks,