Open irvintim opened 1 year ago
For what it's worth
parse_regex!(.message, r'(?P<date>[^\t]+)\t(?P<time>[^\t]+)\t(?P<x_edge_location>[^\t]+)\t(?P<sc_bytes>[^\t]+)\t(?P<c_ip>[^\t]+)\t(?P<cs_method>[^\t]+)\t(?P<cs_host>[^\t]+)\t(?P<cs_uri_stem>[^\t]+)\t(?P<cs_status>[^\t]+)\t(?P<cs_referer>[^\t]+)\t(?P<cs_user_agent>[^\t]+)\t(?P<cs_uri_query>[^\t]+)\t(?P<cs_cookie>[^\t]+)\t(?P<x_edge_result_type>[^\t]+)\t(?P<x_edge_request_id>[^\t]+)\t(?P<x_host_header>[^\t]+)\t(?P<cs_protocol>[^\t]+)\t(?P<cs_byte>[^\t]+)\t(?P<time_taken>[^\t]+)\t(?P<x_forwarded_for>[^\t]+)\t(?P<ssl_protocol>[^\t]+)\t(?P<ssl_cipher>[^\t]+)\t(?P<x_edge_response_result_type>[^\t]+)\t(?P<cs_protocol_version>[^\t]+)\t(?P<fle_status>[^\t]+)\t(?P<fle_encrypted_fields>[^\t]+)\t(?P<c_port>[^\t]+)\t(?P<time_to_first_byte>[^\t]+)\t(?P<x_edge_detailed_result_type>[^\t]+?)\t(?P<cs_content_type>[^\t]+)\t(?P<sc_content_len>[^\t]+)\t(?P<sc_range_start>[^\t]+)\t(?P<sc_range_end>[^\t]+)')
Parsing with a regex only works if the format is consistent. The point of W3C logs is that they can be customized.
IIS logs also follow this format, and we see variations, with customer IP in the third field and server IP in the third field, in logs coming from the same server.
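To illustrate why a fixed regex is fragile here, below is a minimal Python sketch (not VRL, and not an existing Vector function) that takes the field names from the `#Fields:` directive instead of hard-coding them. The function name `parse_w3c_extended`, the `delimiter` parameter, and the sample log lines are all hypothetical and made up for the illustration.

```python
import io


def parse_w3c_extended(lines, delimiter="\t"):
    """Yield one dict per log entry, keyed by the most recent #Fields directive.

    `delimiter` is an assumption: "\t" suits tab-delimited logs such as
    CloudFront; pass None to split on any whitespace instead.
    """
    fields = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Version: 1.0" or "#Fields: date time c-ip"
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        if fields is None:
            raise ValueError("log entry seen before any #Fields directive")
        values = line.split(delimiter)
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(values)}: {line!r}"
            )
        yield dict(zip(fields, values))


# Made-up sample: the third field changes meaning when #Fields changes.
sample = (
    "#Version: 1.0\n"
    "#Fields: date time c-ip cs-method\n"
    "2024-01-01\t00:00:01\t203.0.113.7\tGET\n"
    "#Fields: date time s-ip cs-method\n"
    "2024-01-01\t00:00:02\t198.51.100.9\tPOST\n"
)

records = list(parse_w3c_extended(io.StringIO(sample)))
for record in records:
    print(record)
```

Because the column names are re-read every time a `#Fields:` line appears, the same code handles both layouts, which a single positional regex cannot do.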
The documentation for parse_regex suggests opening a ticket to request a parse_* function be added for a log format that isn't already available.
My request is for AWS CloudFront logs. The format is the W3C Extended Log File Format, which is made up of tab-delimited log lines with 2 "comment" lines prefixed with "#" at the top of the file, one with a version number and one with a list of the fields. https://www.w3.org/TR/WD-logfile.html
It is further defined in this doc: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html
e.g.: