Open colinbjohnson opened 6 months ago
I think there's an AWS bug in the mix here, too. Take this query from the ALB docs to create an ALB access logging Athena table:
CREATE EXTERNAL TABLE IF NOT EXISTS alb_access_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string,
conn_trace_id string
)
PARTITIONED BY
(
day STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?( .*)?')
LOCATION 's3://DOC-EXAMPLE-BUCKET/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/'
TBLPROPERTIES
(
"projection.enabled" = "true",
"projection.day.type" = "date",
"projection.day.range" = "2022/01/01,NOW",
"projection.day.format" = "yyyy/MM/dd",
"projection.day.interval" = "1",
"projection.day.interval.unit" = "DAYS",
"storage.location.template" = "s3://DOC-EXAMPLE-BUCKET/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/${day}"
)
Note regex:
([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\" ?([^ ]*)?( .*)?
Specifically:
\"([^\s]+?)\" \"([^\s]+)\"
When I run this query in AWS Athena, I get a Glue table with this regex:
([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^s]+?)\" \"([^s]+)\" \"([^ ]*)\" \"([^ ]*)\"
Note:
\"([^s]+?)\" \"([^s]+)\"
This shows up both in the AWS console and in the output of the CLI command `aws athena get-table-metadata`, so it's not just a console rendering issue.
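To make the difference concrete, here is a minimal sketch comparing the two fragments. It uses Python's `re` module rather than the Java regex engine that RegexSerDe uses, on the assumption that `\s` and the character classes behave the same way for this case. The sample field values are invented for illustration and are not taken from real ALB logs.

```python
import re

# Pattern fragment as written in the DDL from the ALB docs.
intended = re.compile(r'"([^\s]+?)" "([^\s]+)"')
# Pattern fragment as it appears in the resulting Glue table.
stored = re.compile(r'"([^s]+?)" "([^s]+)"')

# Hypothetical field values, invented for illustration only.
samples = {
    "plain": '"10.0.0.1:80" "200"',
    "contains_s": '"hosts:80" "200"',
    "contains_space": '"10.0.0.1:80 10.0.0.2:80" "200 200"',
}

# For each sample, record whether each pattern matches the whole string.
results = {
    name: (bool(intended.fullmatch(text)), bool(stored.fullmatch(text)))
    for name, text in samples.items()
}

for name, (ok_intended, ok_stored) in results.items():
    print(f"{name}: intended={ok_intended} stored={ok_stored}")
```

The two patterns agree only on the first sample. `[^s]` rejects any value containing a literal "s" and accepts whitespace, which is the opposite of what `[^\s]` does.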
I've had a support case open with AWS about this, and the resolution, I think, is that the double backslash is actually correct. They linked me to this re:Post article, which says:
Note: RegexSerDe follows the Java standard. Because the backslash is an escape character in the Java String class, you must use a double backslash to define a single backslash. For example, to define \w, you must use \w in your regex.
Humorous aside: note "to define \w, you must use \w", which is what it literally says at the time I'm linking it. Looks like someone wrote `to define \w, you must use \\w` in a raw doc when they meant `\\\\w`, and the pipeline that rendered the raw doc into a web page interpreted the `\\` as an escape sequence and rendered it as "\". As I pointed out to our AWS TAM, this is a great example of why this stuff is confusing and why it's important to keep the docs straight.
ETA: I had to edit this post to change `rendered it as "\"` to `rendered it as "\\"`, further proving my point. 🤦
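The "one layer of escaping gets consumed" idea from the re:Post note can be sketched with a toy model. The function below is hypothetical and deliberately simplified: it removes a single level of backslash escaping and is not a reproduction of how Athena, Hive, or the docs pipeline actually process strings.

```python
def unescape_once(text: str) -> str:
    """Toy model: remove one level of escaping (backslash + X becomes X)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Drop the backslash and keep the character it escaped.
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


single = r"([^\s]+)"   # one backslash, as in the ALB docs query
double = r"([^\\s]+)"  # two backslashes, as the re:Post note suggests

print(unescape_once(single))  # the backslash is consumed, leaving [^s]
print(unescape_once(double))  # one backslash survives, leaving [^\s]
```

Under this model, a single backslash in the query ends up as `[^s]` in the stored regex, while a doubled backslash ends up as the intended `[^\s]`.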
Terraform Core Version
1.8.4
AWS Provider Version
5.51.1_x5
Affected Resource(s)
Expected Behavior
An AWS Glue Catalog Table should be able to be rendered as follows:
Actual Behavior
There is no way to render a table to the above description.
Relevant Error/Panic Output Snippet
No response
Terraform Configuration Files
Steps to Reproduce
Run `terraform apply` and compare the output of:
Debug Output
Panic Output
No response
Important Factoids
The behavior of rendering `\\s` as "one slash" when rendering a `local_file` resource and rendering `\\s` as "two slashes" when rendering an `aws_glue_catalog_table` is odd. It seems that there should be some way to render an `aws_glue_catalog_table` resource with one slash.
One other interesting note is the number of companies that have published incorrect `aws_glue_catalog_table` Terraform configurations as a result of this. Here are a few of the more broadly used ones:
(`\\s`)
(`^s`, which isn't correct)
(`\\s`, which likely isn't correct)
(`\\s`, which likely isn't correct)
(`^s`, which isn't correct)
References
No response
Would you like to implement a fix?
No