Open jakebiesinger-onduo opened 4 years ago
@jakebiesinger-onduo This is a cool initial proposal. We've certainly seen many cases where escaping requirements catch engineers out; the examples you've provided are, well, harrowing.
While the Terraform Team isn't likely to work on this in the near term, we'd certainly welcome a technical proposal to discuss working towards an eventual community PR.
That is how we make it more manageable: the jsonencode function escapes strings for you.
I'm struggling with this while trying to configure an AWS Glue Data Table for Athena queries against ALB access logs (docs).
For reference, the regex I'm trying to recreate is pretty crazy: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"
As far as I can tell, it needs to have the escaped chars (\" and \s). I've tried about 10 different combinations of inline strings, loading the regex string from a file, and using an <<EOT string without luck. I've gotten close with the \" characters working as expected but \s was becoming \\s.
My latest plan looked right:
~ "input.regex" = <<~EOT
- ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) "([^ ]*) ([^ ]*) (- |[^ ]*)" "([^"]*)" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) "([^"]*)" "([^"]*)" "([^"]*)" ([-.0-9]*) ([^ ]*) "([^"]*)" "([^"]*)" "([^ ]*)" "([^\s]+?)" "([^\s]+)" "([^ ]*)" "([^ ]*)"
+ ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"
But when I looked at the CREATE TABLE statement in lambda it had actually been applied as:
'input.regex'='([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" \\\"([^\\\"]*)\\\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \\\"([^\\\"]*)\\\" \\\"([^\\\"]*)\\\" \\\"([^\\\"]*)\\\" ([-.0-9]*) ([^ ]*) \\\"([^\\\"]*)\\\" \\\"([^\\\"]*)\\\" \\\"([^ ]*)\\\" \\\"([^\\s]+?)\\\" \\\"([^\\s]+)\\\" \\\"([^ ]*)\\\" \\\"([^ ]*)\\\"\n')
Here's the slice of HCL which generated the above:
parameters = {
"serialization.format" = 1
"input.regex" = <<EOT
([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"
EOT
}
Not really sure where else to go; the raw string feature is looking really appealing right now.
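For readers less familiar with the raw strings being requested, here is a minimal Python sketch of the behaviour. Python is the language the issue uses as its model; nothing below is Terraform syntax, and the sample log line and pattern are borrowed from the issue's use case purely for illustration.

```python
import re

# In a normal string literal, \t is expanded to a single tab character.
normal = "hello\tworld"
# In a raw string literal, the backslash and the t are both kept.
raw = r"hello\tworld"

print(len(normal))  # 11
print(len(raw))     # 12

# A regex can therefore be written exactly as the regex engine should see it.
pattern = r"finished with status code:\s(\d{3})"
match = re.search(pattern, "finished with status code: 200")
print(match.group(1))  # 200

# Without raw strings, every backslash has to be doubled to get the same value.
assert pattern == "finished with status code:\\s(\\d{3})"

# A raw string may still contain \" but the backslash stays in the value.
quoted = r"say \"hi\""
print(quoted)  # say \"hi\"
```

The last example is the wrinkle any raw-string design has to settle: in Python the backslash stops the quote from ending the literal, but it is not removed from the resulting string.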
I have managed to get this working, I think, and it is the case that the \"([^\\s]+?)\" should actually be \"([^s]+?)\". Not sure why the docs needed \s; maybe it needed to be escaped for some reason when being entered through the console.
In any case, the final regex I used was ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) "([^ ]*) ([^ ]*) (- |[^ ]*)" "([^"]*)" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) "([^"]*)" "([^"]*)" "([^"]*)" ([-.0-9]*) ([^ ]*) "([^"]*)" "([^"]*)" "([^ ]*)" "([^s]+?)" "([^s]+)" "([^ ]*)" "([^ ]*)"
being loaded in via file()
When rendering out the Athena CREATE TABLE query from the table generated by the above, it became:
WITH SERDEPROPERTIES (
'input.regex'='([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^s]+?)\" \"([^s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
[^\s] means anything except white space. [^s] means anything except the s character. They are not the same.
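The difference is easy to check. The following is a small sketch using Python's re module as a stand-in regex engine (an assumption: Athena's regex SerDe uses a different engine, but these two character classes mean the same thing in both). The sample strings are made up for illustration.

```python
import re

sample = "https://example.com/logs"

# [^\s]+ stops only at whitespace, so it consumes the whole token.
not_whitespace = re.match(r"[^\s]+", sample).group(0)
print(not_whitespace)  # https://example.com/logs

# [^s]+ stops at the first literal "s", which truncates the token.
not_letter_s = re.match(r"[^s]+", sample).group(0)
print(not_letter_s)  # http

# [^s]+ also runs straight across spaces, which [^\s]+ never does.
crosses_space = re.match(r"[^s]+", "GET /index").group(0)
print(crosses_space)  # GET /index
```

So a table built with [^s] may appear to work on some rows while silently splitting or merging fields on others.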
Up. This is a fundamental need. I would consider such a feature high priority as it greatly affects developer experience.
+1
Working with REGEXP_EXTRACT again and it's disgusting: "process_name" = "REGEXP_EXTRACT(textPayload, \"\\\\((.*?)\\\\)\")"
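To see why four backslashes are needed, it helps to peel the layers off one at a time. The sketch below uses Python's json.loads as a stand-in for quoted-string unescaping, on the assumption that only the \" and \\ escapes are involved (HCL and JSON agree on those two). It also assumes the consuming service unescapes its own quoted argument the same way; that second layer is an illustration, not a statement about the actual service. The sample log text is invented.

```python
import json
import re

# The quoted value from the comment above, kept verbatim in a Python raw string.
hcl_literal = r'"REGEXP_EXTRACT(textPayload, \"\\\\((.*?)\\\\)\")"'

# Layer 1: the configuration language unescapes \" and \\ in the quoted string.
after_hcl = json.loads(hcl_literal)
print(after_hcl)  # REGEXP_EXTRACT(textPayload, "\\((.*?)\\)")

# Layer 2 (assumed): the consumer unescapes its own quoted argument.
inner_literal = after_hcl[after_hcl.index('"'): after_hcl.rindex('"') + 1]
pattern = json.loads(inner_literal)
print(pattern)  # \((.*?)\)

# What finally reaches the regex engine is a pattern for text in parentheses.
print(re.search(pattern, "worker (nginx) started").group(1))  # nginx
```

Each layer halves the backslashes, so a single regex-level \( costs four backslashes in the source file.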
Using jsonencode is a crutch, but not a nice solution.
I'd love to have this feature implemented.
Current Terraform Version
Use-cases
Some resources tend to be heavy on regular expressions. For example, GCP's Stackdriver Monitoring metrics include the ability to pull out labels based on regex matches. Since HCL doesn't support any kind of "raw" string, these patterns can get pretty unwieldy.
For example, to match the string finished with status code: 200, we need the pattern finished with status code:\s(\d{3}). Expressing \d in HCL requires us to escape the characters for HCL, and then escape the characters again for the resource:
More complicated patterns can be even worse. For example, to match the string [SUCCESS_RATIO] 32.5%, we also have to escape the [ characters, as you'd expect in a regex. But since this is HCL, we have to escape the escapes as well, yielding a nearly-illegible entry:
Attempted Solutions
For monitoring, our current workaround is to build all metrics in the UI and then import them into terraform. Not all resources have that option, and we're still left with code that's hard to decipher.
Proposal
It would be nice to indicate to Terraform that a string should be treated as "raw", as you can in Python and in many other languages. For example, in Python, you can prefix a string with the r character to turn off all escaping within the string, meaning r"hello\tworld" will not expand \t into a tab character.
In HCL, it would be nice to have similar support. Then, my strings could become the more reasonable
That internal " presents a problem, but the Python folks allow raw strings to still escape quotes. From their docs:
References