hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/
Other
14.92k stars 1.95k forks source link

Error message logged with missing closing quotation mark #11150

Open gh-skatt opened 3 years ago

gh-skatt commented 3 years ago

Nomad version

1.1.3

Operating system and Environment details

CentOS 7

Issue

The "failed to validate self token" error message is logged without proper quotation marks around the error message "failed to lookup Vault periodic token: Error making API request.".

The a closing quotation mark is missing, which confuses log system field extraction systems.

Example:

Aug 11 11:50:29 hostname nomad: 2021-08-11T11:50:29.124+0200 [ERROR] nomad.vault: failed to validate self token/role: retry=30s error="failed to lookup Vault periodic token: Error making API request.

Reproduction steps

Expected Result

We expect there to be a double quotation mark at the end of the logged message, like this:

Aug 11 11:50:29 hostname nomad: 2021-08-11T11:50:29.124+0200 [ERROR] nomad.vault: failed to validate self token/role: retry=30s error="failed to lookup Vault periodic token: Error making API request."

This should give log aggregators a better chance at parsing this event correctly, and then extracting the error message.

Actual Result

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

jrasell commented 3 years ago

Hi @gh-skatt and thanks for raising this.

Looking into the error path I found that it is generated from the Vault API and includes newlines which I believe is causing the issue here. Are there any additional lines below this message which could be associated to this top level log line?

In order to fix this I believe one solution would be to replace all the newline occurrences with another character, to ensure a nicer output experience.

gh-skatt commented 3 years ago

If there are any additional lines associated with this event they are not picked up by the splunk forwarder, but I find that unlikely.

Just before the event in question, I see the vault agent process has logged this:

Aug 11 11:50:29 hostname vault: 2021-08-11T11:50:29.122+0200 [INFO] cache.apiproxy: forwarding request: method=POST path=/v1/auth/token/lookup
Aug 11 11:50:29 hostname vault: 2021-08-11T11:50:29.122+0200 [INFO] cache: received request: method=POST path=/v1/auth/token/lookup
Aug 11 11:50:29 hostname vault: 2021-08-11T11:50:29.120+0200 [INFO] cache.apiproxy: forwarding request: method=GET path=/v1/auth/token/lookup-self
Aug 11 11:50:29 hostname vault: 2021-08-11T11:50:29.120+0200 [INFO] cache: received request: method=GET path=/v1/auth/token/lookup-self
Aug 11 11:50:28 hostname vault: 2021-08-11T11:50:28.571+0200 [INFO] cache.apiproxy: forwarding request: method=GET path=/v1/sys/seal-status
Aug 11 11:50:28 hostname vault: 2021-08-11T11:50:28.571+0200 [INFO] cache: received request: method=GET path=/v1/sys/seal-status
schmichael commented 3 years ago

I suspect the fix here will involve a new agent logging config parameter that allows forcing all log lines to have their internal newlines escaped. We'll need to check with Consul and Vault for prior art here on naming/behavior, but I suspect this is what's going on.