Open djenriquez opened 2 years ago
Hi @djenriquez ,
Thanks for using Nomad. I'm sorry to hear you are having an issue. I'm curious, is it the same job(s) that throw the error each time, or is there no pattern really? If it's the same job(s) could you post the jobspec?
Hey @DerekStrickland, I work on the same team as DJ and we're working through this issue together. I noticed that this seems to happen on dispatched parameterized batch jobs specifically. Unfortunately we don't have a job spec to repro exactly since we generate these, but I can try to strip out the IP from it and get something set up. For context, the event that is returned (with what seem to be UTF-16 encoding issues) looks like this:
"Topic": "Job",
"Type": "AllocationUpdated",
"Key": "parameterized-batch-job-name/dispatch-1634850017-930d248c",
"FilterKeys": null,
"Index": 12201153,
"Payload": {
"Job": {
"Affinities": null,
"AllAtOnce": false,
"Constraints": null,
"ConsulNamespace": "",
"ConsulToken": "",
"CreateIndex": 12200900,
"Datacenters": ["dc1"],
"DispatchIdempotencyToken": "",
"Dispatched": true,
"ID": "parameterized-job-name/dispatch-1634850017-930d248c",
"JobModifyIndex": 12200900,
"Meta": {
"VAR1": "VAL1",
"VAR2": "",
"VAR3": "VAL3"
},
"ModifyIndex": 12201153,
"Multiregion": null,
"Name": "parameterized-batch-job-name/dispatch-1634850017-930d248c",
"Namespace": "default",
"NomadTokenID": "",
"ParameterizedJob": {
"MetaOptional": ["opt1", "opt2", "opt3"],
"MetaRequired": ["req1", "req2", "req3"],
"Payload": "forbidden"
},
"ParentID": "parameterized-batch-job-name",
"Payload": "AA==",
"Periodic": null,
"Priority": 50,
"Region": "us-east-1",
"Spreads": null,
"Stable": false,
"Status": "dead",
"StatusDescription": "",
"Stop": false,
"SubmitTime": 1634850017353593000,
"TaskGroups": [{
"Affinities": null,
"Constraints": [{
"LTarget": "${node.class}",
"Operand": "=",
"RTarget": "nomad-client-cluster"
}, {
"LTarget": "${attr.vault.version}",
"Operand": "semver",
"RTarget": "\u003e= 0.6.1"
}, {
It looks like there are some Unicode encoding errors, and I wonder if that's the source of the problem here, where the Payload is null (or the empty set is being encoded oddly).
That's exactly the path I am heading down. As best as I can tell, there is only one line that formats a message that way, and it has to do with parsing jobspecs as you might imagine. Also, the AA== string really looks like an encoding issue. Glad we're on the same path.
Here is the code in question:
func parseFile(path string) (*hcl.File, hcl.Diagnostics) {
	body, err := ioutil.ReadFile(path)
	if err != nil {
		return nil, hcl.Diagnostics{
			&hcl.Diagnostic{
				Severity: hcl.DiagError,
				Summary:  "Failed to read file",
				Detail:   fmt.Sprintf("failed to read %q: %v", path, err),
			},
		}
	}
	return parseHCLOrJSON(body, path)
}
Interestingly, that error seems to be thrown by ioutil, which it seems does not support UTF-16. Assuming I'm right, and this is the only line that formats an error that way, it will take a code change to detect and handle UTF-16. Are you confident your file is UTF-16 encoded?
Ah ok, that seems very likely to be the issue then. To be honest this looks like some mismatch internal to Nomad. I don't think it's file related specifically, but I could be overlooking something. This error happens whenever the job already submitted to Nomad is being executed. It appears that it's a periodic job, and in the job definition shown in the Nomad UI the areas above that look like an encoding issue appear properly:
Nomad Job definition in UI:
{
"LTarget": "${attr.vault.version}",
"RTarget": ">= 0.6.1",
"Operand": "semver"
},
Nomad event with job definition from event stream:
{
"LTarget": "${attr.vault.version}",
"Operand": "semver",
"RTarget": "\u003e= 0.6.1"
}, {
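A side note on the `\u003e` difference shown in the two snippets above: Go's `encoding/json` escapes `<`, `>`, and `&` by default, so `\u003e` in the event stream output is what that default behavior produces for `>` and decodes back to the same string. The sketch below only illustrates this with the standard library; it assumes, without confirmation from this thread, that the event stream output is produced by an encoder with that default. The helper names are made up for the illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// marshalDefault encodes v with encoding/json's default settings,
// which escape <, >, and & as \u003c, \u003e, and \u0026.
func marshalDefault(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// marshalNoHTMLEscape encodes v with HTML escaping switched off,
// so > is written literally.
func marshalNoHTMLEscape(v interface{}) string {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		panic(err)
	}
	// Encode appends a trailing newline; trim it for comparison.
	return strings.TrimSpace(buf.String())
}

// decodeString parses a JSON string literal back into a Go string.
func decodeString(raw string) string {
	var s string
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	fmt.Println(marshalDefault(">= 0.6.1"))          // escaped form
	fmt.Println(marshalNoHTMLEscape(">= 0.6.1"))     // literal form
	fmt.Println(decodeString(`"\u003e= 0.6.1"`))     // same value after decoding
}
```

Both forms decode to the same constraint string, so this difference on its own would not be expected to break a JSON consumer.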
The top level Payload key in the job definition is filled out like so:
"Dispatched": false,
"DispatchIdempotencyToken": "",
"Payload": null,
"Meta": null,
"ConsulToken": "",
"ConsulNamespace": "",
"VaultToken": "",
"VaultNamespace": "",
"NomadTokenID": "",
"Status": "running",
"StatusDescription": "",
"Stable": false,
"Version": 4,
"SubmitTime": 1627671812651961900,
"CreateIndex": 1776621,
"ModifyIndex": 10040553,
"JobModifyIndex": 10040553
Where it seems like the Payload is null, and I wonder if there's some odd encoding/decoding issue that causes this null to turn into AA==.
I'm not sure if that line above is being called during a new periodic job dispatch but I wouldn't be totally surprised since each periodic job dispatch creates a new job (so I could see that being implemented by pulling that job file from memory or file, parsing it, and resubmitting with updated fields).
Were you able to confirm that your file is in fact UTF-16?
Same thing happening for me on Nomad 1.2.6 and SDK on v0.0.0-20220422170747-b1ce39297285.
In my case, whenever a parameterized job with the following block or no payload included is submitted, the SDK returns the error mentioned in this issue.
parameterized {
payload = "forbidden"
}
* 'Job.Payload': source data must be an array or slice, got string
And when looking at the map structure returned, I see Payload:AA== as well. AA== is the value of a 0 byte encoded to base64, which I think, in Go, translates to null (?) when the SDK is expecting to see an empty slice.
Thanks for the extra info @danlsgiga, that's super helpful.
Nomad version
Operating system and Environment details
Amazon Linux 2
Issue
Using the Go SDK, when processing a Job topic event, we often run into an issue where the SDK fails to marshal the Job event:
Looking at the payload, the event returns the following:
The full event otherwise has a good structure, the payload is just off.
Go mod dep set at
github.com/hashicorp/nomad/api v0.0.0-20210920221949-6d126ac53cbc
Reproduction steps
Use the SDK to subscribe to the event stream with a Job topic filter and process events returned from the stream.
Expected Result
A proper Payload returned instead of a string. If there is an issue with the event, an error should be returned with the event (api.Event.Err).
Actual Result
A string Payload is returned with no errors reported.