mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0
4.39k stars 529 forks source link

null field appended to command #355

Closed yoeduardoj closed 9 years ago

yoeduardoj commented 9 years ago

We submit a very simple job, and when it gets executed "null" gets appended to the command.

{
  "async": false,
  "command": "echo \"Hello Mini\"",
  "cpus": 1,
  "disk": 4096,
  "epsilon": "PT30M",
  "executor": "",
  "mem": 5000,
  "name": "MiniCurl",
  "owner": "a@gmail.com",
  "retries": 0,
  "runAsUser": "mesos",
  "schedule": "R/2014-11-07T18:40:42Z/PT24H"
}

The task ID created in Mesos already looks funny: ct:1422586155844:0:MiniCurl:null

The command succeeds in this case, but looking closely at how it was executed:

$ cat stdout Registered executor on ip-10-250-100-157.ec2.internal Starting task ct:1422582459084:0:MiniCurl:null Forked command at 10385 sh -c 'echo "Hello Mini" null' Hello Mini null Command exited with status 0 (pid: 10385) $ cat stderr I0130 01:47:40.187083 10363 exec.cpp:132] Version: 0.21.1 I0130 01:47:40.189077 10378 exec.cpp:206] Executor registered on slave 20150129-184612-1080359434-5050-1729-S0 $

Notice: sh -c 'echo "Hello Mini" null'

Where did the null come from? Tried the same command from the UI works with no null appended, and the task Id looks better

"ct:1422587418000:0:MiniCurl2: "

I looked at how the UI posts to the chronos API, and here's the payload:

{
  "schedule": "R/2015-01-30T03:10:18Z/PT1M",
  "epsilon": "PT30M",
  "name": "MiniCurl2",
  "command": "echo \"Hello Mini\"",
  "owner": "a@gmail.com",
  "async": false,
  "executor": "",
  "disabled": false,
  "softError": false
}
laktech commented 9 years ago

I see the same thing occurring when using chrono-sync.rb with a yaml configuration. Note also the "null" in the task name. I'm using commit c7740818fa7f260a2f3aa84baf124ffaa1c9a22f

Starting task ct:1422658619492:0:Hello:null
Forked command at 8043
sh -c 'echo Hello null'
Hello null
Command exited with status 0 (pid: 8043)
---
name: Hello
command: echo Hello
shell: true
owner: a@foo.com
cpus: 0.05
disk: 1024.0
mem: 720.0
softError: false
uris: []
environmentVariables: []
highPriority: false
runAsUser: root
schedule: R/2015-01-31T03:00:06.000+00:00/PT15M
scheduleTimeZone: ''
laktech commented 9 years ago

This bug is not present in c6daf150647ef3fe10dfaacdcc2a3e3b540635f9

elingg commented 9 years ago

Thanks all for finding and reporting this issue! One guess is that the issue was caused by this PR which appends arguments to the command, https://github.com/mesos/chronos/pull/337/. We can revert this feature if needed. If someone from the community has a chance, could you test it out?

laktech commented 9 years ago

@elingg I tried a revert but there were conflicts. Resolving the conflicts did not appear trivial. My guess is there are commits that depend on the method signatures introduced by #337.

337 not only breaks chronos but also mucks with the semantics of arguments and deserves proper attention.

It's also concerning that there aren't existing tests for this basic use case and I would argue a fix for this should also improve coverage in this area.

elingg commented 9 years ago

Thanks for the follow up. If we aren't able to patch and add test coverage for this soon, we will revert that PR.

elingg commented 9 years ago

Since this issue appears to be resolved, I will close it.

stshe commented 9 years ago

Hi, I'm looking at the fix in commit 1a7868e and it appears that the fix was to ignore all arguments passed to the PUT /scheduler/jobs/ endpoint? See https://github.com/mesos/chronos/commit/1a7868e1b7cdc87d4cae29d15e572b66b7ae20c3#diff-9b35d8326016d147f21c34e253e8e890L136

Is this correct? it seems there's no longer an ability to pass arguments to a force run.

elingg commented 9 years ago

I see, I'm not sure why it was removed from the endpoint as this shouldn't be needed for the fix. @brndnmtthws, is there some explanation?

brndnmtthws commented 9 years ago

No, I don't believe that commit changes any behaviour. It pulls the arguments from the job definition, rather than the API.

stshe commented 9 years ago

@brndnmtthws The behaviour prior to your commit added the ability to pass 'args' when triggering a force run. These args are different from the args field in the job description and are appended to the command. There's QueryParam for arguments that's now unused after this commit.

elingg commented 9 years ago

@stevencanopy, I see. We should remove this unused variable to clean this up.