Closed BenFradet closed 7 years ago
@alexanderdean Ready for a review
Otherwise, it's ready to go given:
@alexanderdean I think I've taken care of everything
LGTM! Can we get an -rc1 of this into Bintray please?
rc1 is on bintray
Please:
-softLock
-lock
argument
rc2 published. Never mind, the last commit introduced a bug.
rc3 should land in bintray soon :tm:
Hey @BenFradet - apologies for the delay in thinking of this requirement, but we need to come up with a dedicated return code (e.g. 3 or 4) for the scenario that the job cannot run because there is an existing lock in place. It might also make sense to make any informational message in this scenario a WARN rather than an ERROR.
Why? This is so that a job runner tool like Factotum can distinguish between a hard failure (e.g. EMR job failed) and a no-op situation (job could not start because existing job is running). That distinction is super-important because we cannot afford to raise a pager alert for every single no-op - to put it another way, if my EMR job takes 15 minutes and I schedule the job every 5 minutes, then I expect 8 no-ops an hour.
Sorry for the delayed requirement!
Should this be a new issue since the exit code is always 0 atm?
Hmm - I think this is covered by the lock ticket - as it's just a property of the lock handling?
don't you need exit code = 1 for a true failure?
Oh wow, sorry I just saw what you meant. So Dataflow Runner can never return non-zero?! :godmode:
not according to https://github.com/urfave/cli#exit-code
Ouch - new ticket then please, this is a bit of a showstopper!
My bad, it's the logging library (!!) that applies the exit code: https://github.com/sirupsen/logrus#level-logging
Hmm, interesting. Can we still do the exit codes in the other ticket though? I don't think we necessarily need an independent ticket.
As your request has shown, delegating the exit code to the logging library isn't viable in the long term. As a result, I think this ticket is warranted because I'm currently moving away from that approach.
Cool thanks
@alexanderdean published rc4 :+1:
More edge cases: timeWithFormat should be able to take the epoch as a string. This is because all of the incoming variables provided by --vars are stringly typed. Here's my error:
template: playbook.avro:37:53: executing "playbook.avro" at <.epochTime>: wrong type for value; expected int64; got string
There could still be a use case for supporting an int64 as well, but really we could live without it (because we can always put "" around a literal value).
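A rough Go sketch of what accepting both forms could look like, assuming the template function takes the epoch (in seconds) first and a Go time layout second. That argument order, the seconds unit and the helper name `parseEpoch` are assumptions for the example, not the project's confirmed signature.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"text/template"
	"time"
)

// parseEpoch accepts an epoch in seconds as a string (what --vars would
// supply) or as an integer (a literal written in the template).
func parseEpoch(v interface{}) (int64, error) {
	switch e := v.(type) {
	case string:
		return strconv.ParseInt(e, 10, 64)
	case int64:
		return e, nil
	case int:
		return int64(e), nil
	default:
		return 0, fmt.Errorf("unsupported epoch type %T", v)
	}
}

// timeWithFormat formats the epoch in UTC using a Go time layout.
func timeWithFormat(epoch interface{}, layout string) (string, error) {
	secs, err := parseEpoch(epoch)
	if err != nil {
		return "", err
	}
	return time.Unix(secs, 0).UTC().Format(layout), nil
}

func main() {
	tmpl := template.Must(template.New("playbook").
		Funcs(template.FuncMap{"timeWithFormat": timeWithFormat}).
		Parse(`{{timeWithFormat .epochTime "2006-01-02"}}` + "\n"))

	// Variables are strings, as they would be when passed on the command line.
	vars := map[string]string{"epochTime": "0"}
	if err := tmpl.Execute(os.Stdout, vars); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Taking `interface{}` keeps integer literals working while letting string variables through, at the cost of moving the type check from template parsing to execution time.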
Me again! I think I am seeing an exit code of 17 if there is an error creating the lock file (e.g. because the directory of the lock does not exist). A quick look at the code suggests that Dataflow Runner exits 17 if there is any problem with creating the lock, not just if the lock file is already there?
Here's what I currently have with rc5:
Thanks @BenFradet, I will give this one a spin. What does "Lock already hel" mean - why is this truncated?
It was a typo during the tests which is fixed in the published artifact; it should just say "Lock already held".
Cool thanks
timeWithFormat template function (#18)
base64
base64File
up.playbooks with consul (depends on snowplow/ansible-playbooks#123)