rhettg / Tron

Next generation batch process scheduling and management

Failure parsing command format characters leaves job in bad state #45

Closed rhettg closed 13 years ago

rhettg commented 13 years ago

A command like: command: "date +'%Y-%m-%d' | xargs -I DATE cp /nail/tmp/geocache.sqlite /nail/tmp/geocache.DATE.s

This causes a crash because Tron tries to interpret %Y as a Python string-format specifier:


2010-12-06 04:22:40,117 tron.action INFO Starting action run ad_simulation.89.backup_geocache
2010-12-06 04:22:40,117 tron.action INFO Opening file /nail/tron/ad_simulation/ad_simulation.89/backup_geocache.stdout for output
2010-12-06 04:22:40,120 tron.action INFO Action error: [Failure instance: Traceback: <type 'exceptions.ValueError'>: unsupported format character 'Y' (0x59) at index 8
/usr/lib/python2.5/site-packages/twisted/internet/defer.py:371:_runCallbacks
/usr/lib/python2.5/site-packages/tron/node.py:278:_channel_complete
/usr/lib/python2.5/site-packages/twisted/internet/defer.py:280:callback
/usr/lib/python2.5/site-packages/twisted/internet/defer.py:354:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.5/site-packages/twisted/internet/defer.py:371:_runCallbacks
/usr/lib/python2.5/site-packages/tron/action.py:195:_handle_callback
/usr/lib/python2.5/site-packages/tron/action.py:245:succeed
/usr/lib/python2.5/site-packages/tron/action.py:208:start_dependants
/usr/lib/python2.5/site-packages/tron/action.py:136:attempt_start
/usr/lib/python2.5/site-packages/tron/action.py:147:start
/usr/lib/python2.5/site-packages/tron/node.py:110:run
/usr/lib/python2.5/site-packages/tron/node.py:221:_open_channel
/usr/lib/python2.5/site-packages/tron/action.py:271:command
]

This leaves the job stuck in the 'RUNNING' state.
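
For reference, the failure can be reproduced outside Tron with plain Python %-formatting. This is only a minimal sketch: the `render` helper, the `shortdate` key, and the shortened command are made up for illustration and are not Tron's actual code.

```python
# Hypothetical stand-in for the context Tron applies to a command.
context = {"shortdate": "2010-12-06"}

# Shortened, illustrative command using date(1) format characters.
command = "date +'%Y-%m-%d' | xargs -I DATE echo DATE"


def render(command, context):
    """Apply %-formatting; return (ok, rendered_command_or_error_message)."""
    try:
        return True, command % context
    except (ValueError, KeyError, TypeError) as exc:
        return False, str(exc)


# '%Y' is not a valid Python conversion, so rendering fails.
ok, error = render(command, context)

# Doubling each percent sign turns it into a literal '%'.
escaped = "date +'%%Y-%%m-%%d' | xargs -I DATE echo DATE"
ok_escaped, rendered = render(escaped, context)
```

Until Tron handles this itself, writing `%%` in the configured command is a workaround, since the rendered string then carries a single `%` through to the shell.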

We should:

rhettg commented 13 years ago

Errors in generating the command can also cause a disastrous failure when writing out state:

2010-12-06 15:42:59,538 tron.mcp ERROR failure writing state
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/tron/mcp.py", line 109, in store_state
    yaml.dump(self.data, data_file, default_flow_style=False, indent=4)
  File "/usr/lib/python2.5/site-packages/tron/mcp.py", line 137, in data
    data[j.name] = j.data
  File "/usr/lib/python2.5/site-packages/tron/job.py", line 347, in data
    return {'runs': [r.data for r in self.runs],
  File "/usr/lib/python2.5/site-packages/tron/job.py", line 82, in data
    return {'runs':[r.data for r in self.runs],
  File "/usr/lib/python2.5/site-packages/tron/action.py", line 266, in data
    'command': self.command
  File "/usr/lib/python2.5/site-packages/tron/action.py", line 271, in command
    return self.action.command % self.context
  File "/usr/lib/python2.5/site-packages/tron/command_context.py", line 25, in __getitem__
    return self.base[name]
  File "/usr/lib/python2.5/site-packages/tron/action.py", line 58, in __getitem__
    return "%.4d-%.2d-%.2d" % (run_date.year, run_date.month, run_date.day)
AttributeError: 'NoneType' object has no attribute 'year'
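
One possible mitigation is to keep command rendering from raising while state is being serialized. The sketch below is an assumption about how that could look, not what Tron does: `DateContext`, `command_for_state`, and the `shortdate` key are hypothetical names that only mimic the traceback above.

```python
import datetime


class DateContext(object):
    """Hypothetical context that formats a run date, as in the traceback."""

    def __init__(self, run_date):
        self.run_date = run_date

    def __getitem__(self, name):
        if name != "shortdate":
            raise KeyError(name)
        d = self.run_date  # may be None for a run that never got a date
        return "%.4d-%.2d-%.2d" % (d.year, d.month, d.day)


def command_for_state(raw_command, context):
    """Render the command for the state file, falling back to the raw text.

    A rendering failure should not abort the whole state dump.
    """
    try:
        return raw_command % context
    except (AttributeError, KeyError, TypeError, ValueError):
        return raw_command


good = command_for_state("echo %(shortdate)s",
                         DateContext(datetime.date(2010, 12, 6)))
bad = command_for_state("echo %(shortdate)s", DateContext(None))
```

With a fallback like this, one broken action run would be stored with its unrendered command instead of preventing every job's state from being written.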

rhettg commented 13 years ago

Mostly resolved in 0.1.8

If a command contains bad format strings, the action run now fails immediately, with no other nasty side effects.

Some of these failures could conceivably be found at config time. We should think about doing that.
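
A rough sketch of what a config-time check could look like: render each command once against a placeholder context and report any formatting error. `PlaceholderContext` and `validate_command` are hypothetical names, and nothing here reflects Tron's real config loader.

```python
class PlaceholderContext(object):
    """Answers every key lookup, so only the format syntax is exercised."""

    def __getitem__(self, name):
        return "X"


def validate_command(command):
    """Return None if the command renders, else the error message."""
    try:
        command % PlaceholderContext()
    except (ValueError, TypeError) as exc:
        return str(exc)
    return None


problems = {}
for cmd in ["echo %(shortdate)s", "date +'%Y-%m-%d'", "date +'%%Y-%%m-%%d'"]:
    error = validate_command(cmd)
    if error is not None:
        problems[cmd] = error
```

Because the placeholder always returns a string, a numeric conversion such as `%(count)d` would be reported as an error even when the real value is an integer, so a real check would need placeholders of the right type. It also cannot catch problems that depend on run-time values, such as a missing run date.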