Francis-Liu opened this issue 9 years ago (status: Open)
I was running this code to launch a script on Stampede.
Hi Francis,
Sorry for the late reply on this. If the problem still persists, please rerun with SAGA_VERBOSE=DEBUG and post the resulting output (or feel free to send it by mail, it's probably long...).
Thanks!
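For anyone collecting the requested debug log, a minimal sketch of one way to do it from Python is shown here. It only sets the SAGA_VERBOSE=DEBUG environment variable mentioned above for a child process and captures everything that process prints. The helper name `run_with_saga_debug`, the script name and the log file name are placeholders for illustration and are not part of SAGA.

```python
import os
import subprocess
import sys


def run_with_saga_debug(cmd, log_path):
    """Run cmd with SAGA_VERBOSE=DEBUG set and write stdout+stderr to log_path.

    Hypothetical helper: only the environment variable name comes from the
    thread, everything else is an illustrative assumption.
    """
    # Copy the current environment and add the debug switch for the child only.
    env = dict(os.environ, SAGA_VERBOSE="DEBUG")
    with open(log_path, "w") as log:
        # Merge stderr into stdout so the log keeps the original ordering.
        result = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example call (placeholder script and log names):
# run_with_saga_debug([sys.executable, "my_script.py"], "saga_debug.log")
```

Running the failing script this way leaves a single file that can be attached to the issue or sent by mail.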
Just to confirm that I'm also intermittently seeing this exact same issue (running v0.40.1).
File "/.../my_class.py", line 89, in _create_service
service = Service(service_url, session=session)
File "/.../saga/job/service.py", line 115, in __init__
url, session, ttype=_ttype)
File "/.../saga/base.py", line 101, in __init__
self._init_task = self._adaptor.init_instance (adaptor_state, *args, **kwargs)
File "/.../saga/adaptors/cpi/decorators.py", line 57, in wrap_function
return sync_function (self, *args, **kwargs)
File "/.../saga/adaptors/shell/shell_job.py", line 510, in init_instance
self.initialize ()
File "/.../saga/adaptors/shell/shell_job.py", line 608, in initialize
raise saga.NoSuccess ("failed to run bootstrap: (%s)(%s)" % (ret, out))
NoSuccess: failed to run bootstrap: (127)(/bin/sh: /.../.saga/adaptors/shell_job/wrapper.sh: No such file or directory
) (/.../saga/adaptors/shell/shell_job.py +608 (initialize) : raise saga.NoSuccess ("failed to run bootstrap: (%s)(%s)" % (ret, out)))
It happens infrequently when creating an instance of saga.job.Service. I can't find a way to reproduce it at present, and unfortunately I didn't have debug output switched on when I got this error a short while ago, but I'll update this issue if I can provide further info.
Hey - a debug log would be great indeed. I have not yet seen this issue popping up.
If it happens again, would you please also include the output of ls -la $HOME/.saga/adaptors/shell_job/ (taken before any other job service instance is created)? Is your code using one or more job service(s) concurrently?
Thanks, Andre.
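If the failure shows up inside a Python process, the directory listing requested above could also be captured programmatically at the moment the error occurs. The following is a rough sketch under that assumption. The function names are hypothetical, and the output only approximates `ls -la` (mode, size, modification time, name).

```python
import os
import stat
import time


def default_wrapper_dir():
    """Path from the thread: $HOME/.saga/adaptors/shell_job/."""
    return os.path.join(os.path.expanduser("~"), ".saga", "adaptors", "shell_job")


def list_wrapper_dir(path):
    """Return 'ls -la'-style lines for path, or a single note if it is missing."""
    if not os.path.isdir(path):
        return ["missing: %s" % path]
    lines = []
    for name in sorted(os.listdir(path)):
        # lstat so that symlinks are reported as links rather than followed.
        st = os.lstat(os.path.join(path, name))
        mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
        lines.append("%s %8d %s %s" % (stat.filemode(st.st_mode), st.st_size, mtime, name))
    return lines


# Example: print the listing right after catching the bootstrap error.
# print("\n".join(list_wrapper_dir(default_wrapper_dir())))
```

Calling this in the exception handler, before any other job service instance is created, records whether wrapper.sh was present at that point.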
Ah, I forgot to ask: when you say "exact same issue", does that also mean toward Stampede, or are you targeting a different resource? Thanks!
When I said the "exact same issue" I meant the same error occurring at the same point in the session initialisation code. Aside from the different line numbers, presumably due to other changes in the more recent release that I'm using, the stack trace appears to be the same.
The specific error log posted above was from attempting to run a test job on localhost via SSH.
thanks, got it!
I think we occasionally still face this issue. I think the suggestion has been to remove $HOME/.saga, right? @andre-merzky, are there any ideas on how this can be addressed in the current version or v2?
Removing that dir gets you running again, indeed. This needs a proper investigation and fix though. I did not manage to find the underlying cause. It's likely a time-consuming effort to fix, but it needs to be done eventually IMHO.
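Until the underlying cause is found, the workaround described above could be automated roughly as follows. This is only a sketch, not a fix: it assumes that nothing else is using the directory concurrently, and it renames the directory to a backup instead of deleting it so that its contents stay available for debugging. The helper names `move_saga_dir_aside` and `create_with_reset` are hypothetical.

```python
import os
import shutil
import time


def move_saga_dir_aside(saga_dir):
    """Rename saga_dir to a timestamped backup; return the backup path or None."""
    if not os.path.exists(saga_dir):
        return None
    base = "%s.bak.%d" % (saga_dir.rstrip(os.sep), int(time.time()))
    candidate = base
    n = 0
    # Avoid clobbering an earlier backup made within the same second.
    while os.path.exists(candidate):
        n += 1
        candidate = "%s.%d" % (base, n)
    shutil.move(saga_dir, candidate)
    return candidate


def create_with_reset(factory, saga_dir, error_type=Exception):
    """Call factory(); if it raises error_type, move saga_dir aside and retry once."""
    try:
        return factory()
    except error_type:
        move_saga_dir_aside(saga_dir)
        # A second failure propagates to the caller unchanged.
        return factory()
```

With the names from the traceback above, usage might look like `create_with_reset(lambda: Service(service_url, session=session), os.path.expanduser("~/.saga"), saga.NoSuccess)`. Note that NoSuccess can presumably be raised for unrelated reasons too, in which case this sketch would move the directory aside unnecessarily.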
I took another look at this and have managed to recreate the problem (although not consistently) with DEBUG output enabled. I've provided @andre-merzky with some data to try and see if it's possible to identify what the cause of this might be.
Thanks @jcohen02 !
Thanks a lot for the provided log files and analysis, Jeremy - I'll dig through this, let's see if we can nail this one down after all that time :-)
This is the error:
This is my code: