Closed eirrgang closed 1 year ago
I think it was discussed elsewhere, but a corollary that @andre-merzky may be intending to tackle:
1.1 Explicitly setting the task_sandbox property sets the working directory for a traditional RP Task, but not for a raptor task (which has separate code for the launch method, involving one of several ways of spawning new processes from a running Worker task).
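To illustrate the distinction in plain Python (this is not RP's launch code; the function name and the temporary directory standing in for a task sandbox are hypothetical): a spawned process inherits the spawner's working directory unless a directory is passed explicitly, so a launch path that never applies the sandbox ends up running in whatever directory the worker happens to be in.

```python
import os
import subprocess
import sys
import tempfile


def spawn_and_report_cwd(sandbox=None):
    """Spawn a child Python process and return the directory it ran in.

    `sandbox` stands in for a task sandbox (hypothetical); None mimics a
    launch path that never applies it.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=sandbox,            # None means: inherit the parent's cwd
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as sandbox:
    sandbox_real = os.path.realpath(sandbox)
    # No sandbox applied: the child runs where the parent is running.
    inherited = os.path.realpath(spawn_and_report_cwd())
    # Sandbox applied: the child runs in the requested directory.
    explicit = os.path.realpath(spawn_and_report_cwd(sandbox))
```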
@eirrgang : can this be closed after the last iteration?
I believe so, but I haven't personally checked all cases. Can you confirm that the following have been addressed or won't be addressed?
task_sandbox returns the effective working directory for tasks with scheduler set, regardless of task mode.
sandbox can be set (in the TaskDescription) to determine the working directory for tasks with scheduler set. The value will be used as the working directory. The Task will fail if the path is not valid and accessible. (updated)
task:///path is equivalent to {task_sandbox}/path
task:/// URIs, where appropriate

task_sandbox can be set to determine the working directory for tasks with scheduler set and will either set the working directory or produce an error no later than the attempt to submit the task.
That is not possible in our current approach: the task sandbox is interpreted on the remote resource, and thus its validity can only be ascertained once the task reaches the agent (and, in this case specifically, the raptor worker).
Does a non-default value of task_sandbox raise an error (when scheduler is a non-empty string) that prevents acquisition of a broken Task object?
Oh. Excuse me, maybe. I'm not asking for a new feature---just predictability with respect to standard use cases.
My impression was that setting task_sandbox was a normally supported use case. If not, then this point is irrelevant (assuming that task_sandbox is never inappropriately user-assignable). However, the current docs allow a sandbox field in the TaskDescription as long as it is relative to the pilot sandbox.
We may be talking at cross-purposes, I think, or at least I may be missing the point. Let me try to do this stepwise.
The last statement reads like an ipso facto, so I am likely missing something?
Duh, I should read a bit more carefully. You also wrote:
However, the current docs allow a sandbox field in the TaskDescription as long as it is relative to the pilot sandbox.
That was corrected in the docs: the sandbox no longer needs to be relative to the pilot sandbox; that restriction was lifted a while ago.
That is not possible in our current approach: the task sandbox is interpreted on the remote resource, and thus its validity can only be ascertained once the task reaches the agent (and, in this case specifically, the raptor worker).
Which part is not possible? The error? If there is some part of the protocol in which a raptor task is less able to perform error checking than a traditional task, then we should document it. If assignment to sandbox for a raptor task (a task with "scheduler" set) behaves the same as for non-raptor tasks, then this item is fine. (I updated the checklist item text in an attempt to clarify)
Sorry, some quoting went wrong. Let me reply to the changed text:
The Task will fail if the path is not valid and accessible.
Yes, the task will move to FAILED state in that case. That will not happen during submission though, but either during data staging or during execution. Checking path validity of the remote path during submission would be costly, and would race with the actual state of the file system anyway...
PS: This holds for all tasks, not only Raptor tasks.
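The race mentioned above is the usual check-then-use problem. A rough sketch in plain Python (not RP code; `enter_sandbox` and the returned state strings are hypothetical, and a local temporary directory stands in for the remote path) of why a check at submission time says little about what happens at execution time:

```python
import os
import tempfile


def enter_sandbox(path):
    """Try to use `path` as the working directory at execution time.

    Returns a (state, detail) pair; the state names are illustrative only.
    """
    previous = os.getcwd()
    try:
        os.chdir(path)          # validity is established at the point of use
    except OSError as exc:
        return "FAILED", f"{type(exc).__name__}: {exc}"
    os.chdir(previous)          # restore so the sketch has no side effects
    return "DONE", path


# A path that looks valid "at submission" may be gone by the time it is used.
sandbox = tempfile.mkdtemp()
valid_at_submission = os.path.isdir(sandbox)   # early check passes
os.rmdir(sandbox)                              # file system changes in between
state, detail = enter_sandbox(sandbox)         # the late attempt fails
```

Here the early check succeeds and the later attempt still fails, which is the situation an up-front validation cannot rule out.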
I believe this is a known problem, but I don't see that it is explicitly tracked and is likely to cause confusion.
The task_sandbox property of a raptor task generated on a client has a constructed path that is never created or used in the execution environment.
The task:/// URI scheme does not work in staging directives for raptor tasks.
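For reference, the equivalence stated in the checklist (task:///path is equivalent to {task_sandbox}/path) can be written down as a small standalone function. This is only a sketch of the intended mapping, not how RP resolves staging URIs; `resolve_task_uri` and the example sandbox path are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def resolve_task_uri(uri, task_sandbox):
    """Map task:///path onto {task_sandbox}/path; pass other URIs through."""
    parsed = urlparse(uri)
    if parsed.scheme != "task":
        return uri                       # not a task:/// URI, leave untouched
    relative = parsed.path.lstrip("/")   # "/data/in.dat" -> "data/in.dat"
    if not relative:
        return task_sandbox              # bare task:/// is the sandbox itself
    return posixpath.join(task_sandbox, relative)


# Hypothetical sandbox path, for illustration only.
example = resolve_task_uri("task:///data/in.dat", "/scratch/pilot.0000/task.000000")
```

For a raptor task, the mapping is only meaningful if task_sandbox names the directory the worker actually uses, which is the concern raised in the first item.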