radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

`task_sandbox` broken for raptor tasks #2802

Closed eirrgang closed 1 year ago

eirrgang commented 1 year ago

I believe this is a known problem, but I don't see that it is explicitly tracked and is likely to cause confusion.

  1. The task_sandbox property of a raptor task generated on a client has a constructed path that is never created or used in the execution environment.
  2. File staging directives with unqualified paths (relative paths instead of URIs) do not resolve to the directory in which raptor tasks actually run.
  3. The task:/// URI scheme does not work in staging directives for raptor tasks.
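
To make items 2 and 3 concrete, here is a minimal sketch of the path resolution one might expect for staging directives. It is not RADICAL-Pilot's implementation: the `resolve` helper, the `SANDBOXES` table, and the directory names are hypothetical. Only the `task:///` scheme comes from the report above; `pilot:///` is assumed to work analogously.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical sandbox locations on the target resource (illustration only).
SANDBOXES = {
    'pilot': '/scratch/session.0000/pilot.0000',
    'task':  '/scratch/session.0000/pilot.0000/task.000000',
}


def resolve(path, default='task'):
    """Map a staging path to an absolute path under a sandbox root.

    - 'task:///x' and 'pilot:///x' resolve under the matching sandbox.
    - An unqualified relative path resolves under the default sandbox.
    - Anything else (absolute paths, unknown schemes) is returned unchanged.
    """
    url = urlparse(path)
    if url.scheme in SANDBOXES:
        root = SANDBOXES[url.scheme]
        rel = url.path.lstrip('/')
    elif not url.scheme and not path.startswith('/'):
        root = SANDBOXES[default]
        rel = path
    else:
        return path
    return posixpath.normpath(posixpath.join(root, rel))


# Both spellings should name the same file in the task's working directory.
print(resolve('task:///input.dat'))
print(resolve('input.dat'))
```

Under this reading, the bug is that for raptor tasks the directory playing the role of `SANDBOXES['task']` is not the directory in which the task actually runs.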
eirrgang commented 1 year ago

I think it was discussed elsewhere, but a corollary that @andre-merzky may be intending to tackle:

1.1 Explicitly setting the task_sandbox property sets the working directory for a traditional RP Task, but not for a raptor task. Raptor tasks have separate code for the launch method, which spawns new processes from a running Worker task in one of several ways.

andre-merzky commented 1 year ago

@eirrgang : can this be closed after the last iteration?

eirrgang commented 1 year ago

> @eirrgang : can this be closed after the last iteration?

I believe so, but I haven't personally checked all cases. Can you confirm that the following have been addressed or won't be addressed?

andre-merzky commented 1 year ago

> task_sandbox can be set to determine the working directory for tasks with scheduler set and will either set the working directory or produce an error no later than the attempt to submit the task.

That is not possible in our current approach: the task sandbox is interpreted on the remote resource, and thus its validity can only be ascertained once the task reaches the agent (and, in this case specifically, the raptor worker).

eirrgang commented 1 year ago

> task_sandbox can be set to determine the working directory for tasks with scheduler set and will either set the working directory or produce an error no later than the attempt to submit the task.
>
> That is not possible in our current approach: the task sandbox is interpreted on the remote resource, and thus its validity can only be ascertained once the task reaches the agent (and, in this case specifically, the raptor worker).

Does a non-default value of task_sandbox raise an error (when scheduler is a non-empty string) that prevents acquisition of a broken Task object?

eirrgang commented 1 year ago

> task_sandbox can be set to determine the working directory for tasks with scheduler set and will either set the working directory or produce an error no later than the attempt to submit the task.
>
> That is not possible in our current approach: the task sandbox is interpreted on the remote resource, and thus its validity can only be ascertained once the task reaches the agent (and, in this case specifically, the raptor worker).

Does a non-default value of task_sandbox raise an error (when scheduler is a non-empty string) that prevents acquisition of a broken Task object?

Oh. Excuse me, maybe. I'm not asking for a new feature---just predictability with respect to standard use cases.

My impression was that setting task_sandbox was a normally supported use case. If not, then this point is irrelevant (assuming that task_sandbox is never inappropriately user-assignable). However, the current docs allow a sandbox field in the TaskDescription as long as it is relative to the pilot sandbox.

andre-merzky commented 1 year ago

> Does a non-default value of task_sandbox raise an error (when scheduler is a non-empty string) that prevents acquisition of a broken Task object?
>
> Oh. Excuse me, maybe. I'm not asking for a new feature---just predictability with respect to standard use cases.
>
> My impression was that setting task_sandbox was a normally supported use case. If not, then this point is irrelevant (assuming that task_sandbox is never inappropriately user-assignable). However, the current docs allow a sandbox field in the TaskDescription as long as it is relative to the pilot sandbox.

I think we may be talking at cross purposes, or at least I may be missing the point. Let me try to do this stepwise.

The last statement reads like an ipso facto, so I am likely missing something?

andre-merzky commented 1 year ago

Duh, I should read a bit more carefully. You also wrote:

> However, the current docs allow a sandbox field in the TaskDescription as long as it is relative to the pilot sandbox.

That was corrected in the docs: the sandbox no longer needs to be relative to the pilot sandbox. That restriction was lifted a while ago.

eirrgang commented 1 year ago

> That is not possible in our current approach: the task sandbox is interpreted on the remote resource, and thus its validity can only be ascertained once the task reaches the agent (and, in this case specifically, the raptor worker).

Which part is not possible? The error? If there is some part of the protocol in which a raptor task is less able to perform error checking than a traditional task, then we should document it. If assignment to sandbox for a raptor task (a task with "scheduler" set) behaves the same as for non-raptor tasks, then this item is fine. (I updated the checklist item text in an attempt to clarify)

andre-merzky commented 1 year ago

Sorry, some quoting went wrong. Let me reply to the changed text:

> The Task will fail if the path is not valid and accessible.

Yes, the task will move to the FAILED state in that case. That will not happen during submission, though, but either during data staging or during execution. Checking the validity of the remote path during submission would be costly, and would race with the actual state of the file system anyway...

PS: This holds for all tasks, not only Raptor tasks.
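
Since an invalid sandbox path only shows up as a FAILED task after submission, client code has to inspect final task states instead of relying on an exception at submit time. The following is a minimal sketch of that check, assuming task objects that expose `uid`, `state`, and `stderr` attributes and a final state string of `'FAILED'`. The `failed_tasks` helper and the stand-in task objects are hypothetical; with real RP tasks one would first wait for completion (for example with the task manager's `wait_tasks()`).

```python
from types import SimpleNamespace

# Assumed to match the string value of the FAILED state constant.
FAILED = 'FAILED'


def failed_tasks(tasks):
    """Return the tasks whose final state is FAILED."""
    return [t for t in tasks if t.state == FAILED]


# Stand-in objects that mimic the attributes used above (not real RP tasks).
tasks = [
    SimpleNamespace(uid='task.000000', state='DONE',   stderr=''),
    SimpleNamespace(uid='task.000001', state='FAILED', stderr='sandbox not accessible'),
]

for task in failed_tasks(tasks):
    # Report failures that only surfaced during staging or execution.
    print(f'{task.uid} failed: {task.stderr}')
```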