radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

RAPTOR: function tasks should not generate a task sandbox unless one is explicitly specified #2978

Open AymenFJA opened 1 year ago

AymenFJA commented 1 year ago

Currently, RAPTOR function tasks generate empty task sandboxes on the agent side. As far as I understand, these are lightweight functions and should not generate a sandbox at all. One way to resolve this is to use the sandbox option in TaskDescription: if the user specifies it, the sandbox is created; if not, nothing is generated.

I imagine that with 1M functions we would generate 1M empty directories, which is a serious I/O bottleneck and, more importantly, simply not required.

eirrgang commented 1 year ago

I don't think there is any assumption that Raptor functions are lightweight, is there?

A lot of tools have Python interfaces that we would like to access through Raptor, and many of these tools either inspect the working directory or have filesystem side effects (e.g. LAMMPS).

Making task sandboxes optional seems reasonable, but the option needs to be well documented. I think the current behavior is consistent across rp.Task objects. Also, as I understand it, there is nothing preventing an rp.Task from using a pre-existing sandbox directory, right? Maybe a self-consistent resolution without code changes would be to suggest that Raptor tasks be submitted with "sandbox": worker.task_sandbox or something similar, to explicitly re-use the worker sandbox.
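For concreteness, a minimal sketch of that suggestion, assuming a TaskManager `tmgr`, a Raptor master task `master`, and a worker task `worker` already exist from earlier setup; the `worker.task_sandbox` attribute and the function name are assumptions, not confirmed API:

```python
import radical.pilot as rp

# Sketch only: point the function task's sandbox at the worker's existing
# sandbox so the agent does not create a fresh, empty per-task directory.
# `tmgr`, `master` and `worker` are assumed to come from earlier setup.
td = rp.TaskDescription()
td.mode      = rp.TASK_FUNCTION
td.function  = 'my_func'             # placeholder: a function the worker can execute
td.raptor_id = master.uid            # dispatch the task through the Raptor master
td.sandbox   = worker.task_sandbox   # assumption: reuse the worker's sandbox path

task = tmgr.submit_tasks(td)
```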

Also, please note: it is important that any default behavior for generating a task_sandbox be easily discoverable at the site of task_manager.submit().

andre-merzky commented 1 year ago

This was discussed on the devel call; several options are on the table:

1) configure a raptor worker so that tasks either all get a sandbox, or don't. Configuration would be done via worker description.

2) enable raptor task sandboxes only when a sandbox is explicitly set in the task description

3) add a flag to the task description to suppress sandbox creation. That flag would default to True for function tasks (see the hypothetical sketch below).
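Purely as an illustration of these proposals (nothing below exists in the current API; the field names and values are hypothetical), the three options might surface in descriptions roughly like this:

```python
# Hypothetical sketch of the options above, written as plain description dicts.
# The fields `task_sandboxes` and `suppress_sandbox` do not exist today.

# Option 1: per-worker configuration in the Raptor worker description
worker_description = {
    'mode'          : 'raptor.worker',
    'task_sandboxes': False,             # hypothetical: this worker's tasks get no sandbox
}

# Option 2: create a sandbox only when one is explicitly set in the task description
task_description_2 = {
    'mode'   : 'task.function',
    'sandbox': '/path/to/existing/dir',  # explicit -> sandbox is used; absent -> none created
}

# Option 3: a per-task flag to suppress sandbox creation
task_description_3 = {
    'mode'            : 'task.function',
    'suppress_sandbox': True,            # hypothetical: would default to True for function tasks
}
```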

eirrgang commented 1 year ago

There was previously a significant initiative to make raptor tasks more like traditional tasks. Maybe that was more of a code clean-up effort than a design goal, though. Is there a recognition that raptor tasks have notable differences from traditional tasks that warrant different default values for TaskDescription fields? Will there be a distinct type to represent the different behavior?

> This was discussed on the devel call; several options are on the table.

Maybe it came up on the call, but can you comment on whether it is feasible and reasonable to just re-use a raptor sandbox for the tasks (either manually or automatically)?

andre-merzky commented 1 year ago

> Maybe it came up on the call, but can you comment on whether it is feasible and reasonable to just re-use a raptor sandbox for the tasks (either manually or automatically)?

You can always specify an existing directory as the task sandbox. You can, for example, specify a raptor master or worker sandbox; in that way you force that sandbox to be reused for the tasks. That would then mimic the old behavior, before we introduced sandboxes for raptor tasks.
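As a sketch of how that might look in practice (the path, the master handle, and the function name are assumptions), many function tasks could share one pre-existing directory instead of each getting its own sandbox:

```python
import radical.pilot as rp

# Sketch only: name one existing directory as the sandbox for many function
# tasks, so no per-task sandboxes are created.  `tmgr` and `master` are
# assumed to exist from earlier setup; the path is a placeholder.
shared_dir = '/scratch/myproject/raptor_shared'

tds = []
for _ in range(1000):
    td = rp.TaskDescription()
    td.mode      = rp.TASK_FUNCTION
    td.function  = 'my_func'        # placeholder function name
    td.raptor_id = master.uid       # assumed Raptor master
    td.sandbox   = shared_dir       # all tasks share the same, pre-existing sandbox
    tds.append(td)

tasks = tmgr.submit_tasks(tds)
```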

> There was previously a significant initiative to make raptor tasks more like traditional tasks. Maybe that was more of a code clean-up effort than a design goal, though.

It is a design goal, but we are not religious about it.

eirrgang commented 1 year ago

> You can always specify an existing directory as the task sandbox. You can, for example, specify a raptor master or worker sandbox; in that way you force that sandbox to be reused for the tasks. That would then mimic the old behavior, before we introduced sandboxes for raptor tasks.

Would this be an option to resolve this issue?

andre-merzky commented 1 year ago

It resolves Aymen's use case all right, but the discussion now focuses on the default behavior.

eirrgang commented 1 year ago

> It resolves Aymen's use case all right, but the discussion now focuses on the default behavior.

My preference would be to have consistent default behavior for objects of the same formal type.

I think it would be okay to use rp.Task in both cases if and only if the task_sandbox value could be populated before the object is returned by submit. I am highly skeptical of TaskDescription fields that have different interpretations or different default behaviors depending on other fields.

Subclasses or rigorous schema documentation are reasonable mitigating strategies if the unified rp.TaskDescription is retained.

In any case, please warn me when behavior changes hit devel (either in direct messaging or through an issue at https://github.com/SCALE-MS/scale-ms/issues)