radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694
Adding a modified task to a stage failed #118

Closed Weiming-Hu closed 4 years ago

Weiming-Hu commented 4 years ago

I have observed behavior in EnTK that is unexpected from a user's perspective. Here is a minimal example that reproduces it.

from radical.entk import Task, Stage, AppManager, Pipeline

##############
# Use case 1 #
##############
s = Stage()
for i in range(10):
    t = Task()
    t.cpu_reqs = {
            'processes': 10,
            'threads_per_process': 1,
            'process_type': "MPI",
            'thread_type': "OpenMP"
            }

    t.name = "task-{}".format(i)
    t.arguments = ["file-{}".format(i)]
    s.add_tasks(t)

print("Adding independent tasks. Total: {}".format(len(s.tasks)))

##############
# Use case 2 #
##############
s = Stage()
t = Task()

# These settings are common to all tasks, so it would be nice
# to define them only once.
#
t.cpu_reqs = {
        'processes': 10,
        'threads_per_process': 1,
        'process_type': "MPI",
        'thread_type': "OpenMP"
        }

for i in range(10):
    t.name = "task-{}".format(i)
    t.arguments = ["file-{}".format(i)]
    s.add_tasks(t)

print("Adding shared tasks. Total: {}".format(len(s.tasks)))

After running the above code, I have the following results:

(venv) geogadmins-Air:tmp wuh20$ python example.py 
Adding independent tasks. Total: 10
Adding shared tasks. Total: 1

When I modify an existing task and add it to a stage again, the addition has no effect. I have to create a new task with Task() each time. Is this expected?

Use case 2 would be helpful when I have many tasks to create that differ only in their input file names. It would be nice to define a task template once and then modify it for each task.

Thank you

andre-merzky commented 4 years ago

It is in fact expected behavior. In the second case, you are creating one task instance. That one task instance is then changed 10 times - but no new tasks are created. Registering the same task 10 times is not the same as registering 10 different tasks (as in the first case).
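The effect can be reproduced without EnTK. The following is a minimal sketch in plain Python, assuming the stage keeps its tasks in a set-like container that ignores an object it already holds. `FakeTask` is a hypothetical stand-in, not the EnTK `Task` class, and the container choice is an assumption made for illustration.

```python
class FakeTask:
    """Hypothetical stand-in for a task object; not the EnTK Task class."""

    def __init__(self):
        self.name = None


# Assumption: tasks are stored in a set-like container.
shared_tasks = set()

# One instance, renamed and re-added ten times (as in use case 2).
shared = FakeTask()
for i in range(10):
    shared.name = "task-{}".format(i)
    shared_tasks.add(shared)  # same object each time, so nothing new is stored

# Ten separate instances (as in use case 1).
fresh_tasks = set()
for i in range(10):
    t = FakeTask()
    t.name = "task-{}".format(i)
    fresh_tasks.add(t)

shared_total = len(shared_tasks)
fresh_total = len(fresh_tasks)
print("shared: {}, fresh: {}".format(shared_total, fresh_total))
```

Under this assumption the shared container ends up with a single entry, and that entry carries only the last name that was assigned to it.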

Note that the Task constructor accepts a from_dict argument, so you can replace the code with something like:

s = Stage()
task_dict = {
    'cpu_reqs': {
        'processes'          : 10,
        'threads_per_process': 1,
        'process_type'       : "MPI",
        'thread_type'        : "OpenMP"
        }}

for i in range(10):
    task_dict['name']      = "task-{}".format(i)
    task_dict['arguments'] = ["file-{}".format(i)]
    s.add_tasks(Task(from_dict=task_dict))

print("Adding shared tasks. Total: {}".format(len(s.tasks)))

This is an option if you prefer it, but I don't really see anything wrong with the first version either :-)
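A variation on the template idea, sketched in plain Python: a small helper that deep-copies the shared settings for each task, so no two descriptions share a nested dictionary. The helper name `make_task_dict` and the constant `TEMPLATE` are hypothetical, and whether `Task` copies the dictionary it receives is not stated in this thread, so the copy is a precaution rather than a known requirement.

```python
import copy

# Settings shared by every task (values taken from the example above).
TEMPLATE = {
    'cpu_reqs': {
        'processes': 10,
        'threads_per_process': 1,
        'process_type': "MPI",
        'thread_type': "OpenMP",
    }
}


def make_task_dict(i, template=TEMPLATE):
    """Hypothetical helper: return an independent description for task i."""
    d = copy.deepcopy(template)  # nested dicts are copied, not shared
    d['name'] = "task-{}".format(i)
    d['arguments'] = ["file-{}".format(i)]
    return d


task_dicts = [make_task_dict(i) for i in range(10)]
```

Each entry could then be passed to Task(from_dict=...) in the same way as the loop above.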

HtH, Andre.

mturilli commented 4 years ago

@lee212 this also has to go into the documentation.

lee212 commented 4 years ago

@Weiming-Hu, task creation: https://radicalentk.readthedocs.io/en/latest/user_guide/adding_tasks.html