radical-collaboration / QCArchive


Task descriptions #8

Closed: mturilli closed this issue 6 years ago

mturilli commented 6 years ago

RADICAL to provide the specification of task description used in RP

andre-merzky commented 6 years ago

The compute unit description in RP is really a dict, whose keys are defined here. When exchanging descriptions over the network, we dump the dict to json and send that json encoding around (sometimes compressed). For some communication channels, we collect multiple descriptions into a list, so as to minimize the number of messages exchanged.

Let me know if this is sufficient - if not, I can add a more formal definition of the json structure.
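
For concreteness, here is a minimal sketch of what that exchange amounts to, assuming nothing beyond the standard-library `json` and `zlib` modules; the example field values and the compression choice are illustrative, not RP's actual channel code.

import json
import zlib

# two toy task description dicts, as they would come out of RP
descriptions = [
    {'executable': '/bin/date', 'cpu_processes': 1},
    {'executable': '/bin/echo', 'arguments': ['hello'], 'cpu_processes': 1},
]

# dump the list to json and (optionally) compress it before sending
msg        = json.dumps(descriptions).encode('utf-8')
compressed = zlib.compress(msg)

# the receiving side reverses both steps
restored = json.loads(zlib.decompress(compressed).decode('utf-8'))
assert restored == descriptions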

dgasmith commented 6 years ago

From that description I assume the JSON would look similar to:

{
  "cpu_processes": 4,
  "executable": "QCEngine",
  "name": "task-id-5b8707587b8787679d2fd9ce",
  ...
}

We can certainly provide the above. Can you give an example of the input/output_staging fields that would match our use case (a JSON blob input distributed by a RADICAL client to a worker node, with the output then pulled back by the RADICAL client)?

We can either drop the JSON blob output to stdout or dump it to a file. Whichever is easier for you.

andre-merzky commented 6 years ago

Right, below is a dump of an example run:

{
    'kernel'          : '',
    'name'            : '',
    'tag'             : None,
    'executable'      : '/bin/echo',
    'arguments'       : ['-c', 'input.dat', '126'],

    'pre_exec'        : [],
    'post_exec'       : [],
    'environment'     : {},

    'cpu_processes'   : 1,
    'cpu_process_type': 'POSIX',
    'cpu_threads'     : 1,
    'cpu_thread_type' : 'POSIX',

    'gpu_processes'   : 0,
    'gpu_process_type': '',
    'gpu_threads'     : 0,
    'gpu_thread_type' : '',

    'lfs_per_process' : 0,
    'stdout'          : '',
    'stderr'          : '',
    'input_staging'   : [{'source': 'pilot:///input.dat',
                          'target': 'unit:///input.dat',
                          'flags' : 64,
                          'action': 'Link'
                          }],
    'output_staging'  : [{'source': 'unit:///STDOUT',
                          'target': 'pilot:///STDOUT.000126',
                          'flags' : 64,
                          'action': 'Copy'
                         }],

    'restartable'     : False,
    'cleanup'         : False
}

The structure of the staging is clear I guess: for each file, you specify `src`, `tgt`, and an `action`, which can be `Copy`, `Transfer` or `Link`. The `flags` (we use symbolic defines in the code) define behavior such as `overwrite` or `recursive`.

Note that the URLs can use special schemas, which then refer to locations which are determined at runtime. Those are documented here
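
To make the shape of those directives concrete, here is a sketch of how the example above could be assembled on the client side. It assumes the usual `import radical.pilot as rp`; the dict form of the directives simply mirrors the dump, and the literal flag value 64 is kept only because it appears there - real code would use RP's symbolic defines.

import radical.pilot as rp

cud = rp.ComputeUnitDescription()
cud.executable    = '/bin/echo'
cud.arguments     = ['-c', 'input.dat', '126']
cud.cpu_processes = 1

# one directive per file: source, target, action, flags
cud.input_staging  = [{'source': 'pilot:///input.dat',
                       'target': 'unit:///input.dat',
                       'flags' : 64,
                       'action': 'Link'}]
cud.output_staging = [{'source': 'unit:///STDOUT',
                       'target': 'pilot:///STDOUT.000126',
                       'flags' : 64,
                       'action': 'Copy'}]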

dgasmith commented 6 years ago

Gotcha, makes sense. So we would likely do something like:

    'input_staging'   : [{'source': 'pilot:///input-5b8707587b8787679d2fd9ce',
                          'target': 'unit:///input.dat',
                          'flags' : 64,
                          'action': 'Link'
                          }],
    'output_staging'  : [{'source': 'unit:///output.dat', # If we write to output.dat
                          'target': 'pilot:///output-5b8707587b8787679d2fd9ce',
                          'flags' : 64,
                          'action': 'Copy'
                         }],

and push each task specification to the pilot in an `input-uid` format and look for `output-uid` to parse.
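
A hypothetical helper along those lines - the helper name and the exact uid formatting are ours, purely for illustration, not anything RP or QCArchive defines:

def staging_for_task(uid):
    """Build input/output staging directives for one task, following the
    input-<uid> / output-<uid> naming scheme sketched above."""
    input_staging  = [{'source': 'pilot:///input-%s' % uid,
                       'target': 'unit:///input.dat',
                       'flags' : 64,
                       'action': 'Link'}]
    output_staging = [{'source': 'unit:///output.dat',
                       'target': 'pilot:///output-%s' % uid,
                       'flags' : 64,
                       'action': 'Copy'}]
    return input_staging, output_staging

in_sd, out_sd = staging_for_task('5b8707587b8787679d2fd9ce')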

vivek-bala commented 6 years ago

The 'pilot' schema points to a sandbox on the remote resource specific to a pilot job. In your example above, the task stages 'input-5b*' from this sandbox into the task's own sandbox, and then output.dat from the task back into the pilot sandbox.

The input staging directive used above is probably okay since the pilot can be used to stage all data from client to this sandbox. Based on your description, I think your output staging would be different though, since you require output data back on the client machine. Something like:

'output_staging'  : [{'source': 'unit:///output.dat', # If we write to output.dat
                      'target': 'client:///output-5b8707587b8787679d2fd9ce',
                      'flags' : 64,
                      'action': 'Transfer'
                     }],

mturilli commented 6 years ago

andre-merzky commented 6 years ago

RP does not allow metadata right now - EnTK encodes some information in the `name` field. We could add something like that - but we would be cautious about blowing up the task description too much, memory- and space-wise...
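
For illustration only, one way such a workaround could look, assuming small JSON-serializable metadata squeezed into the `name` field - this is a sketch of the idea, not EnTK's actual encoding:

import json
import radical.pilot as rp

meta = {'task_id': '5b8707587b8787679d2fd9ce', 'attempt': 1}

cud = rp.ComputeUnitDescription()
cud.executable = 'QCEngine'
cud.name       = json.dumps(meta)     # stash the metadata in the name field

# later, wherever the name is visible again, decode it
recovered = json.loads(cud.name)
assert recovered == meta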

andre-merzky commented 6 years ago

This is now implemented in the RP branch feature/task_metadata:

import radical.pilot as rp

cud = rp.ComputeUnitDescription()
cud.metadata = {'a' : [1, 2, 3]}

dgasmith commented 6 years ago

The above metadata field would be perfect, thank you.

andre-merzky commented 6 years ago

We are going to release the metadata feature with RP over the next couple of days.

andre-merzky commented 6 years ago

v0.50.17 of RP has been released and contains that feature.