radical-collaboration / QCArchive


Consider a pull/push task model from RP to QCA #7

Closed mturilli closed 5 years ago

andre-merzky commented 6 years ago

Would that mean that RP (or the REST service) pulls tasks from MolSSI for execution? If so (and I think this is easy), I would appreciate a pointer to the endpoint (or its documentation) from which to pull the tasks. Or does it mean allowing MolSSI to pull task states from the REST API? That is already possible with the current REST API.

dgasmith commented 6 years ago

Unfortunately, we have not yet built our REST API. If you have suggestions for the JSON format, we would be quite interested.

mturilli commented 6 years ago

The former.

Suggestion for a coordination protocol to iterate/change:

  1. MolSSI pushes a resource request to RP endpoint;
    • RP submits a pilot matching the requested resources;
  2. RP pushes a resource availability notice to the MolSSI endpoint;
  3. MolSSI pushes task availability notice to RP endpoint;
  4. RP pulls tasks from MolSSI endpoint;
    • RP starts to execute tasks on the pilot;
  5. RP pushes events related to task execution to the MolSSI endpoint;
  6. Final states:
    • MolSSI pushes cancel of task execution;
    • MolSSI pushes cancel of resource request;
    • RP pushes done/fail of task execution;
    • RP pushes done/fail of resource availability.
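
The protocol steps above can be written down as data, which makes it easier to see which side must expose a reachable endpoint at each step. This is only a minimal sketch: the payload labels (`resource_request`, `task_event`, and so on) and the helper `endpoint_owner` are hypothetical names chosen for illustration and are not part of any RP or MolSSI API.

```python
from dataclasses import dataclass
from enum import Enum


class Party(Enum):
    MOLSSI = "MolSSI"
    RP = "RP"


class Mode(Enum):
    PUSH = "push"
    PULL = "pull"


@dataclass(frozen=True)
class Step:
    initiator: Party  # who opens the connection
    mode: Mode        # push to the peer, or pull from the peer
    payload: str      # hypothetical label for what is exchanged


# Steps 1-5 of the proposed protocol, in order.
HAPPY_PATH = [
    Step(Party.MOLSSI, Mode.PUSH, "resource_request"),
    Step(Party.RP, Mode.PUSH, "resource_available"),
    Step(Party.MOLSSI, Mode.PUSH, "tasks_available"),
    Step(Party.RP, Mode.PULL, "tasks"),
    Step(Party.RP, Mode.PUSH, "task_event"),
]

# Step 6: final messages each side may push (labels are hypothetical).
FINAL = {
    Party.MOLSSI: {"cancel_task", "cancel_resource_request"},
    Party.RP: {"task_done", "task_failed", "resource_done", "resource_failed"},
}


def endpoint_owner(step: Step) -> Party:
    """Return the party whose endpoint must be reachable for this step.

    Whether the initiator pushes or pulls, it contacts the peer's endpoint.
    """
    return Party.RP if step.initiator is Party.MOLSSI else Party.MOLSSI


for number, step in enumerate(HAPPY_PATH, start=1):
    print(number, step.initiator.value, step.mode.value, step.payload,
          "-> endpoint at", endpoint_owner(step).value)
```

Read this way, the proposal requires both sides to run an endpoint: RP receives the resource request and the task availability notice, while MolSSI receives the availability notice, the task pull, and the execution events.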

andre-merzky commented 6 years ago

@dgasmith : We are not clever about the task description JSON right now: we basically just dump the RP task description as JSON. We should probably have a look at your internal task representation to see which parts are common and which differ.

andre-merzky commented 6 years ago

@mturilli : thanks, that makes sense IMHO.

dgasmith commented 6 years ago

Our primary task specification is very simple and looks like:

    "spec": {
        "function": "qcengine.compute_procedure", # Python function to call
        "args": [qcschema], # Python function args
        "kwargs": {"program": "psi4"}  # Python function kwargs
    },

For RP we will need to minimally specify:

Are there other minimum specifications?

andre-merzky commented 6 years ago

RP usually calls executables / command lines, not Python functions - but it looks like qcengine --program psi4 qcschema_input is the command-line representation, right?

We don't do much scheduling with respect to memory requirements yet. What would be the expected behavior when a node has free cores but insufficient memory for another task? I assume we would leave the cores idle, right?

We don't need any other task information at the RP level, unless you want to run MPI codes. I assume, though, that the cores are used by application threads, is that right?
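
The mapping from the function-style spec to a command line raised in this comment could be sketched as follows. It assumes the CLI shape `qcengine --<kwarg> <value> <input>` mentioned above, which is a proposal in this thread and not a documented qcengine interface; `spec_to_argv` is a hypothetical helper, and the `args` entry is a placeholder string standing in for the QCSchema input.

```python
import shlex


def spec_to_argv(spec, input_path):
    """Translate a function-style task spec into an argv list.

    Assumes the hypothetical CLI form `qcengine --<kwarg> <value> <input>`
    discussed in the thread.
    """
    argv = ["qcengine"]
    # Each keyword argument becomes a `--key value` pair.
    for key, value in spec["kwargs"].items():
        argv += [f"--{key}", str(value)]
    # The QCSchema input is passed as a file path on the command line.
    argv.append(input_path)
    return argv


spec = {
    "function": "qcengine.compute_procedure",
    "args": ["<qcschema>"],  # placeholder for the QCSchema input
    "kwargs": {"program": "psi4"},
}

argv = spec_to_argv(spec, "qcschema_input")
print(shlex.join(argv))  # qcengine --program psi4 qcschema_input
```

Building an argv list rather than a single string avoids shell quoting issues if the input path contains spaces.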

dgasmith commented 6 years ago

Yes, we will add a CLI to qcengine that we can call with the above structure.

Re memory: We would like to leave the cores idle if there is insufficient memory. The programs we run can automatically offload data to disk, but disk I/O quickly becomes a bottleneck. Being memory-aware would be beneficial. We can look at overcoming this on our end if that is not possible.

Yes, our applications are well pipelined, so multiple threads on a single core degrade performance in general.

andre-merzky commented 6 years ago

Memory-aware scheduling is certainly possible, but it is a piece we still need to implement. Thanks for the other details.
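
The behavior agreed on above (leave cores idle when memory is insufficient) amounts to a placement check on both resources. The sketch below is illustrative only: `Node`, `Task`, and `try_place` are hypothetical names, and this is not how RP's scheduler is implemented.

```python
from dataclasses import dataclass


@dataclass
class Node:
    free_cores: int
    free_mem_gb: float


@dataclass
class Task:
    cores: int
    mem_gb: float


def try_place(node: Node, task: Task) -> bool:
    """Reserve resources on the node only if cores AND memory both fit."""
    if task.cores > node.free_cores or task.mem_gb > node.free_mem_gb:
        # Leave the remaining cores idle rather than oversubscribe memory.
        return False
    node.free_cores -= task.cores
    node.free_mem_gb -= task.mem_gb
    return True


node = Node(free_cores=8, free_mem_gb=16.0)
placed = [try_place(node, Task(cores=2, mem_gb=6.0)) for _ in range(3)]
print(placed, node.free_cores)  # [True, True, False] 4
```

In this example the third task is rejected even though four cores remain free, because only 4 GB of memory is left on the node.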

mturilli commented 6 years ago

I would suggest splitting this ticket into three separate tickets:

mturilli commented 6 years ago

Ready to code depending on the identified timeline.

mturilli commented 6 years ago

Start implementation of the RADICAL REST interface following Sec. 4 of the requirement specification.

mturilli commented 6 years ago

Send around GitHub pointer

mturilli commented 6 years ago

Waiting for the writing of the JSON schema by Rutgers

mturilli commented 5 years ago

There will be only a push model from QCA to RP. Task descriptions will be handled in NGE.