multiscale-cosim / EBRAINS-cosim


Application Companion #51

Closed: mfahdaz closed this issue 3 years ago.

mfahdaz commented 3 years ago
Aspect          Detail
Summary         An application for managing and monitoring applications that are to be deployed.
Task Area       feature
Assignee
Information
Prerequisites
Dependencies

Summary

The Application Companion manages and monitors the execution of applications on remote resources. It forwards commands from the orchestrator to the application process.

To manage an application, it either invokes the application directly or receives steering commands from the orchestrator and manages the monitored application accordingly.
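The managing role can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: it assumes the companion launches the application as a child process and that steering commands are mapped onto POSIX signals. The class name `ApplicationCompanion`, the command names `PAUSE`, `RESUME` and `STOP`, and their signal mapping are all hypothetical; the actual orchestrator protocol is not shown.

```python
import signal
import subprocess
import sys

# Hypothetical mapping from orchestrator steering commands to POSIX signals.
# The command names are illustrative assumptions, not the project's protocol.
STEERING_SIGNALS = {
    "PAUSE": signal.SIGSTOP,
    "RESUME": signal.SIGCONT,
    "STOP": signal.SIGTERM,
}


class ApplicationCompanion:
    """Hypothetical companion that launches an application and relays steering commands."""

    def __init__(self, command):
        self.command = list(command)
        self.process = None

    def launch(self):
        # Start the application as a child process so the companion can
        # observe it and send it signals.
        self.process = subprocess.Popen(self.command)
        return self.process.pid

    def steer(self, steering_command):
        # Translate an orchestrator command into a signal for the child process.
        if self.process is None:
            raise RuntimeError("application has not been launched")
        try:
            sig = STEERING_SIGNALS[steering_command]
        except KeyError:
            raise ValueError(f"unknown steering command: {steering_command!r}") from None
        self.process.send_signal(sig)

    def wait(self, timeout=None):
        # Block until the application exits and return its exit state
        # (negative values mean the process was terminated by that signal).
        return self.process.wait(timeout=timeout)


if __name__ == "__main__":
    companion = ApplicationCompanion([sys.executable, "-c", "import time; time.sleep(60)"])
    companion.launch()
    companion.steer("PAUSE")
    companion.steer("RESUME")
    companion.steer("STOP")
    print("exit state:", companion.wait(timeout=10))
```

Keeping the application as a child process is what lets the companion both steer it and later report its exit state without modifying the application itself.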

For monitoring, it has the following two roles:

  1. It gathers information from the OS about the application's run-time behavior, resource utilization, and exit state. From this it generates time-stamped data points that are offloaded into the time-series database of an ELK or Grafana stack, allowing existing tooling to be reused.
  2. It acts as a relay point for information collected by monitoring integrated into the application itself, which gives direct insight into essential steps of the application. In the first design, we implement the following endpoints:
    • a simulation loop heartbeat,
    • dedicated Message Passing Interface (MPI) / HPC transport monitoring, and
    • monitoring of the steering commands.
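Both monitoring roles can be sketched as below, under stated assumptions. For role 1, the sketch samples the resident memory of a child process from `/proc/<pid>/status` (Linux only) into time-stamped data points, ends the series with the exit code, and serializes the points as newline-delimited JSON. The actual transport into ELK or Grafana is not shown, and JSON lines is only one plausible format. For role 2, `HeartbeatTracker` shows only the receiving side of a simulation-loop heartbeat; how the application sends it (socket, file, MPI hook) is left open. All function and class names here are hypothetical.

```python
import json
import subprocess
import sys
import time


def read_rss_kb(pid):
    """Return the resident set size of `pid` in kB, read from /proc (Linux only)."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass  # Process already gone, or no /proc on this platform.
    return None  # Also returned for zombie processes, which have no VmRSS line.


def monitor(command, interval=0.1):
    """Role 1: run `command` and sample it until exit, returning time-stamped data points."""
    process = subprocess.Popen(command)
    points = []
    while process.poll() is None:
        points.append({"timestamp": time.time(), "pid": process.pid,
                       "metric": "rss_kb", "value": read_rss_kb(process.pid)})
        time.sleep(interval)
    # The exit state becomes the final data point of the series.
    points.append({"timestamp": time.time(), "pid": process.pid,
                   "metric": "exit_code", "value": process.returncode})
    return points


def to_json_lines(points):
    """Serialize data points as newline-delimited JSON for a time-series ingester."""
    return "\n".join(json.dumps(point) for point in points)


class HeartbeatTracker:
    """Role 2: track simulation-loop heartbeats relayed from the application."""

    def __init__(self, timeout):
        self.timeout = timeout  # Seconds without a heartbeat before the loop counts as stalled.
        self.last_step = None
        self.last_seen = None

    def beat(self, step, now):
        # Record that the application reported finishing loop iteration `step`.
        self.last_step = step
        self.last_seen = now

    def is_alive(self, now):
        # Alive only if at least one heartbeat arrived within the timeout window.
        return self.last_seen is not None and now - self.last_seen <= self.timeout


if __name__ == "__main__":
    child = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    print(to_json_lines(monitor(child)))
```

Passing `now` explicitly into `HeartbeatTracker` keeps the staleness logic independent of the clock, which makes it easy to test and to drive from whatever transport delivers the heartbeats.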

In summary, it collects information from the following three streams in a non-invasive manner:

  1. Application logs from application log files
  2. Resource utilization and run-time behavior from OS
  3. Application insights from the application's endpoints
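For the first stream, one non-invasive approach is to tail the application's log file in the style of `tail -f`: the companion opens the file read-only, remembers the byte offset it has already consumed, and returns only complete new lines. The sketch below assumes a single plain-text log file and treats a file that shrank as truncated or rotated; the class name `LogTail` is hypothetical.

```python
import os


class LogTail:
    """Hypothetical reader returning lines appended to a log file since the last read.

    The file is only opened read-only, so the application writing it is unaffected.
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0  # Byte position up to which lines have been consumed.

    def read_new_lines(self):
        # A missing file simply means the application has not logged anything yet.
        if not os.path.exists(self.path):
            return []
        # If the file shrank, assume it was truncated or rotated and start over.
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0
        with open(self.path, "rb") as log:
            log.seek(self.offset)
            data = log.read()
        # Only consume complete lines; a partial last line waits for the next read.
        end = data.rfind(b"\n")
        if end == -1:
            return []
        self.offset += end + 1
        return [line.decode("utf-8", errors="replace").rstrip("\r")
                for line in data[:end].split(b"\n")]
```

Working in bytes keeps the stored offset exact regardless of the text encoding, and holding back an unterminated last line avoids reporting a log message that the application is still in the middle of writing.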

Tasks

Requirements

Acceptance criteria

Diagrams and their description of the general Application Companion

mfahdaz commented 3 years ago

First version of the SRS and design document for the Application Companion.

w-klijn commented 3 years ago

Food for thought: how does the feature list compare to pegasus-kickstart? https://pegasus.isi.edu/documentation/manpages/pegasus-kickstart.html

w-klijn commented 3 years ago

The protocol is a dependency; the other tasks are done. Closing this issue as completed.