w3c-cg / webagents

Autonomous Agents on the Web (WebAgents) Community Group
https://www.w3.org/community/webagents/

[Manageable Affordances TF] Manageable Actions in ML Pipelines #24

Open remcollier opened 9 months ago

remcollier commented 9 months ago

Title: Manageable Actions in Semantically Defined ML Pipelines

Submitter(s):

Rem Collier

Motivation:

There is increasing interest in the area of ML Ops. This scenario proposes the use of hypermedia agents to manage the execution of ML Ops pipelines that are treated either as a black box (the agent oversees the execution of pipelines) or as a white box (the agent is the orchestrator of the pipeline). Integrating agents with ML Ops allows the execution of pipelines to be automated intelligently as required. Using agents as the orchestrators of pipelines allows the pipelines to be managed intelligently, offering finer-grained control over their execution.

Expected Participating Entities:

For the black box approach, the main participating entities would be the pipeline orchestrator (e.g. Apache Airflow, MLflow, Argo, ...) and the management agent.

For the white box approach, the main entities would be: a container engine (e.g. Docker, Singularity); a repository of container images (e.g. GitLab/GitHub) that implement a range of ML tasks usable in a pipeline (pre-processing steps, untrained models, trained models, ...); a semantic service providing RDF-based descriptions of the container images; a semantic service providing RDF descriptions of pipelines; and a hypermedia agent design that is able to consume the pipeline descriptions and deploy/execute the relevant container images based on those descriptions.
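
To make the white box setup more concrete, here is a minimal sketch of how an agent might consume such an RDF pipeline description, assuming Python with rdflib. The `ex:` vocabulary, the pipeline/step names and the image references are invented for illustration; the actual ontology and identifiers would come from the semantic services mentioned above.

```python
# Minimal sketch: an agent consuming a hypothetical RDF pipeline description.
# The ex: vocabulary (ex:Pipeline, ex:hasStep, ex:order, ex:image) is invented
# for illustration, not an existing ontology.
from rdflib import Graph

PIPELINE_TTL = """
@prefix ex: <http://example.org/mlops#> .

ex:shipDetection a ex:Pipeline ;
    ex:hasStep ex:preprocess, ex:detect .

ex:preprocess a ex:Step ;
    ex:order 1 ;
    ex:image "registry.example.org/cameo/sar-preprocess:1.2" .

ex:detect a ex:Step ;
    ex:order 2 ;
    ex:image "registry.example.org/cameo/ship-detector:0.9" .
"""

g = Graph()
g.parse(data=PIPELINE_TTL, format="turtle")

# Extract the steps in execution order so the agent knows which container
# images to deploy, and in what sequence.
QUERY = """
PREFIX ex: <http://example.org/mlops#>
SELECT ?step ?order ?image WHERE {
    ex:shipDetection ex:hasStep ?step .
    ?step ex:order ?order ;
          ex:image ?image .
} ORDER BY ?order
"""
for step, order, image in g.query(QUERY):
    print(order, step, image)
```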

Workflow:

For the black box approach, some other part of the system (possibly, but not necessarily, another agent) decides that a given ML pipeline needs to be executed. For example, the decision could arise as a result of a notification of unauthorised fishing. To identify potential culprits, a ship detection/tracking algorithm is run for a given area over a given time period. Due to the quantity of data, execution of the pipeline can take anything from seconds to minutes or (rarely) longer. Another scenario is the monitoring of model drift: once the drift exceeds some given (or learnt) threshold, the agent could trigger a new model training pipeline using an updated dataset.
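
As an illustration of the black box workflow, the sketch below shows an agent requesting a pipeline run and polling for its completion over a REST API, assuming Python with requests. The orchestrator URL, the `/pipelines/...` endpoints, the payload shape and the state names are all assumptions; real orchestrators such as Airflow or Argo each expose their own (bespoke) APIs for this.

```python
# Black-box sketch: the agent does not orchestrate the pipeline itself; it
# asks the orchestrator to run it and then polls for completion. Endpoints,
# payloads and state names are hypothetical.
import time
import requests

ORCHESTRATOR_URL = "https://orchestrator.example.org"

def trigger_pipeline(pipeline_id: str, parameters: dict) -> str:
    """Request a run of the named pipeline and return the run id."""
    resp = requests.post(
        f"{ORCHESTRATOR_URL}/pipelines/{pipeline_id}/runs",
        json={"parameters": parameters},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["run_id"]

def wait_for_result(pipeline_id: str, run_id: str, poll_seconds: int = 10) -> dict:
    """Poll the run until it finishes; runs may take seconds to minutes."""
    while True:
        resp = requests.get(
            f"{ORCHESTRATOR_URL}/pipelines/{pipeline_id}/runs/{run_id}",
            timeout=30,
        )
        resp.raise_for_status()
        run = resp.json()
        if run["state"] in ("succeeded", "failed"):
            return run
        time.sleep(poll_seconds)

# Example: model drift exceeded a threshold, so retrain on an updated dataset.
run_id = trigger_pipeline("model-retraining", {"dataset": "sightings-2024-q3"})
result = wait_for_result("model-retraining", run_id)
print(result["state"])
```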

For the white box approach, the same scenarios apply, but the agent would have direct control over the creation and execution of the pipeline. This fine-grained control would require the agent to oversee individual tasks, each of which could be durative in nature. An awareness of progress could be used to prepare the next task so that it is ready as soon as the current task completes, while minimising cloud resource usage.
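
A sketch of what this finer-grained white box control could look like, assuming the Docker Engine driven through the docker Python SDK. The image names, the notion of a per-step progress affordance and the 80% pre-pull threshold are assumptions made for illustration; only the SDK calls themselves (containers.run, images.pull, container.wait) are real.

```python
# White-box sketch: the agent runs each step as a container and, once the
# current step is far enough along, pre-pulls the next image so it is ready
# the moment the step completes.
import time
import docker

client = docker.from_env()

# Ordered step images, e.g. taken from an RDF pipeline description as above.
steps = [
    "registry.example.org/cameo/sar-preprocess:1.2",
    "registry.example.org/cameo/ship-detector:0.9",
]

def step_progress(container) -> float:
    """Placeholder for a per-step progress affordance (e.g. reported over HTTP
    or via the container's logs); returns a value in [0, 1]."""
    return 0.0  # assumption: real steps would expose their own progress

for i, image in enumerate(steps):
    container = client.containers.run(image, detach=True)
    prepulled = False
    while True:
        container.reload()
        if container.status == "exited":
            break
        # Prepare the next task early so it is ready as soon as this one
        # completes, without holding cloud resources longer than necessary.
        if not prepulled and i + 1 < len(steps) and step_progress(container) > 0.8:
            client.images.pull(steps[i + 1])
            prepulled = True
        time.sleep(5)
    result = container.wait()
    if result.get("StatusCode", 1) != 0:
        raise RuntimeError(f"Step {image} failed")
```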

Related Use Cases (if any):

Two related use cases are described above.

Existing solutions:

We are working on this scenario in the context of a remote sensing data analytics platform being developed as part of the CAMEO project.

Identified Requirements by the TF:

  1. Target entity(ies) of the motivating scenario: an ML pipeline
  2. Life cycle:
    1. Pipeline is requested
    2. Waiting for resources to be allocated
    3. Pipeline is provisioned (resources are allocated).
    4. Running the pipeline
    5. (Optional) Pause the pipeline
    6. (If no failure) Retrieval of results
    7. Finished: Archived (storing for provenance and audit), deleted
  3. Information conveyed about affordances: Current Step, Completed Steps, Next Steps (these may be more or less known in advance, but dynamic decision making is possible; to be further detailed, since that will give a reason for applying web agents), Results of Steps, Resource Consumption (live), Performance/Quality of Steps (live)
  4. How the life cycle is influenced: Tool-specific interfaces to switch states. In the beginning, the input data needs to be provided.
  5. Communication protocols: Bespoke REST APIs, possibly wrapped in an SDK
  6. Representation formats: JSON in the case of the REST APIs (a minimal sketch of such a representation is given after this list).
  7. Security and privacy considerations: Shared resources are used, and IP-relevant data of companies is processed. Each user gets an account and their own space, so nothing is shared between users. An agent needs to act on "their" own pipelines only.
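
To ground requirements 2, 3 and 6, here is a minimal sketch of a JSON representation that such a REST API might return for a pipeline resource, together with the life-cycle transitions from requirement 2, written as Python. All field names, state names and values are assumptions for illustration, not an existing format.

```python
# Sketch of a pipeline resource representation (requirements 3 and 6) and of
# the life-cycle state machine from requirement 2. Field and state names are
# hypothetical.
import json

LIFE_CYCLE = {
    "requested":   ["waiting"],
    "waiting":     ["provisioned"],
    "provisioned": ["running"],
    "running":     ["paused", "finished", "failed"],
    "paused":      ["running"],
    "finished":    ["archived", "deleted"],
}

pipeline_resource = {
    "id": "ship-detection-42",
    "state": "running",
    "currentStep": "detect",
    "completedSteps": ["preprocess"],
    "nextSteps": ["detect", "postprocess"],            # may be revised dynamically
    "stepResults": {"preprocess": {"tilesProduced": 128}},
    "resourceConsumption": {"cpuCores": 8, "memoryGiB": 32},   # live figures
    "stepQuality": {"preprocess": {"cloudCover": 0.12}},       # live figures
    "links": {                        # affordances for influencing the life cycle
        "pause":   "/pipelines/ship-detection-42/pause",
        "results": "/pipelines/ship-detection-42/results",
    },
}

def can_transition(current: str, target: str) -> bool:
    """Check a requested life-cycle transition against the state machine."""
    return target in LIFE_CYCLE.get(current, [])

print(json.dumps(pipeline_resource, indent=2))
print(can_transition(pipeline_resource["state"], "paused"))  # True
```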

Possible Gaps:

Comments:

bassetthound commented 9 months ago

I did some work trying to build networks of semantic web services to solve maths and engineering problems in the early-to-mid 2000s. Trying is the operative word, because it was very difficult and not especially successful, so there might be some lessons about what not to do as well as perhaps some useful ideas.

The genesis was the OpenMath semantic description systems and the EU-funded Mathematics On the NET (MONET) project, which led to a couple of UK-funded projects on mathematical service discovery and brokerage (KNOOGLE, the idea being knowledge-based googling) and workflow construction. I'll drop some links to a few papers here in a separate comment.