remcollier opened 9 months ago
I did some work trying to build networks of semantic web services to solve maths and engineering problems in the early-to-mid 2000s. Trying is the operative word: it was very difficult and not especially successful, so there may be some lessons about what not to do as well as perhaps some useful ideas.
The genesis was the OpenMath semantic description system and the EU-funded Mathematics On the NET (MONET) project, which led to a couple of UK-funded projects on mathematical service discovery and brokerage (KNOOGLE, the idea being knowledge-based googling) and workflow construction. I'll drop links to a few papers in a separate comment.
Title: Manageable Actions in Semantically Defined ML Pipelines
Submitter(s):
Rem Collier
Motivation:
There is increasing interest in the area of ML Ops. This scenario proposes the use of Hypermedia Agents to manage the execution of ML Ops pipelines, treated either as a black box (the agent oversees the execution of pipelines) or as a white box (the agent is the orchestrator of the pipeline). Integrating agents with ML Ops allows pipeline execution to be automated intelligently as required. Using agents as the orchestrators of pipelines allows the pipelines to be managed intelligently, offering finer-grained control over their execution.
Expected Participating Entities:
For the black box approach, the main participating entities would be the pipeline orchestrator (e.g. Apache Airflow, MLflow, Argo, ...) and the management agent.
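As an illustration only, here is a minimal sketch of how a management agent might treat a pipeline as a black box by triggering and monitoring a run through Apache Airflow's stable REST API. The host, credentials, DAG id, and configuration payload are all placeholders, and any of the orchestrators listed above could be substituted.

```python
# Hypothetical sketch: a management agent treating an Airflow DAG as a black box.
# Host, credentials, and the "ship_detection" DAG id are placeholders.
import time
import requests

AIRFLOW = "http://airflow.example.org/api/v1"
AUTH = ("agent", "secret")  # placeholder credentials

def trigger_pipeline(dag_id: str, conf: dict) -> str:
    """Start a DAG run and return its run id."""
    resp = requests.post(f"{AIRFLOW}/dags/{dag_id}/dagRuns",
                         json={"conf": conf}, auth=AUTH)
    resp.raise_for_status()
    return resp.json()["dag_run_id"]

def wait_for_pipeline(dag_id: str, run_id: str, poll_secs: int = 30) -> str:
    """Poll the run until it leaves the queued/running states."""
    while True:
        resp = requests.get(f"{AIRFLOW}/dags/{dag_id}/dagRuns/{run_id}", auth=AUTH)
        resp.raise_for_status()
        state = resp.json()["state"]
        if state not in ("queued", "running"):
            return state  # e.g. "success" or "failed"
        time.sleep(poll_secs)

run_id = trigger_pipeline("ship_detection", {"area": "53.3N,-9.0W", "window_hours": 24})
print(wait_for_pipeline("ship_detection", run_id))
```

Polling is used here only for simplicity; in a hypermedia setting the agent could instead subscribe to notifications about run completion where the orchestrator supports them.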
For the white box approach, the main entities would be a container engine (e.g. Docker, Singularity), a repository of container images (e.g. GitLab/GitHub) that implement a range of ML tasks that can be used in a pipeline (pre-processing steps, untrained models, trained models, ...), a semantic service providing RDF-based descriptions of the container images, a semantic service providing RDF descriptions of pipelines, and a hypermedia agent design that is able to consume the pipeline descriptions and deploy/execute the relevant container images based on those descriptions.
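Purely as a sketch of how these entities might fit together: a hypothetical RDF description of one container image, parsed with rdflib and used to deploy the image via the Docker SDK. The ex: vocabulary, image name, and registry are invented for illustration; the real descriptions would come from the semantic services mentioned above.

```python
# Hypothetical sketch: consuming an RDF description of an ML task image and
# deploying it with the Docker SDK. The ex: vocabulary and image name are invented.
import rdflib
import docker

TASK_DESCRIPTION = """
@prefix ex: <http://example.org/mlops#> .

ex:ship-detector-preproc
    a ex:PipelineTask ;
    ex:image "registry.example.org/cameo/ship-preproc:1.2" ;
    ex:consumes ex:Sentinel1Scene ;
    ex:produces ex:NormalisedScene .
"""

g = rdflib.Graph()
g.parse(data=TASK_DESCRIPTION, format="turtle")

# Look up the container image associated with the task.
query = """
PREFIX ex: <http://example.org/mlops#>
SELECT ?image WHERE { ?task a ex:PipelineTask ; ex:image ?image . }
"""
image = next(str(row.image) for row in g.query(query))

# Deploy the image; in the full scenario the agent would also pass task-specific
# configuration and mount the relevant data volumes.
client = docker.from_env()
container = client.containers.run(image, detach=True)
print(container.id, container.status)
```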
Workflow:
For the black box approach, some other part of the system (possibly another agent, but not necessarily) decides that a given ML pipeline needs to be executed. For example, the decision could arise as a result of a notification of unauthorised fishing. To identify potential culprits, a ship detection/tracking algorithm is run for a given area over a given time period. Due to the quantity of data, execution of the pipeline can take anything from seconds to minutes or (rarely) longer. Another scenario is monitoring of model drift: once the drift exceeds some given (or learnt) threshold, the agent could trigger a new model-training pipeline using an updated dataset.
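For the model drift scenario, the decision logic could be as simple as the following sketch. The Population Stability Index is used only as an example drift metric, the threshold is a placeholder, and the trigger_pipeline callable is assumed to be something like the black box sketch above.

```python
# Hypothetical sketch of the drift-monitoring trigger. PSI is used only as an
# example drift metric; the threshold value is a placeholder.
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference) + 1e-6
    cur_frac = np.histogram(current, bins=edges)[0] / len(current) + 1e-6
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

DRIFT_THRESHOLD = 0.2  # placeholder; could also be learnt

def check_and_retrain(reference, current, trigger_pipeline):
    """Trigger a retraining pipeline once drift exceeds the threshold."""
    drift = psi(np.asarray(reference), np.asarray(current))
    if drift > DRIFT_THRESHOLD:
        return trigger_pipeline("model_training", {"reason": "drift", "psi": drift})
    return None
```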
For the white box approach, the same scenarios apply, but the agent would have direct control over the creation and execution of the pipeline. This fine-grained control would require the agent to oversee individual tasks, each of which could be durative in nature. An awareness of progress could be used to prepare the next task so that it is ready as soon as the current task is completed, while minimising cloud resource usage.
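A minimal sketch of such a white box control loop, again assuming Docker as the container engine: the agent runs each task and, while a task is executing, pre-pulls the image of the next one so it is ready as soon as the current task completes. The step names and image references are hypothetical.

```python
# Hypothetical sketch of fine-grained white box orchestration with the Docker SDK.
# Each pipeline step is (task name, image); both are placeholders.
import threading
import docker

client = docker.from_env()

PIPELINE = [
    ("preprocess", "registry.example.org/cameo/ship-preproc:1.2"),
    ("detect",     "registry.example.org/cameo/ship-detect:0.9"),
    ("track",      "registry.example.org/cameo/ship-track:0.3"),
]

def run_pipeline(steps):
    for i, (name, image) in enumerate(steps):
        # Pre-pull the next image in the background while this task runs,
        # so the follow-on step can start immediately.
        prefetch = None
        if i + 1 < len(steps):
            next_image = steps[i + 1][1]
            prefetch = threading.Thread(target=client.images.pull, args=(next_image,))
            prefetch.start()

        container = client.containers.run(image, detach=True)
        result = container.wait()   # durative task; block until it finishes
        container.remove()
        if prefetch:
            prefetch.join()
        if result["StatusCode"] != 0:
            raise RuntimeError(f"task {name} failed")

run_pipeline(PIPELINE)
```

In the full scenario the sequence of steps would of course come from the RDF pipeline description rather than being hard-coded, and a real agent would make richer progress and resource decisions than a simple pre-pull.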
Related Use Cases (if any):
Two related scenarios (ship detection/tracking for unauthorised fishing, and model drift monitoring) are described in the Workflow section above.
Existing solutions:
We are working on this scenario in the context of a remote sensing data analytics platform being developed as part of the CAMEO project.
Identified Requirements by the TF:
Possible Gaps:
Comments: