Open hzy46 opened 3 years ago
Fields of a prerequisite: name, plugin, plugin_params, type, require.
Template variables can be used in plugin_params. However, when users require a prerequisite, they must specify all of its template_variables. This is a simplification of the mechanism, which ensures that we will never require a prerequisite with unfulfilled template variables.
Use the extras field to make the other parts of the job config cluster-agnostic.
# in marketplace
- name: install_wget
  plugin: cmd
  plugin_params:
    callbacks:
      - event: taskStarts
        commands:
          - "apt update"
          - "apt install -y wget"
# in marketplace
- name: mnist
  require:
    - name: marketplace://name/install_wget
  plugin: cmd
  plugin_params:
    callbacks:
      - event: taskStarts
        commands:
          - mkdir -p {{ dataPath }}
          # -O takes a file path, so write the archive inside dataPath
          - wget http://1.2.3.4/mnist.zip -O {{ dataPath }}/mnist.zip
          - cd {{ dataPath }}
          - unzip mnist.zip
  template_variables:
    - name: dataPath
# in job
prerequisites:
  - type: dockerimage
    uri: 'openpai/standard:python_3.6-pytorch_1.2.0-gpu'
    name: docker_image_0
taskRoles:
  taskrole:
    instances: 1
    completion:
      minFailedInstances: 1
    taskRetryCount: 0
    prerequisites:
      - mnist
    dockerImage: docker_image_0
    resourcePerInstance:
      gpu: 1
      cpu: 3
      memoryMB: 29065
    commands:
      - sleep 0s
defaults:
  virtualCluster: default
extras:
  reference_prerequisites:
    - name: mnist
      require:
        - name: marketplace://name/mnist
          template_variables:
            dataPath: /dataset/mnist
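To make the template mechanism in the example above more concrete, here is a minimal Python sketch of how a `{{ name }}` placeholder in a command might be filled from the values given under template_variables. The function name `render` and the exact placeholder grammar are assumptions for illustration; the issue does not specify how the runtime performs the substitution.

```python
import re

# Assumed placeholder syntax: "{{ name }}", with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(command, variables):
    """Replace every {{ name }} in `command` with the value from `variables`.

    Hypothetical helper: raises KeyError when a placeholder has no value,
    which mirrors the rule that all template variables must be fulfilled.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"template variable {name!r} is not fulfilled")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, command)


if __name__ == "__main__":
    print(render("mkdir -p {{ dataPath }}", {"dataPath": "/dataset/mnist"}))
```

Failing loudly on a missing value, rather than leaving the placeholder in the command, keeps a half-rendered command from ever reaching the container.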
# set up imagenet
# in marketplace
- name: confignfs_pvc
  plugin: pvc_storage
  plugin_params:
    name: confignfs
    mountPath: {{ mountPath }}
  template_variables:
    - name: mountPath
# in marketplace
- name: imagenet
  require: # if the required prerequisite has template_variables, all the template_variables MUST be fulfilled.
    - name: marketplace://name/confignfs_pvc
      template_variables:
        mountPath: /mnt/confignfs_pvc
  plugin: cmd
  plugin_params:
    callbacks:
      - event: taskStarts
        commands:
          - mkdir -p {{ dataPath }}
          - cp -r /mnt/confignfs_pvc/imagenet/* {{ dataPath }}
  template_variables:
    - name: dataPath
# in marketplace
- name: imagenet_only_validation
  require: # if the required prerequisite has template_variables, all the template_variables MUST be fulfilled.
    - name: marketplace://name/confignfs_pvc
      template_variables:
        mountPath: /mnt/confignfs_pvc
  plugin: cmd
  plugin_params:
    callbacks:
      - event: taskStarts
        commands:
          - mkdir -p {{ dataPath }}
          - cp -r /mnt/confignfs_pvc/imagenet/validation/* {{ dataPath }}
  template_variables:
    - name: dataPath
# in job
prerequisites:
  - type: dockerimage
    uri: 'openpai/standard:python_3.6-pytorch_1.2.0-gpu'
    name: docker_image_0
taskRoles:
  taskrole:
    instances: 1
    completion:
      minFailedInstances: 1
    taskRetryCount: 0
    prerequisites:
      - imagenet
    dockerImage: docker_image_0
    resourcePerInstance:
      gpu: 1
      cpu: 3
      memoryMB: 29065
    commands:
      - sleep 0s
defaults:
  virtualCluster: default
extras:
  reference_prerequisites:
    - name: imagenet
      require:
        - name: marketplace://name/imagenet
          template_variables:
            dataPath: /dataset/imagenet
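The imagenet example chains two marketplace items through require. The sketch below shows one way such a chain might be flattened into execution order while enforcing the rule that every declared template variable is fulfilled. The in-memory `marketplace` dict, the `resolve` function, and the assumption that references look like marketplace://name/<name> are all illustrative, not part of any existing API.

```python
def resolve(name, marketplace, provided=None, seen=()):
    """Return [(item name, variables)] with dependencies before dependents.

    Hypothetical helper. `marketplace` maps item names to prerequisite
    specs shaped like the YAML above; `provided` holds the values given
    under template_variables by whoever requires the item.
    """
    if name in seen:
        raise ValueError("circular require: " + " -> ".join(seen + (name,)))
    item = marketplace[name]
    provided = provided or {}

    # Every declared template variable must be fulfilled by the requirer.
    declared = [v["name"] for v in item.get("template_variables", [])]
    missing = [v for v in declared if v not in provided]
    if missing:
        raise ValueError(f"{name}: unfulfilled template variables {missing}")

    ordered = []
    for req in item.get("require", []):
        # Assumes references of the form marketplace://name/<name>.
        dep_name = req["name"].rsplit("/", 1)[-1]
        ordered += resolve(dep_name, marketplace,
                           req.get("template_variables"), seen + (name,))
    ordered.append((name, dict(provided)))
    return ordered


if __name__ == "__main__":
    marketplace = {
        "confignfs_pvc": {"template_variables": [{"name": "mountPath"}]},
        "imagenet": {
            "require": [{
                "name": "marketplace://name/confignfs_pvc",
                "template_variables": {"mountPath": "/mnt/confignfs_pvc"},
            }],
            "template_variables": [{"name": "dataPath"}],
        },
    }
    for step in resolve("imagenet", marketplace, {"dataPath": "/dataset/imagenet"}):
        print(step)
```

Because each require entry must carry all of the values its target declares, the check can be done locally at every level and no variable ever has to be propagated from the job down through the chain.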
# set up a debug hook
# in marketplace
- name: debug_hook
  plugin: cmd
  plugin_params:
    callbacks:
      - event: taskFails
        commands:
          - echo "will sleep for {{ min }} minutes for debugging..."
          - sleep {{ min }}m
  template_variables:
    - name: min
# in job
prerequisites:
  - type: dockerimage
    uri: 'openpai/standard:python_3.6-pytorch_1.2.0-gpu'
    name: docker_image_0
taskRoles:
  taskrole:
    instances: 1
    completion:
      minFailedInstances: 1
    taskRetryCount: 0
    prerequisites:
      - debug_hook
    dockerImage: docker_image_0
    resourcePerInstance:
      gpu: 1
      cpu: 3
      memoryMB: 29065
    commands:
      - sleep 0s
defaults:
  virtualCluster: default
extras:
  reference_prerequisites:
    - name: debug_hook
      require:
        - name: marketplace://name/debug_hook
          template_variables:
            min: 30
Motivation
#5145 has extended the prerequisite field, but users can only use and share prerequisites in job YAML. We can support a UI for prerequisites, especially for data prerequisites. This issue explains how users create and use a data prerequisite in the cluster. With this feature, cluster users can easily share datasets with each other, and it may benefit future features such as dataset caching and optimization.
Explanation
How do users create a dataset in the cluster?
Dataset item that doesn't need a PVC storage
The user should create a dataset item in marketplace. A dataset item has a prerequisite spec and other misc info (e.g. title, usage) in marketplace.
If the dataset is just downloaded from the Internet, it should have the following spec:
Dataset item that needs a PVC storage
If the dataset is already saved in a PVC, it should have the following spec:
Here we define a new field: requireStorages. It shares the same spec as the current implementation. If this prerequisite is included in a job, we should merge the storage field here with other PVC storage.
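As a rough illustration of the merge step, the sketch below combines the storages a job already declares with those listed in a prerequisite's requireStorages. The entry shape (a dict with name and an optional mountPath) and the conflict rule are assumptions made for this example; the issue only says the field shares the spec of the current implementation.

```python
def merge_storages(job_storages, required_storages):
    """Merge two storage lists, keeping one entry per storage name.

    Hypothetical helper. Entries are assumed to look like
    {"name": "confignfs", "mountPath": "/mnt/confignfs_pvc"}.
    The same storage requested with a different mountPath is treated
    as a conflict and rejected rather than silently overridden.
    """
    merged = {s["name"]: dict(s) for s in job_storages}
    for storage in required_storages:
        existing = merged.get(storage["name"])
        if existing is None:
            merged[storage["name"]] = dict(storage)
        elif existing.get("mountPath") != storage.get("mountPath"):
            raise ValueError(
                f"storage {storage['name']!r} is requested with conflicting "
                f"mount paths: {existing.get('mountPath')!r} and "
                f"{storage.get('mountPath')!r}"
            )
    return list(merged.values())


if __name__ == "__main__":
    job = [{"name": "home", "mountPath": "/home"}]
    required = [{"name": "confignfs", "mountPath": "/mnt/confignfs_pvc"}]
    print(merge_storages(job, required))
```

Rejecting conflicts is only one possible policy; the actual rest-server behavior would need to be decided as part of the implementation.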
How do users use a dataset in the cluster?
On marketplace pages
On marketplace pages, users can click use to create an empty job with the corresponding dataset.
On job submission page
On the job submission page, users can select their dataset using the field under the taskrole section.
How to represent a marketplace prerequisite in job YAML?
The dataset prerequisite from marketplace will be expressed as marketplace://prerequisites/itemId/<item-id>.
One example is as follows:
The webportal page should provide a link to marketplace for the user.
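For illustration, a reference in the marketplace://prerequisites/itemId/<item-id> form could be recognized with a small helper like the one below. The function name and the decision to reject ids that are empty or contain a slash are assumptions; the issue does not define the id format.

```python
# Prefix taken from the URI form proposed in this issue.
PREFIX = "marketplace://prerequisites/itemId/"


def parse_marketplace_uri(uri):
    """Return the item id of a marketplace prerequisite reference.

    Hypothetical helper: raises ValueError for anything that does not
    match marketplace://prerequisites/itemId/<item-id>.
    """
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a marketplace prerequisite URI: {uri!r}")
    item_id = uri[len(PREFIX):]
    # Assumption: an id is a single non-empty path segment.
    if not item_id or "/" in item_id:
        raise ValueError(f"invalid item id in {uri!r}")
    return item_id


if __name__ == "__main__":
    print(parse_marketplace_uri("marketplace://prerequisites/itemId/42"))
```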
After submission, rest-server will parse these marketplace items and pass them to db controller and runtime. Rest-server should also take care of requireStorages, and merge it carefully with the other storage specs.
The following errors can happen in rest-server:
requireStorages.
Other features
We can enable URLs like http(s):// in addition to marketplace://. This would be very convenient and is easy to implement.
Implementation
requireStorages