Open thomasyu888 opened 4 years ago
Thanks for creating this ticket.
@thomasyu888 Before answering https://github.com/nlpsandbox/nlpsandbox-controller/issues/39 (submission ID, run ID), let's discuss how users can submit an NLP tool for training/evaluation. We currently don't support training, but this is another discussion that we should have soon in https://github.com/nlpsandbox/nlpsandbox-controller/issues/24.
Let's imagine that we want to enable the user to submit a tool for evaluation (with optional training) using one command of the client. What information must the user provide to the client? Also, it should be possible to use this command or a similar one to submit as part of a CI/CD workflow.
Submission during CI/CD workflow would probably look like this:
Submission request body:
Response:
@tschaffter, there are multiple things you touched on here which I will address.
We could include or exclude docker push from the submit command, but I think it's important for participants to know how to docker build on their own. One of the biggest issues is that you don't get a digest until you push the docker image. Now that I have prefaced this, I will talk about my solution using Synapse. There is already a command line function to submit the image. That said, it's really not as straightforward to use as it should be, but currently I imagine the command can look something like this:
# This would be the entire workflow in the command line
## 1. Create synapse project, synapseclient will be installed due to nlpsandbox-cli dependency
synapse create Project --name "My Challenge Project"
# A Synapse ID will be returned from this; let's just say it's syn12345
## 2. Log into docker.synapse.org, then build and push the image
docker login docker.synapse.org
docker build -t docker.synapse.org/syn12345/my-model:latest .
docker push docker.synapse.org/syn12345/my-model:latest
# A Synapse ID and a SHA digest will be created for this image. But I want the submit command to be:
## 3. Submit to evaluation queue
nlp-cli submit --docker_repo docker.synapse.org/syn12345/my-model:latest --projectId syn12345 --annotator_type {date, person, address} --teamId myteam_name
Step 3 will actually contain steps of its own:
All of this can be achieved in a CI/CD workflow
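As a rough illustration of the submit command proposed above, here is a minimal sketch of how its arguments could be parsed with Python's argparse. The flag names are copied from the example command; the parser layout, the `build_parser` helper and the list of annotator types are assumptions made for illustration, not the actual nlpsandbox client.

```python
import argparse

# Assumption: annotator types are limited to the three named in the example command.
ANNOTATOR_TYPES = ("date", "person", "address")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the proposed `nlp-cli submit` command (hypothetical)."""
    parser = argparse.ArgumentParser(prog="nlp-cli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    submit = subparsers.add_parser("submit", help="Submit an image for evaluation")
    submit.add_argument("--docker_repo", required=True)
    submit.add_argument("--projectId", required=True)
    submit.add_argument("--annotator_type", required=True, choices=ANNOTATOR_TYPES)
    submit.add_argument("--teamId", required=True)
    return parser


# Parse the same arguments a CI/CD job would pass on the command line.
args = build_parser().parse_args([
    "submit",
    "--docker_repo", "docker.synapse.org/syn12345/my-model:latest",
    "--projectId", "syn12345",
    "--annotator_type", "date",
    "--teamId", "myteam_name",
])
```

Making every flag required keeps a CI/CD job from silently submitting an incomplete request; whether the real client should be that strict is an open design choice.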
Now that I have gone through the whole workflow, let's try to simplify it a bit and just have participants build their docker image:
docker build -t my-model:my-tag .
nlp-sandbox submit --docker-image my-model:my-tag --teamid team... --workspace_name
This submit function will essentially do:
There's a lot I can say here, but I'll just touch on one key point: I think one of the main issues currently is that things are so tied to projects, and projects have to be uniquely named. At the end of the day, it would be helpful for us to have a "copy" of the docker submission. So when people submit, I think it's most efficient if they are "pushing" their docker image into a private registry that we already have access to, without having to create a workspace themselves. (They can specify a token.)
We could include or exclude docker push from the submit command, but I think it's important for participants to know how to docker build on their own. One of the biggest issues is that you don't get a digest until you push the docker image.
Pushing the docker image should be done by the user using Docker tools so that we keep the client as simple as possible to develop and maintain.
This approach also offers more flexibility to the user. For example, our example annotators already have a job in the GitHub CI/CD workflow to lint, test and publish the docker image to DockerHub. There are already GitHub Actions that help with pushing Docker images to registries. This workflow is standard. To submit the annotator for evaluation to the NLP Sandbox, the user should add one job to the CI/CD workflow that leverages the previous job (publication of the docker image).
The submission request body you specified looks fine, but if they are able to build and push an image, they won't need the API key to access the image (it will already have been pushed).
The API key would be for us (the infrastructure) to pull a private image from DockerHub.
Thanks @tschaffter, that is what I originally had in mind as well. I think participants should do the docker push, but I wanted to keep the options open. One thing that we will really need to tackle is the ability to pull down private images and cache the submission. What we are actually missing are scoped access tokens per private docker repository. I don't think we should collect API keys from participants, as these API keys can do more than just "get one specific docker repository". It would be a security concern.
Furthermore, I still haven't fully added support for DockerHub repositories, but what we can do is have participants submit a YAML or JSON file.
docker_image: my-model
docker_digest: sha25....
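To make the submission file above concrete, here is a minimal sketch of how the JSON variant could be checked before it is accepted. The two field names come from the example; the `load_submission` helper and the assumption that digests take the form `sha256:` followed by 64 hexadecimal characters are illustrative, not part of any existing client.

```python
import json
import re

# Assumption: digests use the "sha256:<64 hex characters>" form.
DIGEST_PATTERN = re.compile(r"^sha256:[0-9a-f]{64}$")


def load_submission(text: str) -> dict:
    """Parse a JSON submission file and check the two fields from the example (hypothetical helper)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("submission must be a JSON object")
    for key in ("docker_image", "docker_digest"):
        if not isinstance(data.get(key), str) or not data[key]:
            raise ValueError(f"missing or empty field: {key}")
    if not DIGEST_PATTERN.match(data["docker_digest"]):
        raise ValueError("docker_digest must look like sha256:<64 hex characters>")
    return {"docker_image": data["docker_image"], "docker_digest": data["docker_digest"]}


# Placeholder digest for illustration only; a real one is only known after the image is pushed.
example = json.dumps({"docker_image": "my-model", "docker_digest": "sha256:" + "0" * 64})
submission = load_submission(example)
```

Requiring the digest rather than only a tag pins the submission to one exact image, which is what makes caching it meaningful.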
This complicates things a bit, because participants must also share the image with a service user (DockerHub/Synapse docker registry) so that the service user can access it. Right now, when people submit a docker repository on Synapse, the admins of the queue automatically have access to it, AND the docker repo+digest is cached "forever". What we can do to circumvent these issues is to:
I am also in favor of having participants submit as a team. These would be the steps to getting started with teams:
I can invalidate all submissions that don't have a team ID. I could even make a convenience function that creates or uses an existing team for users and auto-registers that team to the challenge.
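The invalidation rule described here could be sketched as follows. The `Submission` record and the status strings are hypothetical stand-ins chosen for illustration; they are not the Synapse submission objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical stand-in for a submission record; not the Synapse object."""
    submission_id: str
    team_id: Optional[str]
    status: str = "RECEIVED"  # assumed initial status name


def invalidate_teamless(submissions: List[Submission]) -> List[Submission]:
    """Mark every submission without a team ID as INVALID and return the ones that changed."""
    invalid = []
    for sub in submissions:
        if not sub.team_id:
            sub.status = "INVALID"  # assumed status name
            invalid.append(sub)
    return invalid


queue = [Submission("1", "myteam_name"), Submission("2", None), Submission("3", "")]
rejected = invalidate_teamless(queue)
```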
Get all the submission objects submitted in the past. We could add an option to show only the submissions with a given status (e.g. Running, Completed), or show only the last N submissions (e.g. --head 5).
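A minimal sketch of the filtering described above, assuming each submission is a dictionary with `id`, `status` and `created_on` keys, and that `created_on` holds ISO 8601 timestamps in a single format so they sort as strings. The `list_submissions` name and these fields are assumptions for illustration.

```python
from typing import Dict, List, Optional


def list_submissions(
    submissions: List[Dict[str, str]],
    status: Optional[str] = None,
    head: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Return submissions newest first, optionally filtered by status and cut to the last N."""
    selected = [
        s for s in submissions
        if status is None or s["status"].lower() == status.lower()
    ]
    # ISO 8601 timestamps in one format sort chronologically as plain strings.
    selected.sort(key=lambda s: s["created_on"], reverse=True)
    return selected if head is None else selected[:head]


history = [
    {"id": "a", "status": "Completed", "created_on": "2020-10-01T09:00:00Z"},
    {"id": "b", "status": "Running", "created_on": "2020-10-03T09:00:00Z"},
    {"id": "c", "status": "Completed", "created_on": "2020-10-02T09:00:00Z"},
]
latest_completed = list_submissions(history, status="completed", head=1)
```

Here `head` is applied after the status filter, so `--head 5` combined with a status would mean the five most recent submissions with that status.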