Closed: Jeffwan closed this issue 4 years ago.
Allow the use of AWS secrets and volumes in pipeline building.
The system already allows that, but adding some helper methods could be useful.
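For illustration, a minimal sketch (not an official implementation) of what such a helper could look like: an op modifier that pulls AWS credentials out of a Kubernetes secret and exposes them to a step as environment variables. The secret name `aws-secret` and the key names are assumptions about the cluster; a similar helper now ships in the SDK as `kfp.aws.use_aws_secret`.

```python
from kubernetes import client as k8s_client


def use_aws_secret(secret_name='aws-secret',
                   access_key_id_key='AWS_ACCESS_KEY_ID',
                   secret_access_key_key='AWS_SECRET_ACCESS_KEY'):
    """Returns an op modifier: apply it with some_op.apply(use_aws_secret())."""
    def _use_aws_secret(task):
        # Map the two secret keys onto the env vars boto3 / the AWS CLI expect.
        task.container.add_env_variable(k8s_client.V1EnvVar(
            name='AWS_ACCESS_KEY_ID',
            value_from=k8s_client.V1EnvVarSource(
                secret_key_ref=k8s_client.V1SecretKeySelector(
                    name=secret_name, key=access_key_id_key))))
        task.container.add_env_variable(k8s_client.V1EnvVar(
            name='AWS_SECRET_ACCESS_KEY',
            value_from=k8s_client.V1EnvVarSource(
                secret_key_ref=k8s_client.V1SecretKeySelector(
                    name=secret_name, key=secret_access_key_key))))
        return task
    return _use_aws_secret
```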
Allow the use of AWS secrets and volumes in pipeline building.
The system already allows that, but adding some helper methods could be useful.
Thanks! I've changed the description to make it clearer.
@Jeffwan
@zabbasi is working on creating a generalized admission controller to inject secrets, environment variables, and volumes. That could be one way to simplify injecting common config for your users.
@jlewi That would be awesome! I will have a look and leave my comments.
We still also need to move the artifact store configuration to the backend. An admission controller cannot help us here, because the Argo workflow needs to be valid before submission.
What would be really useful for us would be better examples of how to build an actual pipeline on AWS. The only examples that currently exist make use of SageMaker, but some simple examples would go a long way toward getting AWS users up and running with KF Pipelines. Something simple (and independent of SageMaker), like data processing on images stored in S3 followed by a simple object detection model, would be very useful. The same goes for the tfx-oss example, which currently assumes GCP entirely.
TensorFlow Extended is very interesting for us, but we would like to be able to use it on AWS!
@karlschriek Could you help come up with a list? We can help build examples. The tfx-oss examples can run on non-GCP platforms but just need NFS mounted (we are working on S3 support in the Apache Beam Python SDK; once that's done, we won't need NFS).
- KFP component SDK support.
Can you clarify this item? The SDK is platform-agnostic by design.
We still also need to move the artifact store configuration to the backend.
The artifact repository config has moved to a cluster ConfigMap. The pipelines have been portable for quite a long time.
I can help with a guide on how to set up the lightweight pipeline on AWS EKS, since I have deployed and run several deployments on it,
as well as with the small config/manifest changes here and there needed to work with S3 and AWS IAM roles.
@eterna2 I too am looking for some documentation regarding how to get started with KF pipelines on AWS. Is there any existing documentation you could direct me to?
@karlschriek Were you able to find any resources along these lines?
Thank you.
Oops, I forgot I wrote this.
I will open a PR to update the kustomize manifests with an overlay for AWS k8s with kube2iam + S3.
I will add a README there on the additional changes needed for S3.
@Dhaval08 The current manifest should work as is on any k8s cluster. The only limitations are that you need to use the MinIO server instead of S3, and TensorBoard will not work.
https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/README.md
I have installed Kubeflow Pipelines, but I am unable to find a solid end-to-end example of using pipelines on AWS EKS that is independent of SageMaker. @karlschriek any luck on this?
@Dhaval08 I have now managed to build something for our use case. If I can find the time, I'll try to strip out the use-case-specific parts and put it in an example.
@karlschriek Could you help come up with a list? We can help build examples. The tfx-oss examples can run on non-GCP platforms but just need NFS mounted (we are working on S3 support in the Apache Beam Python SDK; once that's done, we won't need NFS).
Hi @Jeffwan , sorry I somehow missed this comment. Do you mean coming up with a list of AWS-specific examples that would be useful?
@karlschriek I mean examples you need but cannot find in the community. I will ask solutions architects to write more examples based on your needs.
@eterna2 Thanks for the contribution. I can help address the S3 and TensorBoard issues. They have been around for a long time; I hope we can make it by 1.0. Another thing I notice in your PR is that, as we have more changes against the S3 infrastructure, we probably need more testing infra support.
@Jeffwan
- An end-to-end example of using Kubeflow Pipelines (with AWS EKS) demonstrating how to integrate with S3 and TensorBoard.
- An end-to-end example of using the Kubeflow Pipelines API: how to submit a run, monitor its status, get logs/artifacts, terminate a run, etc.
The above two use cases would be very helpful in my opinion. There are very limited resources in the community for these.
Thank you.
@Dhaval08 Thanks for the feedback. I will ask people to work on some concrete e2e examples for this.
@Jeffwan
- An end-to-end example of using Kubeflow Pipelines (with AWS EKS) demonstrating how to integrate with S3 and TensorBoard.
- An end-to-end example of using the Kubeflow Pipelines API: how to submit a run, monitor its status, get logs/artifacts, terminate a run, etc. (see the sketch below).
The above two use cases would be very helpful in my opinion. There are very limited resources in the community for these.
Thank you.
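To make the second item concrete, here is a rough sketch (not an official example) of driving runs through the Python client. The host URL, the demo pipeline, and the timeout are placeholders; adapt them to however the ml-pipeline API is exposed in your cluster.

```python
import kfp
import kfp.dsl as dsl


@dsl.pipeline(name='hello-eks', description='Tiny demo pipeline')
def hello_pipeline(message: str = 'hello from EKS'):
    dsl.ContainerOp(
        name='echo',
        image='alpine:3.12',
        command=['sh', '-c'],
        arguments=['echo "%s"' % message],
    )


# Assumed endpoint; point this at wherever the ml-pipeline API is exposed.
client = kfp.Client(host='http://localhost:8080')

# Submit a run directly from the pipeline function.
run = client.create_run_from_pipeline_func(
    hello_pipeline, arguments={'message': 'hello from EKS'})

# Block until the run finishes (or the timeout expires), then inspect it.
result = client.wait_for_run_completion(run.run_id, timeout=600)
print(result.run.status)
print(client.get_run(run.run_id).run.status)

# Terminating a run early is exposed through the generated run API
# (a terminate call); the exact attribute path varies by SDK version.
```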
I would also find this useful. In particular, also how to deal with artifacts (input data, saved models, generated predictions), etc. on S3. A really typical use case we have is training image recognition models where the images are all stored individually in S3. Pipeline components that fetch these images and prepare them for training in TensorFlow, then save the results to S3 again, would be very useful.
It would also be important to show how InputBinaryFile and OutputBinaryFile etc. could be used here, although that isn't really AWS-specific.
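For what it's worth, a rough sketch of the first half of that use case as a lightweight Python component: list an S3 prefix, download the images, and resize them into an output directory the next step consumes. Bucket, prefix, image size, and base image are placeholders, and the step assumes AWS credentials are already available in the pod (via the aws-secret pattern or an IAM role).

```python
from kfp.components import create_component_from_func, OutputPath


def prepare_images(bucket: str, prefix: str,
                   prepared_images_path: OutputPath('Directory')):
    """Download images from s3://<bucket>/<prefix> and resize them for training."""
    import os
    import boto3
    from PIL import Image

    os.makedirs(prepared_images_path, exist_ok=True)
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            local = os.path.join(prepared_images_path, os.path.basename(key))
            s3.download_file(bucket, key, local)
            # Normalize to RGB and a fixed size so the training step can batch them.
            Image.open(local).convert('RGB').resize((224, 224)).save(local)


prepare_images_op = create_component_from_func(
    prepare_images,
    base_image='python:3.7',
    packages_to_install=['boto3', 'Pillow'],
)
```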
Pipeline components that fetch these images and prepare them for training in TensorFlow, then save the results to S3 again, would be very useful.
It would be nice to have more real-world components and pipelines like this. I can help you with any issues you have componentizing the code.
I would also find this useful. In particular, also how to deal with artifacts (input data, saved models, generated predictions), etc.
I think that most of the intermediate data should be passed as artifacts (stored with generated URIs) and only some of the resulting data needs to be exported to the "external" data storage locations (verified models, etc).
It would also be important to show how InputBinaryFile and OutputBinaryFile etc. could be used here, although that isn't really AWS-specific.
JFYI: InputBinaryFile is just InputPath plus `with open(some_path, 'rb') as some_file:`.
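In other words (a small illustrative sketch; the component names are made up), these two functions are interchangeable: `InputBinaryFile` hands the function an already-opened binary stream, while `InputPath` hands it the path and you open the file yourself.

```python
from kfp.components import InputPath, InputBinaryFile, create_component_from_func


def count_bytes_from_file(data_file: InputBinaryFile(str)) -> int:
    # The SDK opens the artifact in binary mode and passes the file object in.
    return len(data_file.read())


def count_bytes_from_path(data_path: InputPath()) -> int:
    # Equivalent: receive the path and open it in binary mode ourselves.
    with open(data_path, 'rb') as data_file:
        return len(data_file.read())


count_bytes_op = create_component_from_func(count_bytes_from_path)
```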
@Jeffwan @karlschriek Do we have AWS S3-based e2e examples available yet?
@Jeffwan @karlschriek Do we have AWS S3-based e2e examples available yet?
We do have examples using the AWS secret. Once you have permissions, you can use S3 on your own. Do you have anything specific in mind? We can add more examples.
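For reference, those examples roughly follow the pattern in this sketch: attach credentials from a Kubernetes secret (assumed here to be named `aws-secret` with the usual key names) to a step via `kfp.aws.use_aws_secret`, then let the step talk to S3 directly. The bucket name is a placeholder.

```python
import kfp.dsl as dsl
from kfp.aws import use_aws_secret


@dsl.pipeline(name='list-s3-bucket', description='List an S3 bucket from a pipeline step')
def list_bucket_pipeline(bucket: str = 's3://my-example-bucket'):
    list_op = dsl.ContainerOp(
        name='list-bucket',
        image='amazon/aws-cli',
        command=['aws', 's3', 'ls'],
        arguments=[bucket],
    )
    # Inject AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the k8s secret.
    list_op.apply(use_aws_secret(
        secret_name='aws-secret',
        aws_access_key_id_name='AWS_ACCESS_KEY_ID',
        aws_secret_access_key_name='AWS_SECRET_ACCESS_KEY'))
```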
Do we have AWS S3-based e2e examples available yet?
What are the remaining items for AWS support?
@Ark-kun I think there are some items related to S3 not done yet in this issue: https://github.com/kubeflow/pipelines/issues/3405
We can close this one now.
@Jeffwan
- An end-to-end example of using Kubeflow Pipelines (with AWS EKS) demonstrating how to integrate with S3 and TensorBoard.
- An end-to-end example of using the Kubeflow Pipelines API: how to submit a run, monitor its status, get logs/artifacts, terminate a run, etc.
The above two use cases would be very helpful in my opinion. There are very limited resources in the community for these.
Thank you.
Any update on this? I am specifically looking for e2e examples for AWS EKS and S3 that do not involve SageMaker. Thank you.
@Dhaval08 Check https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/05_Kubeflow_Pipeline/05_03_Pipeline_mnist.ipynb. This doesn't involve SageMaker.
I get some requests from AWS users asking for better support of the pipelines project on AWS. I took a look at the project and noticed there are some areas I can improve for AWS users.
Besides AWS-specific components, I will also address kubeflow/pipelines#345 and kubeflow/pipelines#337.
I am starting work now and any comments and feedback are welcome!