kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Add AWS Support in Pipeline #1131

Closed · Jeffwan closed this 4 years ago

Jeffwan commented 5 years ago

I've gotten some requests from AWS users asking for better support for the pipelines project on AWS. I took a look at the project and noticed there are some areas I can improve for AWS users.

Besides AWS-specific components, I will also address kubeflow/pipelines#345 and kubeflow/pipelines#337.

I am starting work now; any comments and feedback are welcome!

Ark-kun commented 5 years ago

Allow using AWS secrets and volumes in pipeline building.

The system already allows that, but adding some helper methods could be useful.
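
For context, "the system already allows that" means wiring credentials in with the raw Kubernetes client objects. A minimal sketch of such a helper, assuming a Kubernetes secret named aws-secret with the usual credential keys already exists in the namespace (the secret name and keys are assumptions):

```python
from kfp import dsl
from kubernetes.client import V1EnvVar, V1EnvVarSource, V1SecretKeySelector

def attach_aws_credentials(op: dsl.ContainerOp) -> dsl.ContainerOp:
    """Expose AWS credentials from an existing 'aws-secret' (assumed name)
    Kubernetes secret as environment variables on a pipeline step."""
    for key in ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'):
        op.add_env_variable(V1EnvVar(
            name=key,
            value_from=V1EnvVarSource(
                secret_key_ref=V1SecretKeySelector(name='aws-secret', key=key))))
    return op
```

A pipeline author can then call attach_aws_credentials(op), or equivalently op.apply(attach_aws_credentials), on any step that needs S3 access.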

Jeffwan commented 5 years ago

Allow using AWS secrets and volumes in pipeline building.

The system already allows that, but adding some helper methods could be useful.

Thanks! I've updated the description to make it clearer.

jlewi commented 5 years ago

@Jeffwan

@zabbasi is working on creating a generalized admission controller to inject secrets, environment variables, and volumes. That could be one way to simplify injecting common config for your users.

https://bit.ly/2UWHexS

Jeffwan commented 5 years ago

@jlewi That would be awesome! I will have a look and leave my comments.

Ark-kun commented 5 years ago

We still also need to move the artifact store configuration to the backend. An admission controller cannot help here, because the Argo workflow needs to be valid before submission.

karlschriek commented 5 years ago

What would be really useful for us would be better examples of how to build an actual pipeline on AWS. The only examples that currently exist make use of SageMaker, but some simple examples would go a long way toward getting AWS users up and running with KF Pipelines. Something simple (and independent of SageMaker), like data processing on images stored in S3 followed by a simple object detection model, would be very useful. Similarly the tfx-oss example, which currently assumes GCP entirely.

TensorFlow Extended is very interesting for us, but we would like to be able to use it on AWS!

Jeffwan commented 5 years ago

@karlschriek Could you help come up with a list? We can help build examples. The tfx-oss examples can run on non-GCP platforms but need NFS mounted (we are working on S3 support in the Apache Beam Python SDK; once that's done, NFS won't be needed).

Ark-kun commented 5 years ago
  • KFP component SDK support.

Can you clarify this item? The SDK is platform-agnostic by design.

We still also need to move the artifact store configuration to the backend.

The artifact repository config has moved to a cluster ConfigMap. The pipelines have been portable for quite a long time now.

eterna2 commented 5 years ago

I can help with a guide on how to set up the lightweight pipeline deployment on AWS EKS, since I have deployed and run several of them, as well as the small config/manifest changes needed here and there to work with S3 and AWS IAM roles.

Dhaval08 commented 4 years ago

@eterna2 I too am looking for some documentation regarding how to get started with KF pipelines on AWS. Is there any existing documentation you could direct me to?

@karlschriek Were you able to find any resources along these lines?

Thank you.

eterna2 commented 4 years ago

Oops, I forgot I wrote this.

I will do a PR to update the kustomize manifest with an overlay for AWS k8s with kube2iam + S3.

Will add a README there on the additional changes needed for S3.

@Dhaval08 The current manifest should work as-is on any k8s cluster. The only limitations are that you need to use the MinIO server instead of S3, and TensorBoard will not work.

https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/README.md

Dhaval08 commented 4 years ago

I have installed Kubeflow Pipelines but am unable to find a solid end-to-end example of using pipelines on AWS EKS that is independent of SageMaker. @karlschriek any luck on this?

karlschriek commented 4 years ago

@Dhaval08 I have now managed to build something for our use case. If I can find the time, I'll try to strip out the use-case-specific parts and turn it into an example.

karlschriek commented 4 years ago

@karlschriek Could you help come up with a list? We can help build examples. The tfx-oss examples can run on non-GCP platforms but need NFS mounted (we are working on S3 support in the Apache Beam Python SDK; once that's done, NFS won't be needed).

Hi @Jeffwan, sorry, I somehow missed this comment. Do you mean coming up with a list of AWS-specific examples that would be useful?

Jeffwan commented 4 years ago

@karlschriek I mean some examples you need but cannot find in the community. I will ask solutions architects to write more examples based on your needs.

Jeffwan commented 4 years ago

@eterna2 Thanks for the contribution. I can help address the S3 and TensorBoard issues; they have been around for a long time. Hope we can make it by 1.0. Another thing I noticed in your PR: as we make more changes against the S3 infrastructure, we will probably need more testing infra support.

Dhaval08 commented 4 years ago

@Jeffwan

  1. An end-to-end example of using Kubeflow Pipelines (on AWS EKS) that demonstrates how to integrate with S3 and TensorBoard.
  2. An end-to-end example of using the Kubeflow Pipelines API: how to submit a run, monitor its status, get logs/artifacts, terminate a run, etc. (see the sketch after this comment).

The above two use cases would be very helpful in my opinion. There are very limited resources in the community for them.

Thank you.
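
On item 2, the Python client already covers most of that surface. A minimal sketch, assuming a reachable KFP endpoint; the host value and the pipeline below are placeholders:

```python
import kfp
from kfp import dsl

@dsl.pipeline(name='hello-pipeline')
def my_pipeline():
    # Trivial placeholder step.
    dsl.ContainerOp(name='echo', image='alpine',
                    command=['sh', '-c', 'echo hello'])

client = kfp.Client(host='http://localhost:8080')  # placeholder endpoint

# Submit a run directly from the pipeline function.
run_result = client.create_run_from_pipeline_func(my_pipeline, arguments={})

# Block until the run finishes (or the timeout expires), then inspect its status.
finished = client.wait_for_run_completion(run_result.run_id, timeout=3600)
print(finished.run.status)

# List recent runs.
for r in client.list_runs(page_size=10).runs or []:
    print(r.id, r.name, r.status)

# Terminate a run through the lower-level run service (exposed as
# client.runs in the v1 SDK; the attribute may vary by SDK version).
client.runs.terminate_run(run_id=run_result.run_id)
```

Per-step logs and output artifacts are also surfaced in the Pipelines UI for each run.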

Jeffwan commented 4 years ago

@Dhaval08 Thanks for the feedback. I will ask people to work on some concrete e2e examples for this.

karlschriek commented 4 years ago

@Jeffwan

  1. An end-to-end example of using Kubeflow Pipelines (on AWS EKS) that demonstrates how to integrate with S3 and TensorBoard.
  2. An end-to-end example of using the Kubeflow Pipelines API: how to submit a run, monitor its status, get logs/artifacts, terminate a run, etc.

The above two use cases would be very helpful in my opinion. There are very limited resources in the community for them.

Thank you.

I would also find this useful. In particular, how to deal with artifacts (input data, saved models, generated predictions, etc.) on S3. A really typical use case we have is training image recognition models where the images are all stored individually in S3. Pipeline components that fetch these images, prepare them for training in TensorFlow, and then save the results back to S3 would be very useful.

It would also be important to show how InputBinaryFile and OutputBinaryFile etc. could be used here, although that isn't really AWS-specific.
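
A rough sketch of what such a fetch-and-prepare component could look like, using boto3 inside a Python function-based component; the bucket, prefixes, and base image are all placeholders:

```python
from kfp.components import func_to_container_op

def prepare_images(bucket: str, prefix: str, output_prefix: str):
    """Fetch raw images from S3, preprocess them, and upload the results."""
    import io
    import boto3
    from PIL import Image

    # Credentials come from an attached aws-secret or an IAM role.
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read()
            # Normalize every image to RGB 224x224, a typical training input size.
            img = Image.open(io.BytesIO(body)).convert('RGB').resize((224, 224))
            out = io.BytesIO()
            img.save(out, format='JPEG')
            s3.put_object(Bucket=bucket,
                          Key=obj['Key'].replace(prefix, output_prefix, 1),
                          Body=out.getvalue())

# 'my-registry/preprocess:latest' is a hypothetical image with boto3 and
# Pillow preinstalled.
prepare_images_op = func_to_container_op(
    prepare_images, base_image='my-registry/preprocess:latest')
```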

Ark-kun commented 4 years ago

Pipeline components that fetch these images, prepare them for training in TensorFlow, and then save the results back to S3 would be very useful.

It would be nice to have more real-world components and pipelines like this. I can help you with any issues you have componentizing the code.

I would also find this useful. In particular, how to deal with artifacts (input data, saved models, generated predictions, etc.).

I think that most of the intermediate data should be passed as artifacts (stored with generated URIs) and only some of the resulting data needs to be exported to the "external" data storage locations (verified models, etc).

It would also be important to show how InputBinaryFile and OutputBinaryFile etc. could be used here, although that isn't really AWS-specific.

JFYI: InputBinaryFile is just InputPath plus `with open(some_path, 'rb') as some_file:`.
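
To make that concrete, here is a small sketch of the InputPath form of a component that reads its input in binary mode (the component itself is hypothetical):

```python
from kfp.components import InputPath, OutputPath, func_to_container_op

def count_bytes(data_path: InputPath(), size_path: OutputPath(str)):
    # InputPath hands the function a local file path; opening it with
    # 'rb' yields the same binary file object InputBinaryFile would.
    with open(data_path, 'rb') as data_file:
        n = len(data_file.read())
    with open(size_path, 'w') as f:
        f.write(str(n))

count_bytes_op = func_to_container_op(count_bytes)
```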

Shasvat0601 commented 4 years ago

@Jeffwan @karlschriek Do we have AWS S3-based e2e examples available yet?

Jeffwan commented 4 years ago

@Jeffwan @karlschriek Do we have AWS S3-based e2e examples available yet?

We do have examples using the AWS secret. Once you have the permissions, you can use S3 on your own. Do you have anything specific in mind? We can add more examples.
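
The helper in question is kfp.aws.use_aws_secret, which injects credentials from a Kubernetes secret (named aws-secret by default). A minimal sketch, assuming that secret exists in the namespace; the bucket name is a placeholder:

```python
from kfp import dsl
from kfp.aws import use_aws_secret

@dsl.pipeline(name='s3-access-example')
def s3_pipeline():
    op = dsl.ContainerOp(
        name='list-bucket',
        image='amazon/aws-cli',                     # entrypoint is the aws CLI
        arguments=['s3', 'ls', 's3://my-bucket'],   # placeholder bucket
    )
    # Adds AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars sourced
    # from the 'aws-secret' Kubernetes secret.
    op.apply(use_aws_secret('aws-secret'))
```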

Ark-kun commented 4 years ago

Do we have AWS S3-based e2e examples available yet?

Check https://github.com/kubeflow/pipelines/blob/091316b8bf3790e14e2418843ff67a3072cfadc0/samples/contrib/aws-samples/titanic-survival-prediction/titanic-survival-prediction.py#L35

Ark-kun commented 4 years ago

What are the remaining items for AWS support?

Jeffwan commented 4 years ago

@Ark-kun I think there are some items related to S3 that are not done yet; see https://github.com/kubeflow/pipelines/issues/3405

We can close this one now.

Dhaval08 commented 4 years ago

@Jeffwan

  1. An end-to-end example of using Kubeflow Pipelines (on AWS EKS) that demonstrates how to integrate with S3 and TensorBoard.
  2. An end-to-end example of using the Kubeflow Pipelines API: how to submit a run, monitor its status, get logs/artifacts, terminate a run, etc.

The above two use cases would be very helpful in my opinion. There are very limited resources in the community for them.

Thank you.

Any update on this? Specifically looking for e2e examples for AWS EKS and S3 that do not involve SageMaker. Thank you.

Jeffwan commented 4 years ago

@Dhaval08 Check https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/05_Kubeflow_Pipeline/05_03_Pipeline_mnist.ipynb. This doesn't involve SageMaker.