kubeflow / manifests

A repository for Kustomize manifests
Apache License 2.0

Port spark operator manifest to kustomize #122

Closed jlewi closed 4 years ago

jlewi commented 5 years ago

We have a ksonnet manifest for spark here: https://github.com/kubeflow/kubeflow/tree/master/kubeflow/spark

@holdenk @rawkintrevo what are your thoughts on whether we should port this to kustomize or not?

If a user wants to install and use the spark-operator with Kubeflow, it seems perfectly reasonable to point them at the [instructions](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) for how to install it on a K8s cluster.

I think we only need a kustomize manifest if we want to be able to install the spark operator by default as part of one or more opinionated deployments of Kubeflow.

For example, if/when TFX works on Spark, and we have a whole bunch of applications that we want to install in order to create a Kubeflow+Spark+TFX deployment, then we need to figure out a better story for how to compose applications. But right now I think we are still using Spark as an optional add-on.

Thoughts?

holdenk commented 5 years ago

I'm happy to do the port. I think having a tool to do distributed data prep that isn't tied to GCP is useful. cc @texasmichelle

jlewi commented 5 years ago

@holdenk sounds good.

FYI: the Flink Operator would also be useful: https://github.com/lyft/flinkk8soperator

In the near term, Flink is the most viable option for running Beam and thus TFX off GCP. (see also kubeflow/kubeflow#1583)

holdenk commented 5 years ago

I agree Flink would be useful for supporting TFX off GCP, but given the lack of reasonable Beam Flink Python support, I don't think it is a realistic option for people building data pipelines in the next year or so. Once Beam Flink Python works well, though, I think making an operator to allow submission of Beam jobs to either Flink or Dataflow would be of great use in allowing portable TFX pipelines to work well on Kubeflow (I left a comment on 1583 about this).

jlewi commented 5 years ago

@holdenk any update on this?

jlewi commented 5 years ago

Looks like there are YAML manifests here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest

I'm going to downgrade this to P2 and remove it from 0.7 due to a combination of inactivity and lack of demand.

Would be great though if someone wants to pick this up.

jtfogarty commented 4 years ago

/area kustomize /kind feature

k8s-ci-robot commented 4 years ago

@jtfogarty: The label(s) area/ cannot be applied, because the repository doesn't have them

In response to [this](https://github.com/kubeflow/manifests/issues/122#issuecomment-572303460):

>/area kustomize
>/kind feature

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

holdenk commented 4 years ago

This has been finished. /close

k8s-ci-robot commented 4 years ago

@holdenk: Closing this issue.

In response to [this](https://github.com/kubeflow/manifests/issues/122#issuecomment-586781572):

>This has been finished.
>/close