Closed — jlewi closed this issue 4 years ago.
I'm happy to do the port. I think having a tool to do distributed data prep that isn't tied to GCP is useful. cc @texasmichelle
@holdenk sounds good.
FYI: the Flink Operator would also be useful: https://github.com/lyft/flinkk8soperator.
In the near term, Flink is the most viable option for running Beam and thus TFX off GCP. (see also kubeflow/kubeflow#1583)
I agree Flink would be useful for supporting TFX off GCP, but given the lack of reasonable Beam-on-Flink Python support, it isn't a realistic option for people building data pipelines in the next year or so. Once Beam-on-Flink Python works well, though, I think an operator that allows submitting Beam jobs to either Flink or Dataflow would be of great use in allowing portable TFX pipelines to work well on Kubeflow (I left a comment on 1583 about this).
@holdenk any update on this?
Looks like there are YAML manifests here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest
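For what it's worth, wrapping those upstream manifests in a kustomize base would probably look something like the sketch below. The resource filenames here are placeholders, not the actual files in that directory, and the namespace is just an example:

```yaml
# kustomization.yaml -- hypothetical kustomize base wrapping the
# upstream spark-on-k8s-operator manifests. Filenames are placeholders;
# they would need to match the real YAML files in
# https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/manifest
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Example namespace; adjust to wherever the operator should run.
namespace: spark-operator

resources:
  - spark-operator-rbac.yaml        # placeholder name
  - spark-operator-deployment.yaml  # placeholder name

# Label everything so the operator's objects are easy to find.
commonLabels:
  app.kubernetes.io/name: spark-operator
```

This could then be applied with `kubectl apply -k .` from the directory containing the kustomization.yaml.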
I'm going to downgrade this to P2 and remove from 0.7 due to a combination of inactivity and lack of demand.
Would be great though if someone wants to pick this up.
/area kustomize /kind feature
@jtfogarty: The label(s) area/ cannot be applied, because the repository doesn't have them.
This has been finished. /close
@holdenk: Closing this issue.
We have a ksonnet manifest for spark here: https://github.com/kubeflow/kubeflow/tree/master/kubeflow/spark
@holdenk @rawkintrevo what are your thoughts on whether we should port this to kustomize or not?
If a user wants to install and use the spark-operator with Kubeflow, it seems perfectly reasonable to point them at the [instructions](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) for how to install it on a K8s cluster.
I think we only need a kustomize manifest if we want to be able to install the spark-operator by default as part of one or more opinionated deployments of Kubeflow.
For example, if/when TFX works on Spark, and we have a whole bunch of applications that we want to install in order to create a Kubeflow+Spark+TFX deployment, then we need to figure out a better story for how to compose applications. But right now I think we are still treating Spark as an optional add-on.
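If we did eventually go the composed-deployment route, one way to express it would be a top-level kustomization that pulls in each app as a base. A rough sketch; all of the base paths below are hypothetical and none of these kustomize packages exist yet:

```yaml
# kustomization.yaml -- hypothetical "Kubeflow + Spark + TFX" composed
# deployment. Every path below is a placeholder for a per-app kustomize
# base that would have to be written first.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../kubeflow-base     # placeholder: core Kubeflow components
  - ../../spark-operator    # placeholder: spark-operator base
  - ../../tfx-pipelines     # placeholder: TFX pipeline components
```

That still leaves open the composition questions above (ordering, shared namespaces, CRD installation), which is why this probably only makes sense once Spark stops being an optional add-on.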
Thoughts?