Closed zmerlynn closed 6 years ago
cc @aronchick
@zmerlynn nice write up! WRT the separate glusterfs example, I agree and I like the suggestion of just moving to a single example that uses a ReadWriteMany PV claim. We only created the glusterfs example so that the community would have an example of how to connect the spark example with the volume plugins, but using a ReadWriteMany PV claim achieves the same goals in a more widely applicable manner.
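A minimal sketch of such a claim (the name and size here are made up; any `ReadWriteMany`-capable backend such as NFS or GlusterFS could satisfy it):

```yaml
# Hypothetical PVC requesting shared read-write storage. The example
# pods would then mount this claim instead of a gluster-specific volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```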
With respect to injecting configuration, we've taken the following approach for other images that need config that goes beyond something a user could provide via a few environment variables:
Use the source-to-image project (https://github.com/openshift/source-to-image) to author a "builder" image which takes configuration as source and produces a customized image. The user can then run that image. In addition, at startup time we copy the configuration from the image to a VOLUME path and configure the framework to use that as the configuration location, thus allowing the configuration to be edited dynamically by the user, assuming the framework tolerates such things.
You can see an example of the changes involved in this PR which adds s2i functionality to our existing jenkins image: https://github.com/openshift/jenkins/pull/36
There's a lot going on in that PR beyond the s2i enablement, but the main thing you need to do to implement the s2i builder spec is to provide an assemble script (which knows how to consume the "source" being provided by the user and put it where it belongs inside the image) and a run script (which knows how to run the framework when the image is started).
Users can then build customized images with their config injected by running "s2i build file:///some/dir/with/config kubernetes/spark:latest myorg/mycustomsparkimage"
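To make the assemble step concrete, here is a toy sketch of what an assemble script does, using temp directories in place of the real image paths (all paths and the config key here are hypothetical, not taken from the actual Spark image):

```shell
# Toy sketch of an s2i-style "assemble" step.
SPARK_CONF_DIR="$(mktemp -d)"   # stands in for the image's conf dir
S2I_SOURCE_DIR="$(mktemp -d)"   # where s2i would place the user's "source"

# Pretend the user supplied a config override as "source":
echo "spark.executor.memory 2g" > "$S2I_SOURCE_DIR/spark-defaults.conf"

# assemble: overlay the user's config onto the image defaults,
# letting user-provided keys win.
cp -R "$S2I_SOURCE_DIR/." "$SPARK_CONF_DIR/"

# The run script would later point Spark at $SPARK_CONF_DIR.
cat "$SPARK_CONF_DIR/spark-defaults.conf"
```

In the real flow, `s2i build` invokes the image's assemble script against the user's source directory and commits the result as a new image, which is what the command above produces.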
@bparees That project looks really neat. One of my biggest concerns about doing something like that, though, is that it's pretty heavy to build/push a new image. It may be naive, but I would really like to be able to ship composable units that can be configured so an end-user doesn't have to re-build the container.
The place I definitely see source-to-image working is actually the package manager workflow, since it seems to simplify that workflow fairly well as well.
Building new/custom images is definitely a trade-off. The cost is out-of-the-box time/effort; the payoff is you end up with an immutable image you can move between environments without risking losing configuration details. At a minimum, maybe it's something to consider enabling even if it's not the only way to allow config injection (i.e. allow both a lightweight runtime way to inject config, and provide s2i enablement for users that want a reproducible way to construct custom images with config baked in).
@bparees: Agreed. And it loops back on the config parameterization, because you (obviously) need to override the entire set of image names in the resources for Spark if you end up re-baking the configs as you suggest. I think the next level of something like source-to-image is one that's a little more k8s-aware: if we had a concept of "package source" where the docker images were built with source-to-image and the tagged image names stuffed into the `.yaml`, I'd actually be a little less ambivalent and more positive on the side of just rebuild/push.
@bparees: (And to be fair, that could be done with a minor amount of, say, Jinja templating right now if someone wanted to do a one-off solution until we could settle on something better. So I might be letting perfect get in the way of good enough. I'll think about it s'more.)
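As a sketch of that one-off approach, here is the idea using Python's stdlib `string.Template` as a lightweight stand-in for Jinja (the resource snippet and image name are made up for illustration):

```python
from string import Template

# A resource template with the image name parameterized instead of hardcoded.
rc_template = Template("""\
apiVersion: v1
kind: ReplicationController
metadata:
  name: spark-master
spec:
  template:
    spec:
      containers:
      - name: spark-master
        image: $spark_image
""")

# After an s2i (or docker) build+push, stuff the freshly tagged image
# name into the resource before kubectl create/apply.
rendered = rc_template.substitute(spark_image="myorg/mycustomsparkimage:v2")
print(rendered)
```

The point is just that one substitution step closes the loop between "rebuilt image" and "resources that reference it", which is the part the raw rebuild/push flow leaves manual.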
> if we had a concept of "package source" where the docker images were built with source-to-image and the tagged image names stuffed into the `.yaml`
There's definitely a local iteration flow that could be amped up. Build from a repo, push, test. Rapid rebuild of one of the images is the most painful part of a lot of these flows.
> There's definitely a local iteration flow that could be amped up. Build from a repo, push, test. Rapid rebuild of one of the images is the most painful part of a lot of these flows.
Yeah, and I suspect in a lot of cases you want to build/test in a private project and publish in a real project. At least, larger organizations probably want this option available (`google-containers`, I presume OpenShift).
There's also a sub-bullet here I didn't blow out yet: we need a separate repo for "official" images, because we probably need to start doing "central" builds on them that actually hook into the main build. E.g. we have a number of Makefiles that actually push to `gcr.io/google-containers`, but pretty much no way to send a PR for the image alone, get that approved, and have a central thing do source-to-image post-approval. So right now there's actually an awkward dance with any image changes for "official" apps where you put the PR up and either push prior to LGTM (thwarting `:latest`, which is fine, it's considered harmful), or race the submit queue.
(We can also discuss how to get images directed to other places besides `gcr.io/google-containers`, too.)
cc @kubernetes/sig-big-data
Hostname discussion: https://github.com/kubernetes/kubernetes/issues/260#issuecomment-133318903
cc @viglesiasce
What are the plans for this? Kafka, Spark, and Elastic are on my radar, besides Cassandra.
/cc @mattf @willb
@erictune @zmerlynn was this in plan/scope for v1.4? Should this be bumped out to v1.5?
IMHO we should close, as this should now be federated to the other repos. It no longer applies to master.
@timothysc Good call, push them to charts.
I'd like to get to the point where spark-submit and spark-shell know how to talk to your active Kubernetes cluster (not standalone mode), at which point you don't need an example or a Chart.
At any rate, it is not clear yet what repo or repos will host the spark-on-kubernetes code. This issue has a useful backlog for whoever does that work. So I am inclined to keep it open until someone can move these things to another backlog.
@foxish
Working on a detailed proposal for running spark natively. I'll be detailing our steps shortly in a new issue, but this rubric is helpful in evaluating what we need.
I also ran into an issue with Spark on Kubernetes with the executors having wrong IP Addresses which might be relevant here. https://github.com/kubernetes-incubator/application-images/issues/10
This is being upstreamed in Spark 2.3. The discussions can move to the Spark JIRA (https://issues.apache.org/jira/browse/SPARK-18278) following that.
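For context, the Spark 2.3 work lets `spark-submit` target a cluster directly via a `k8s://` master URL; the invocation looks roughly like this (the API server address and image name are placeholders you'd fill in for your cluster):

```shell
spark-submit \
  --master k8s://https://<apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

This is exactly the "no examples or Chart needed" direction discussed above: the driver and executors are created as pods by spark-submit itself.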
I have #16320/#16498 out to bring Spark up to existing standards for our examples. This bug is both a friction log of things I ran into while working on that PR, and a set of anticipated issues with Spark.
I organized this into two sections, Spark-specific-ish and Kubernetes-general. Some of these bullets have open issues and I'm documenting them as part of the vertical slice involved.
Please feel free to correct me if I messed up something obvious about Spark or Kubernetes. I was a Spark user in a bygone era, but it's moved quite quickly since then, so any expertise has rusted.
Spark specific

- … `examples/`, since it works over multiple versions, and isn't a teaching example.
- … `spark-defaults.conf`: we should probably just splat the existing configuration together, or compose them to allow the additional config to override KVs in the base image. See #6477, #4822.
- … `ReadWriteMany`, and instead of assuming gluster, we rewrite the example as either gluster or NFS and use a PV claim. But that's not the only example of a resource that needs to be parameterized / templatized in some fashion.
- … `SPARK_LOCAL_DIRS`, so it's only useful for certain workloads, namely in-memory-only workloads. This used to be the bread-and-butter of Spark because it was all it could do, but it proved rather limiting not to spill. However, using network or distributed storage for Spark intermediate storage is somewhat naive, but is the most convenient way on our system to configure a ReplicationController for storage. But for the short term, given that the replication controller itself is `<N>` wide, for anyone on GCE we could add a script to this directory to provision a set of `<N>` volumes and use a claim to resolve it.
- … `.persist()` and shuffle-spill, both of which are losable. The pod case is different than e.g. Netflix's Chaos Monkey with Spark tests, though, because the pod comes back with a different name.
- … `FILESYSTEM` mode (easiest) or (b) in multi-master backed by ZooKeeper (hardest) in order to withstand a restart with a running application. Similarly, the driver pod itself isn't protected at all right now, but before we protect that, we just want to look at the driver mode, how we want to support Spark app submission in the cluster, and the Zeppelin bullet below.

Kubernetes general
- … `spark-master` pod to a `spark-master` single-pod replication controller, the Spark master objected to the slaves because the master started with a hostname that didn't match the service name the slaves were contacting it on. In this case, the slaves connected to the DNS name `spark-master`, and `spark-master` saw messages for `spark-master` and said "nope, that ain't me, I'm `spark-master-a1b2d3`, you must have me confused for someone else". (c.f. #386)
- … `spark-master`, which resulted in `SPARK_MASTER_PORT`, which is an environment variable that the Spark `start-master.sh` script will happily pick up. Unfortunately, the script was expecting a single integer, not `tcp://<ip>:7070`. It would be nice if there was a way to disable service env variable injection. See #1768 (which seems to cover that possibility, maybe).
- … `.yaml`s, etc. Our best practices seem to suggest using hardcoded image tags (which makes sense), and I probably would've scripted it further if I had to go much longer. The process itself before presenting a PR is interesting, because prior to a PR you basically end up "hiding" the entire thing on a private project, then you end up displaying one "bump" to the world. Do we have any issues discussing how to iterate on "the package" (the set of `.yaml`, Dockerfiles, etc.)? As we get anywhere near packaging, that's going to be one killer feature: the ability to rapidly iterate on actually developing package blobs for k8s (the resources and images both).

cc @davidopp @timstclair @timothysc @wattsteve
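To illustrate the `SPARK_MASTER_PORT` collision described above: Kubernetes injects a docker-links-style env var for a Service named `spark-master`, and one shell workaround is to strip the prefix before the Spark scripts see it (the IP and port below are simulated values, not real injected ones):

```shell
# Simulate what Kubernetes would inject for a Service named "spark-master":
SPARK_MASTER_PORT="tcp://10.0.0.17:7077"

# start-master.sh expects a bare integer, so strip everything up to the
# last ":" using shell parameter expansion before Spark reads it:
SPARK_MASTER_PORT="${SPARK_MASTER_PORT##*:}"
echo "$SPARK_MASTER_PORT"   # prints just the port number
```

This is only a band-aid; the cleaner fixes are renaming the service so it doesn't collide with Spark's variable namespace, or the injection opt-out discussed in #1768.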