radanalyticsio / radanalyticsio.github.io

a developer oriented site for the radanalytics organization
https://radanalytics.io

Add support for installing with a custom Apache Spark distribution #271

Closed tmckayus closed 5 years ago

tmckayus commented 5 years ago

Is there a way to control the order of the "how do I" articles? It would be nice if the simple install showed up before the "choose my own spark" install.

(edit) actually it's alphabetical. So if we change the standard install to "install radanalytics.io with defaults?", for example, it will sort first

tmckayus commented 5 years ago

I have an idea for a refinement on this, where resources.yaml and resources-is.yaml both work with imagestreams and there is only one template ...

tmckayus commented 5 years ago

@elmiko @crobby @willb ptal (edit, willb with one l is a different willb :) )

The how_do_i instructions should be enough to show you how to use it; if not, let me know! I may work up some CI tests for rad-image as well

crobby commented 5 years ago

Building the images works nicely for me. But when I try to run a pyspark (2.7) app, it seems to try to use "openshift-spark:complete" as the image, and pulling it fails.
The openshift-spark:complete image seems to have been built, but it's 172.30.1.1:5000//openshift-spark:complete, whereas the cluster attempt is trying to just use openshift-spark:complete. Could be a lookup issue (I'm on v3.11 oc cluster up)

crobby commented 5 years ago

Ok, it does seem like my issue is indeed the infamous local lookup bug with my oc cluster up instance. Tweaking the config map via `oc edit configmap default-oshinko-cluster-config` and changing the sparkimage to reference the full image (`<registry>/<project>/<image>:<tag>`) allows my jobs to run.
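For reference, the edited ConfigMap would look something like this; the registry address and project name in the pullspec are illustrative (taken from the local-registry reference above), not something the repo prescribes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: default-oshinko-cluster-config
data:
  # full pullspec so the workaround doesn't depend on local image lookup;
  # registry address and project name are illustrative
  sparkimage: 172.30.1.1:5000/myproject/openshift-spark:complete
```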

tmckayus commented 5 years ago

I have an idea to fix the openshift-spark references in the configmaps, pending

elmiko commented 5 years ago

just so i understand clearly, the updates to resources.yaml now point all the templates to ImageStreams, which in turn are backed by the community images?

tmckayus commented 5 years ago

@elmiko yes the default imagestreams are pointed at the full community images.

When you do an image completion, you end up with a new tag in the imagestream referencing the image you just built. The "rad-image use" command just mods the template to call out a different tag, but the templates always reference the imagestream.

This gets us consistent behavior whether you're just using community images out of the box, or you want to roll your own, and you only have 1 template set.
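As a sketch of the idea (tag names and pullspecs here are illustrative, not taken from the repo), the imagestream ends up carrying both the community-backed default tag and a tag for a locally completed image, and the templates only ever name an imagestream tag:

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: openshift-spark
spec:
  tags:
  # default tag tracking the full community image (illustrative pullspec)
  - name: latest
    from:
      kind: DockerImage
      name: docker.io/radanalyticsio/openshift-spark:latest
  # hypothetical tag added by an image completion,
  # referencing the locally built image by its sha256
  - name: complete
    from:
      kind: ImageStreamImage
      name: openshift-spark@sha256:...
```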

elmiko commented 5 years ago

awesome, thanks for the explanation

tmckayus commented 5 years ago

@elmiko oshinko-webui will be okay as long as it reads default-oshinko-cluster-config to override the default configuration (which it may not). Alternatively, you can specify the sparkimage on the cluster launch form, so you could select a completed image stream using the pullspec from the project. (@crobby)

elmiko commented 5 years ago

ack, thanks for the explanation @tmckayus

i was just trying to think about corner cases where we might hit a not "easy button" type interaction. ideally this will not be a shock for webui users.

tmckayus commented 5 years ago

> i was just trying to think about corner cases where we might hit a not "easy button" type interaction. ideally this will not be a shock for webui users.

Maybe I should add a quick note about the webui and a link to the webui howdoi

elmiko commented 5 years ago

> Maybe I should add a quick note about the webui and a link to the webui howdoi

++ adding another bullet point to the "additional resources" section would definitely be in order

elmiko commented 5 years ago

i'm good with the current changes. i think the language for the prerequisites could be tightened up a bit, but i am good to merge this and adjust later.

tmckayus commented 5 years ago

@elmiko, tightened up how? we can fix it now

tmckayus commented 5 years ago

Answer to an earlier question: the simplest way to verify that your local imagestreams are being used is to use the oc client to dump yaml and compare between buildconfigs, pods, and imagestreams. Doing this through the console may be impossible or too cumbersome. For example:

`oc get is radanalytics-pyspark -o yaml` — under "status" this will list tags; the dockerImageReference field will tell you where the tagged image is from and list the sha256. The reference should refer to the project. (The sha256 is also given in the "image" field, incidentally.)

`oc get buildconfig sparkpi -o yaml` — under "triggers", imageChange.lastTriggeredImageID will show a reference that should match the imagestream dumped above.

`oc get pod mycluster-w-1-p9lnw -o yaml` — under containerStatuses the imageID field will show a reference that should match a dump of the openshift-spark imagestream with `oc get is openshift-spark -o yaml`.

A formal test can probably be added based on this :)
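As a starting point for such a test, the cross-check above can be sketched as a small script operating on the JSON that `oc get <resource> -o json` emits (field paths are taken from the steps above; the function names and the sample data at the bottom are fabricated purely for illustration):

```python
def imagestream_refs(is_dump):
    """Map tag name -> dockerImageReference from an imagestream dump."""
    refs = {}
    for tag in is_dump.get("status", {}).get("tags", []):
        for item in tag.get("items", []):
            refs[tag["tag"]] = item["dockerImageReference"]
    return refs


def pod_uses_imagestream(pod_dump, is_dump):
    """True if every container's imageID resolves to an image the
    imagestream knows about (ignoring the docker-pullable:// prefix)."""
    known = set(imagestream_refs(is_dump).values())
    statuses = pod_dump.get("status", {}).get("containerStatuses", [])
    return bool(statuses) and all(
        s.get("imageID", "").split("://", 1)[-1] in known for s in statuses
    )


# Fabricated sample data, for illustration only
sample_is = {
    "status": {"tags": [{"tag": "complete", "items": [
        {"dockerImageReference":
         "172.30.1.1:5000/myproject/openshift-spark@sha256:deadbeef"}
    ]}]}
}
sample_pod = {
    "status": {"containerStatuses": [
        {"imageID": "docker-pullable://"
                    "172.30.1.1:5000/myproject/openshift-spark@sha256:deadbeef"}
    ]}
}

print(pod_uses_imagestream(sample_pod, sample_is))  # -> True
```

The same check against the buildconfig's imageChange.lastTriggeredImageID would follow the same pattern.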

elmiko commented 5 years ago

LGTM, thanks Trevor!