Closed tafaust closed 2 years ago
Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
@kevin-bates Can you guide me with the failing pipelines? To me, the failing pipelines seem to be unrelated to this PR. Looking at the changed files, I cannot see what might cause this error.
The `ConfigMap` mounts are actually working on my side. I printed the contents of `/etc/spark/conf/spark-defaults.conf` and it matched the contents of the `ConfigMap`.
Successfully tested with:

**Jupyter Lab (client):**

```shell
KERNEL_NAMESPACE='sparkapps' KERNEL_SPARKAPP_CONFIG_MAP='spark-operator-conf' KERNEL_SERVICE_ACCOUNT_NAME='spark' jupyter lab \
  --gateway-url=http://enterprise-gateway.enterprise-gateway:8888 \
  --GatewayClient.http_user=guest \
  --GatewayClient.http_pwd=guest-password
```
**ConfigMap:**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-operator-conf
  namespace: sparkapps # Important: needs to be the same namespace as the SparkApplication CRD
data:
  spark-defaults.conf: |
    spark.driver.extraJavaOptions: "-Divy.cache.dir=/tmp -Divy.home=/tmp"
    spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.2.3,org.apache.spark:spark-hadoop-cloud_2.13:3.2.1,com.amazonaws:aws-java-sdk-bundle:1.12.207"
    spark.hadoop.fs.s3a.endpoint: "http://minio:9000"
    spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    spark.hadoop.fs.s3a.access.key: "accesskey"
    spark.hadoop.fs.s3a.secret.key: "secretkey"
    spark.hadoop.fs.s3a.path.style.access: "true"
    spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
    spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled: "true"
```
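To double-check the mount, one way (assuming `kubectl` access to the cluster; the file and pod names below are placeholders) is:

```shell
# Apply the ConfigMap into the kernel namespace (not the default namespace!)
kubectl apply -f spark-operator-conf.yaml

# Verify the mounted contents inside a running driver pod
# (replace <driver-pod> with the actual pod name)
kubectl -n sparkapps exec <driver-pod> -- cat /etc/spark/conf/spark-defaults.conf
```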
This was tested with a slightly modified version of https://github.com/GoogleCloudPlatform/spark-on-k8s-operator (namespace, see `KERNEL_NAMESPACE`) and jupyter-enterprise-gateway with the changes of this PR. The `ServiceAccount` for a `SparkApp` CRD can be changed here, and the name needs to be set in `KERNEL_SERVICE_ACCOUNT_NAME`.
Regarding the pipelines... `pip install ".[test]"` works for me locally (Python 3.9, x64 Linux), so I cannot reproduce the error on my side.

```shell
$ python --version
Python 3.9.12
$ pip3 --version
pip 22.1 from /opt/miniconda3/envs/spark-k8s/lib/python3.9/site-packages/pip (python 3.9)
```
The pipeline error might possibly be related to: https://github.com/pypa/setuptools/issues/2204
@tahesse - thanks for the update! Yeah, I believe we'll be transitioning to `hatch` (from `flit`) to address the `pip` issue.
I can add the whole spec for the custom SparkApplication CRD if individual kernel parameterization is in place.
Let me know if I can support you with individual parameterization.
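For reference, a minimal sketch of what such a spec could look like (field names follow the spark-on-k8s-operator `v1beta2` CRD; the image, versions, and names here are illustrative assumptions, not this PR's actual template):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: kernel-sparkapp          # illustrative name
  namespace: sparkapps
spec:
  type: Python
  mode: cluster
  image: elyra/kernel-spark-py:dev
  sparkVersion: "3.2.1"
  sparkConfigMap: spark-operator-conf   # supplied via KERNEL_SPARKAPP_CONFIG_MAP
  driver:
    serviceAccount: spark               # supplied via KERNEL_SERVICE_ACCOUNT_NAME
  executor:
    instances: 1
```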
An issue I encountered: https://github.com/jupyter-server/enterprise_gateway/pull/559 does not work with this PR in my local environment.
When I run:

```shell
KERNEL_NAMESPACE='spark-apps' KERNEL_SERVICE_ACCOUNT_NAME='spark' jupyter lab \
  --gateway-url=http://enterprise-gateway.fml-jupyter:8888 \
  --GatewayClient.http_user=guest \
  --GatewayClient.http_pwd=guest-password \
  --GatewayClient.request_timeout=120.0
```
I receive the following in the kernel pod description:

```
MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-e4ef5180d7bbc664-conf-map-x-spark-apps-x-clustername" not found
Container image "elyra/kernel-spark-py:dev" already present on machine
Created container spark-kubernetes-driver
Started container spark-kubernetes-driver
```
and the kernel starts successfully.
When I start my local JupyterLab like this:

```shell
KERNEL_SPARKAPP_CONFIG_MAP='jupyter-spark-operator' KERNEL_NAMESPACE='spark-apps' KERNEL_SERVICE_ACCOUNT_NAME='spark' jupyter lab \
  --gateway-url=http://enterprise-gateway.fml-jupyter:8888 \
  --GatewayClient.http_user=guest \
  --GatewayClient.http_pwd=guest-password \
  --GatewayClient.request_timeout=120.0
```

I end up with this pod description of the kernel pod:

```
MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-eedc1880d7c5906f-conf-map-x-spark-apps-x-clustername" not found
MountVolume.SetUp failed for volume "spark-configmap-volume" : configmap "jupyter-spark-operator-x-spark-apps-x-clustername" not found
```
The second line is new in this scenario, and it fails to mount the `ConfigMap` in the driver. I am not sure whether anyone else has encountered this; I will dig deeper into it.
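One way to narrow this down (assuming `kubectl` access; namespace and ConfigMap names as in the commands above, pod name is a placeholder) is to check where the ConfigMaps actually live:

```shell
# Does the referenced ConfigMap exist in the kernel namespace?
kubectl -n spark-apps get configmap jupyter-spark-operator

# List ConfigMaps across all namespaces to spot it landing elsewhere
kubectl get configmap --all-namespaces | grep jupyter-spark-operator

# Inspect the failing mount via the pod events
kubectl -n spark-apps describe pod <kernel-pod-name>
```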
How to reproduce:

```shell
make enterprise-gateway
```
The SparkOperator does not show any errors in its logs. This might be related to https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/714, but in my case the `configMap` volume mount times out and the kernel pod thus does not start. Digging deeper into this.
@kevin-bates Please don't merge until I've resolved the issue. Thank you. 😃
@tahesse - would you mind merging with the current `main` branch so we can get CI passing? Thank you.
> if individual kernel parameterization is in place.
Could you please clarify what you mean by this? I'm assuming it's related to https://github.com/jupyter-server/enterprise_gateway/issues/523#issuecomment-1126256677 but want to make sure.
The full parameterization topic affects the primary Jupyter ecosystem and is not limited to Enterprise Gateway, so it has a much larger scope.
> @tahesse - would you mind merging with the current main branch so we can get CI passing?
Rebased.
> > if individual kernel parameterization is in place.
>
> Could you please clarify what you mean by this? I'm assuming it's related to https://github.com/jupyter-server/enterprise_gateway/issues/523#issuecomment-1126256677 but want to make sure.
Yes, exactly. I was referring to your last paragraph. With the individual kernel parameterization in place, I'd like to add the whole SparkApplication spec for the operator.
> The full parameterization topic affects the primary Jupyter ecosystem and is not limited to Enterprise Gateway, so it has a much larger scope.
Thank you for the clarification!
> I'd like to add the whole SparkApplication spec for the operator.
That would be great - thank you. Please know that I'm not familiar with this aspect of things but others have used this and I'm hoping we can get their feedback.
> Please know that I'm not familiar with this aspect of things but others have used this and I'm hoping we can get their feedback.
I'm using it. That's also my motivation driver. Jupyter EG is a blessing and I'm happy to contribute. 😊
So, I think I "fixed" the issue from https://github.com/jupyter-server/enterprise_gateway/pull/1087#issuecomment-1130167969. Somehow, the `ConfigMap` was created in the `default` namespace and thus could not be found by the default `ServiceAccount` of the `sparkapps` namespace. Doh.
Other than that, I will link https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac for completeness.
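From that page, the RBAC setup boils down to something like the following (namespace and account names adapted to this setup; treat this as a sketch, not the exact commands from the docs):

```shell
# Create the service account the driver pods will run as
kubectl -n sparkapps create serviceaccount spark

# Grant it permission to manage pods etc. within its namespace
kubectl -n sparkapps create rolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=sparkapps:spark
```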
@kevin-bates I think that this tiny PR can be merged safely.
> So, I think I "fixed" the issue from https://github.com/jupyter-server/enterprise_gateway/pull/1087#issuecomment-1130167969. Somehow, the ConfigMap was created in the default namespace and thus could not be found by the default ServiceAccount of the sparkapps namespace. Doh.
Excellent!
> Other than that, I will link https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac for completeness.
:+1:
> @kevin-bates I think that this tiny PR can be merged safely.
Will do following your next commits.
Just found that `sparkConfigMap` still has some issues. For reference: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/216
Repositioned the docs entries as requested. For the sake of completeness, `sparkConfigMap` only works if the fix mentioned in the link from my previous comment (https://github.com/jupyter-server/enterprise_gateway/pull/1087#issuecomment-1131859976) is applied. I will try to propose a PR for this issue with the spark-on-k8s-operator.
Hi @tahesse - thanks for the update. I'm still a little confused regarding the referenced issue and under what circumstances a user needs to do anything extra. The issue is very long and spans nearly 4 years.
> I will try to propose a PR for this issue with spark k8s operator.
Will my confusion be clarified by the pending PR and, if so, does it make sense to hold this PR until it fully works? I apologize, but, at this point, I'm not sure whether the changes in this PR work. Could you please clarify the current state of this, what works, what doesn't, etc.? (Thank you for your patience.)
Hi @kevin-bates. Sorry, I didn't put up a lot of information in the first place.
Specifying `KERNEL_SPARKAPP_CONFIG_MAP` supplies the `sparkConfigMap` spec of the `{Scheduled,}SparkApplication` CRD in Kubernetes. The issue in the spark-on-k8s-operator repository mentions that (1) `SPARK_CONF_DIR` is set to `/etc/spark/conf/`, and (2) the `configMap` from `sparkConfigMap` is mounted (`mountPath`) at `SPARK_CONF_DIR`.
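To illustrate the flow, here is a hypothetical sketch (not EG's actual launcher code; the function name and spec fields are assumptions) of how the environment variable maps onto the spec:

```python
def apply_sparkapp_config_map(spec: dict, env: dict) -> dict:
    """Hypothetical helper: inject KERNEL_SPARKAPP_CONFIG_MAP into a
    SparkApplication spec. The operator then mounts the named ConfigMap
    at SPARK_CONF_DIR (/etc/spark/conf) in driver and executor pods."""
    result = dict(spec)
    config_map = env.get("KERNEL_SPARKAPP_CONFIG_MAP")
    if config_map:
        result["sparkConfigMap"] = config_map
    return result

spec = apply_sparkapp_config_map(
    {"type": "Python", "mode": "cluster"},
    {"KERNEL_SPARKAPP_CONFIG_MAP": "spark-operator-conf"},
)
print(spec["sparkConfigMap"])  # spark-operator-conf
```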
Here is where the "fun" begins with this roughly 4-year-old issue. 😅
There exist different solutions to the problem that Spark doesn't make use of the mounted `configMap`: images, PRs in non-upstream repos, and so on. The various solutions are just not pushed to the upstream repository... I taught myself some Go and will try to provide a working PR for the spark-on-k8s-operator repository so that Spark makes use of the config.
This PR works as intended. The limitation is solely on the Spark operator side. FYI, I use Spark 3.2.1 with Scala 2.13. Please let me know how you want to proceed. Thank you!
Excellent clarification - thank you. Yes, let's go ahead and merge this PR.
> I taught myself some Go and will try to provide a working PR for the spark-on-k8s-operator repository so that spark makes use of the config.
I see. So this would be a PR in the upstream repo (that is based on Go), not necessarily here. Is that correct?
Congrats on your first merged pull request in this project! :tada: Thank you for contributing, we are very proud of you! :heart:
Originates from necro-bumped thread #523 (https://github.com/jupyter-server/enterprise_gateway/issues/523#issuecomment-1126131413).
cc @kevin-bates I haven't had the time to test it >yet<. I will test everything on Sunday/Monday and report back. 😃