Tianhao-Gu opened this issue 2 months ago (status: Open)
A possible solution is a startup script that reads the JupyterHub username from the JUPYTERHUB_USER environment variable and substitutes it for the YARN_USER placeholder in the Spark configuration file (spark-defaults.conf). This ensures that the correct system user is used without having to create duplicate system users.
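#!/bin/bash
# Fall back to "default_user" when JUPYTERHUB_USER is unset or empty.
USER_NAME="${JUPYTERHUB_USER:-default_user}"
# Replace every YARN_USER placeholder in the Spark defaults with the user name.
sed -i "s/YARN_USER/${USER_NAME}/g" /path/to/spark/conf/spark-defaults.conf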
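One caveat with editing the file in place: once the placeholder has been replaced, a later run (for example after a container restart with a different user) finds nothing left to substitute, and a username containing characters such as & would be misread by sed. The sketch below is one possible way around both issues, not a tested part of this setup. It assumes a separate template file that still contains YARN_USER; the function name render_spark_conf and the template path in the usage comment are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: render a Spark config from a template so the
# YARN_USER placeholder is never lost and the script can be re-run safely.
render_spark_conf() {
    local template="$1"
    local output="$2"
    # Same fallback as the original script.
    local user_name="${JUPYTERHUB_USER:-default_user}"
    # Escape characters that are special in a sed replacement:
    # backslash, ampersand, and the | delimiter used below.
    local escaped
    escaped=$(printf '%s' "$user_name" | sed -e 's/[\\&|]/\\&/g')
    # Write the rendered config to a separate file; the template is untouched.
    sed "s|YARN_USER|${escaped}|g" "$template" > "$output"
}

# Example call (the template path is an assumption):
# render_spark_conf /path/to/spark/conf/spark-defaults.conf.template \
#                   /path/to/spark/conf/spark-defaults.conf
```

Rendering from a template keeps the substitution repeatable, at the cost of maintaining one extra file alongside spark-defaults.conf.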
Explore alternatives that avoid creating system users with the same names as JupyterHub users, while still ensuring that the same system user is used to launch each user's Docker container. This is needed for YARN's fair-share scheduling to function correctly.