ideonate / cdsdashboards

JupyterHub extension for ContainDS Dashboards
https://cdsdashboards.readthedocs.io/

Streamlit runs on a different conda environment #24

Closed ricky-lim closed 4 years ago

ricky-lim commented 4 years ago

Is your feature request related to a problem? Please describe. Nope

Describe the solution you'd like A dashboard creation form that can show user-specific conda environments that run a Streamlit dashboarding server.

Describe alternatives you've considered Using the global conda environments; however, Streamlit does not run within the specified environment. It still runs with the base Python instead of the specified Python environment.

Below is the error I got when I tried to use a conda env with the bqplot package installed.

File "/opt/conda/lib/python3.8/site-packages/streamlit/ScriptRunner.py", line 319, in _run_script
    exec(code, module.__dict__)
File "/home/jovyan/test_streamlit.py", line 2, in <module>
    import bqplot as bq
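A quick way to confirm which interpreter Streamlit is actually using is to report it from inside the dashboard script itself. This is a minimal sketch with a hypothetical helper name, `interpreter_report`; in a Streamlit script the returned dictionary could simply be passed to `st.write`.

```python
import os
import sys


def interpreter_report():
    """Describe the Python interpreter and conda env this process runs in."""
    return {
        # Full path of the running interpreter, e.g. /opt/conda/bin/python for base
        "executable": sys.executable,
        # Root of the environment the interpreter belongs to
        "prefix": sys.prefix,
        # Set by `conda activate`; None if no env was activated for this process
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "version": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If `executable` and `prefix` point at /opt/conda rather than /opt/conda/envs/<name>, the script is running in base, which is consistent with the traceback above coming from /opt/conda/lib/python3.8/site-packages/streamlit.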

The solution I had in mind is that a user could create her/his own environment, for instance with Miniconda, to run the Streamlit server.

Background context I would like to explore Streamlit as a dashboarding framework for a project, and every project has a specific set of dependencies.

Configuration The setup uses Zero to JupyterHub on Kubernetes.

Thanks in advance. I look forward to hearing from you.

danlester commented 4 years ago

Thank you for your post!

I think there are two issues here:

  1. Ultimately, whether the conda env is global or per-user all comes down to the configuration of your singleuser environment - i.e. the Docker image you are using for your Jupyter server, and any conda configuration in that or in the user's home folder. I'm not sure why Streamlit isn't working as a global env anyway.

  2. Making a list of global conda envs and adding it to jupyterhub_config is one thing, but if individual users can create their own conda envs, how do we allow them to specify the env when creating a new dashboard? Assuming global and per-user conda envs are very similar (it's just that per-user envs are in the home folder), the hub admin can't realistically maintain a big list of everyone's conda envs. The easiest approach would be to allow the user to enter the full name of a conda env when they create the dashboard, so it doesn't have to be restricted to the global list.

So I think the first question is why you're not getting global conda envs to work. They really should! And then, how do we allow the user to create similar conda envs on a per-user basis?

Please note it might not be a good idea to make per-user conda envs anyway since home folders are often mounted using slower media. It's possible that Kubernetes isn't the best solution for your JupyterHub - I'm happy to discuss.

What single user image are you using, and how are you attempting to create a new global conda env (if you have tried that yet)? e.g. do you have a Dockerfile extending one of the existing images, with an attempt to add new conda envs on top?

I've tried to play around with adding extra conda envs on top of the example image, including RUN conda init bash but I haven't been able to get it to stick yet, and I see the same problem as you - packages don't seem to be found.

So far I'm seeing the familiar issues that source activate works but conda activate doesn't seem to.

danlester commented 4 years ago

To get a global conda env, this has worked for me by adding to the example Dockerfile:

USER $NB_UID
RUN conda create --name globalenv1 -y
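RUN source activate globalenv1
# Note: activation does not persist into later RUN steps (each starts a new shell),
# so the install below targets base unless it is given --name globalenv1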
RUN conda install -c conda-forge bqplot -y

And of course add to config.yaml:

    conda-conf: |
      c.CDSDashboardsConfig.conda_envs = ['globalenv1']

If you've also been trying to get a global conda env working first, hopefully this helps - please let me know.

If you haven't really been looking into Dockerfiles at all, and are just hoping to get a local conda env installed in the home folder, that's fine, we can start from there! Just let me know more about the config of your singleuser servers, e.g. at least the Docker image you are using.

ricky-lim commented 4 years ago

Hello,

Thank you very much for your swift response and helpful guidance.

Yes, I can confirm that the global conda env now works with my setup after following your guidance.

I organized the global environment via an environment file (test-env.yml) in a folder called conda-env, then added these lines to the singleuser Dockerfile:

COPY conda-env conda-env
USER root
RUN fix-permissions conda-env
USER $NB_UID

# Create custom environment...
RUN conda env create -f conda-env/test-env.yml

I agree that user-specific envs might not be a good idea for maintenance on the production cluster. However, for our non-production environments, the flexibility to choose an environment to run the dashboard would still be very useful.

"The easiest approach would be to allow user to enter the full name of a conda env when they create the dashboard - so it doesn't have to be restricted to the global list."

Would it be a good idea to allow users to specify an absolute path for the conda env, instead of just a name?

Thank you again for your helpful guidance.

Cheers

ricky-lim commented 4 years ago

I was wondering if you know a workaround to run a Streamlit script with the interpreter specified in a shebang; for instance, to run this script using /opt/conda/envs/test-env/bin/python:

#!/opt/conda/envs/test-env/bin/python

import streamlit as st
import pyjokes
import sys

st.write(sys.version)

if st.button('Make me laugh'):
    result = pyjokes.get_joke()
    st.write(f'{result}')

Thanks in advance.

danlester commented 4 years ago

Thank you for your update. Great to hear that at least global conda envs work - your Dockerfile snippet is really helpful.

I would like to get the local conda envs working too.

Ideally, it would be possible to specify either the full path or just a name (which should be fine if things are set up to locate it by name easily). As far as I know, it might already be possible to set paths or names in the global list of conda envs while we are experimenting. A UI change would then mean the user can just type any name or path once they've created their own conda envs locally.

There was something slightly off with the conda installation in the singleuser images, so this still needs some work. I hope to try again soon, and please let me know if you also make any progress.

I haven't tried the shebang approach yet.
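One likely reason a shebang has no effect: the traceback in the first post shows Streamlit's ScriptRunner calling `exec(code, module.__dict__)`, so the script runs inside whichever interpreter launched Streamlit and its first line is never consulted. A possible workaround, sketched under the assumption that Streamlit is installed in the target env, is to launch Streamlit from that env's own interpreter with `python -m streamlit run`. The helper `streamlit_command`, the env path and `jokes.py` are illustrative only.

```python
import shlex
from pathlib import Path


def streamlit_command(env_prefix, script, port):
    """Build argv that runs `script` under the interpreter of `env_prefix`.

    Hypothetical helper: Streamlit exec()s the script in its own interpreter,
    so choosing the interpreter happens when Streamlit itself is started.
    Assumes Streamlit is installed inside the env.
    """
    python = Path(env_prefix) / "bin" / "python"
    return [
        str(python), "-m", "streamlit", "run", str(script),
        f"--server.port={port}", "--server.headless=True",
    ]


if __name__ == "__main__":
    argv = streamlit_command("/opt/conda/envs/test-env", "jokes.py", 8501)
    # Print a shell-ready command line (shlex.join needs Python 3.8+)
    print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run` as-is; building an argv list rather than a shell string avoids quoting problems with paths.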

danlester commented 4 years ago

OK I think I've got somewhere...

Add to singleuser Dockerfile (within a USER $NB_USER section towards the end):

RUN conda init bash
COPY .condarc /home/jovyan/.conda/

where .condarc contains:

envs_dirs:
  - /home/jovyan/.conda/envs

Then in a terminal window within a Jupyter server I run:

conda create --name env2 -y
conda activate env2

This seems to work, and I can e.g.

conda install -c conda-forge pyjokes

However, note that I also seem to need to install Streamlit directly into this conda env too:

pip install streamlit

At that point, as long as I have 'env2' listed in the JupyterHub config:

c.CDSDashboardsConfig.conda_envs = ['env2']

and select that for my dashboard, the dashboard works fine...

As before, the next step would be a way for the dashboard developer to be able to type their own local conda env names (or maybe paths) because it's not really a global conda env available to everyone.

But first, I wondered if this works for you so far? The conda env is stored within the user's home folder.

ricky-lim commented 4 years ago

Hi Dan,

Thank you for your kind guidance. It's really helpful.

I tried to follow your guidance. Unfortunately, I still hit an issue enabling user-defined envs: the .condarc is not copied into the Jupyter singleuser server.

The .condarc is present in the singleuser Docker image, but not in the running jupyter-{username} pod. Furthermore, the .conda directory is not created there either.

Below is my setup snippet. I'd be really grateful if you could provide more guidance on how I could use the Dockerfile to automate the creation of .condarc and $HOME/.conda for each user, please.

Thank you in advance for your time and kind support. I look forward to hearing from you.

Below is my relevant setup.

singleuser/Dockerfile

USER ${NB_USER}
RUN conda init bash
COPY conda-env/.condarc /home/jovyan/.conda/

config.yaml

...
hub:
  extraConfig:
    cds-handlers: |
      from cdsdashboards.hubextension import cds_extra_handlers
      c.JupyterHub.extra_handlers = cds_extra_handlers
    cds-templates: |
      from cdsdashboards.app import CDS_TEMPLATE_PATHS
      c.JupyterHub.template_paths = CDS_TEMPLATE_PATHS
    cds-kube: |
      c.JupyterHub.spawner_class = 'cdsdashboards.hubextension.spawners.variablekube.VariableKubeSpawner'
      c.CDSDashboardsConfig.builder_class = 'cdsdashboards.builder.kubebuilder.KubeBuilder'
      c.JupyterHub.redirect_to_server = False
      c.JupyterHub.default_url = '/hub/dashboards'
    conda-conf: |
      c.CDSDashboardsConfig.conda_envs = ['test-env', 'local-env']

singleuser:
  storage:
    type: dynamic
    capacity: 10Gi
    dynamic:
      pvcNameTemplate: claim-{username}
      volumeNameTemplate: volume-{username}
      storageAccessModes: [ReadWriteMany]

danlester commented 4 years ago

Thank you for the update.

To be honest, I'm not quite sure why I thought putting .condarc in the /home/jovyan/.conda folder would work. As you have found, this file copy works fine in the Docker build, but when the Jupyter server starts, the entire /home/jovyan folder is mounted as the user's persistent storage volume. This 'clobbers' anything in /home/jovyan that was in the Docker image, so of course you are right that /home/jovyan/.conda/.condarc is not present in the user's container in the end!

There are other possible locations to save the .condarc file that shouldn't be overwritten, e.g. try:

COPY conda-env/.condarc /etc/conda/.condarc

I believe Conda will create /home/jovyan/.conda/envs for itself when needed.

I haven't tried this yet, but will reproduce your entire setup when I get a chance so we can make sure we are talking about exactly the same thing going forward...

It is also worth pointing out that you don't absolutely need the .condarc file at all: you could just insist that your users specify the full path to the desired Conda folder when they create (and/or activate) the environment, but it is much easier to refer to an env by name. And referring by name is essential as long as the dashboard can't yet specify the conda env by full path.

We might also need to think about how to ensure any global envs are still accessible by name.

If you have things set up at the moment and can try the /etc/conda location, that would be really helpful, but maybe we're getting to the point where it makes sense for me to run through all of this myself including any UI adjustments. Your results and suggestions so far have been really informative.

ricky-lim commented 4 years ago

Hi Dan,

Thanks for such a swift response.

Yes, you're right about the path for .condarc. In the global /etc/conda location it works fine, and with this setup it does help unify the user directory location for envs. Thank you.

Unfortunately, the Streamlit runner does not pick up the right path if I just add c.CDSDashboardsConfig.conda_envs = ['test-env', 'test-local-env'].

Below is the error message. The expected local env path is /home/jovyan/.conda/envs/, but it still looks in /opt/conda/envs/:

Error report from ContainDS Dashboards
Command Running:

python3 -m jhsingle_native_proxy.conda_runner /opt/conda /opt/conda/envs/local-env streamlit run ./demo-littleform/test.py --server.port=41844 --server.headless=True

Error output:

None

Standard output:

Not a conda environment: /opt/conda/envs/local-env
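The command line above suggests the env name is simply joined onto /opt/conda/envs, while the env actually lives under /home/jovyan/.conda/envs. Here is a minimal sketch of a lookup that accepts either an absolute path or a bare name and searches a list of envs directories in order (for example, the envs_dirs from .condarc plus /opt/conda/envs). It follows conda's convention that an environment directory contains a conda-meta folder. `is_conda_env` and `resolve_conda_env` are hypothetical helpers, not part of cdsdashboards or jhsingle_native_proxy.

```python
import os
from pathlib import Path


def is_conda_env(path):
    """Heuristic: conda env directories contain a conda-meta folder."""
    return (Path(path) / "conda-meta").is_dir()


def resolve_conda_env(spec, envs_dirs):
    """Return the env prefix for `spec`, an absolute path or a bare env name.

    Bare names are looked up in `envs_dirs` in order; the first match wins.
    """
    if os.path.isabs(spec):
        if is_conda_env(spec):
            return str(Path(spec))
        raise FileNotFoundError(f"Not a conda environment: {spec}")
    if os.sep in spec:
        # Relative paths are ambiguous (relative to what?), so reject them
        raise ValueError(f"Use an absolute path or a bare env name, not {spec!r}")
    for envs_dir in envs_dirs:
        candidate = Path(envs_dir) / spec
        if is_conda_env(candidate):
            return str(candidate)
    raise FileNotFoundError(f"No conda env named {spec!r} in {list(envs_dirs)}")
```

With envs_dirs set to ["/home/jovyan/.conda/envs", "/opt/conda/envs"], the name local-env would resolve to the home-folder env, while test-env would still be found under /opt/conda/envs.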

I think that if environments are specified by name only, name collisions could be an issue. My preference is to explicitly set a path (the conda env prefix) to run a dashboard; I think that makes it clear for a dashboard creator.
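To make the collision risk concrete: if the same env name exists in more than one envs directory, a name-only lookup silently picks whichever directory is searched first. This small sketch reports such duplicates, again assuming the conda-meta convention; `env_name_collisions` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def env_name_collisions(envs_dirs):
    """Map each env name found in more than one envs dir to all its prefixes."""
    seen = defaultdict(list)
    for envs_dir in envs_dirs:
        root = Path(envs_dir)
        if not root.is_dir():
            continue  # e.g. a per-user envs dir that conda has not created yet
        for child in sorted(root.iterdir()):
            if (child / "conda-meta").is_dir():
                seen[child.name].append(str(child))
    # Keep only names that appear more than once, in search order
    return {name: prefixes for name, prefixes in seen.items() if len(prefixes) > 1}
```

Selecting envs by absolute prefix sidesteps this ambiguity entirely, which is the argument for path-based selection made above.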

Furthermore, concerning local envs, I found that creating a conda env (e.g. env-3) without specifying a Python interpreter installs the executables of pip-installed packages into /opt/conda/bin instead of /home/jovyan/.conda/envs/<env_name>/bin (the home environment):

# Without specified python 
$ conda create -n env-3 -y
$ conda activate env-3
$ pip install pyjokes
$ which pyjoke
/opt/conda/bin/pyjoke

# With python3.7 specified
$ conda create -n test-local-env python=3.7
$ conda activate test-local-env
$ pip install pyjokes
$ which pyjoke
/home/jovyan/.conda/envs/test-local-env/bin/pyjoke

I was wondering if we could prevent such inconsistencies, in particular installations going into the global conda env?
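One way to catch this before installing anything is to check whether the env has its own interpreter. An env created with no packages (like env-3 above) has no bin/python, so `python` and `pip` resolve to the base install at /opt/conda/bin, and packages land there. A minimal sketch with a hypothetical helper name:

```python
from pathlib import Path


def missing_python(env_prefix):
    """Return True if the env at `env_prefix` has no interpreter of its own.

    In that case `python`/`pip` fall through to the base install on PATH,
    and pip-installed packages end up in /opt/conda rather than in the env.
    """
    return not (Path(env_prefix) / "bin" / "python").exists()
```

If it returns True, recreating the env with an explicit interpreter (e.g. `conda create -n env-3 python=3.7`), as in the second transcript, avoids the fallthrough. Conda's `create_default_packages` setting in .condarc, which lists packages added to every new env, may be another way to make python a default; that has not been tried in this setup.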

Thanks again for your time and kind help.

Cheers

danlester commented 4 years ago

I agree going forward that being able to specify the full path would be useful, but I would still like to see if we can achieve a proof-of-concept local conda env first.

I've now tried with a setup based on yours and seem to have made further progress thanks also to your latest insights...

I ran this from scratch and it seems to work, but definitely needs you to specify python explicitly when you create the conda env.

I'm using the latest hub image ideonate/cdsdashboards-jupyter-k8s-hub and my single user image is based on this example Dockerfile, adding the following at the end:

USER $NB_UID
COPY .condarc /etc/conda/.condarc

RUN conda init bash
COPY conda-init.sh /etc/profile.d/conda-init.sh

where .condarc contains (as before):

envs_dirs:
  - /home/jovyan/.conda/envs

and conda-init.sh contains:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/conda/etc/profile.d/conda.sh" ]; then
        . "/opt/conda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/conda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

The thing is that 'conda init bash' adds lines to the user's .bashrc file to be sourced when a new bash login shell is started. (Maybe it also does some other global things, so I've also left the 'conda init bash' line in the Dockerfile for now.)

Just as for the .condarc file, the /home/jovyan/.bashrc file will be clobbered when the user's persistent volume is mounted, so I've copied the same contents into a permanent location which should still be sourced (/etc/profile.d) for every new shell.

I'm not convinced that the Streamlit runner absolutely needs these steps (especially if we will just be able to provide the full path to the Conda env in future) but it makes things more consistent and easier to work on the terminal for now.

So in a Jupyter notebook terminal, I now see I'm in the base Conda env by default. The prompt is (base) jovyan@jupyter-dan:~.

I run:

conda create -n myenv python=3.7

conda activate myenv

python -m pip install streamlit cdsdashboards[user] pyjokes

It appears that by default, the new myenv doesn't 'inherit' anything from the base environment, but there should be a --clone option to do that. In any case, it seems to be dangerous for python (and streamlit) to be run from the base environment as this 'jumps' back to the base env in terms of paths etc.

I have seen before that executables can easily leak between environments. By executables I mean direct commands such as 'pip'. For that reason, people recommend running python -m pip instead of pip as a general rule. In our case, unless we specify a new Python for the new env, python itself seems to fall into this trap. Other than that, it is worth noting that conda and pip aren't famous for working well together...
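The leaking described above can be checked directly: look up a command on the PATH and see whether it resolves inside the active env's prefix. This sketch uses only the standard library; `command_outside_env` is a hypothetical helper, and it assumes CONDA_PREFIX has been set by `conda activate` when no prefix is passed.

```python
import os
import shutil
from pathlib import Path


def command_outside_env(command, env_prefix=None, path=None):
    """Return the resolved path of `command` if it lives outside `env_prefix`.

    Returns None if the command resolves inside the env, is not found,
    or no env prefix is known. `env_prefix` defaults to $CONDA_PREFIX.
    `path` optionally overrides the PATH used for the lookup.
    """
    prefix = env_prefix or os.environ.get("CONDA_PREFIX")
    found = shutil.which(command, path=path)
    if prefix is None or found is None:
        return None
    resolved = Path(found).resolve()
    if Path(prefix).resolve() in resolved.parents:
        return None
    return str(resolved)
```

Running `command_outside_env("python")` and `command_outside_env("pip")` in a freshly activated env would flag the case seen with env-3, where both resolve to /opt/conda/bin. `python -m pip` only helps once `python` itself resolves inside the env.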

Anyway, if I create a dashboard using conda env 'myenv' everything now seems to work. If I don't select any conda env in my dashboard, pyjokes cannot be found.

I'm using your jokes.py from before:

import streamlit as st
import pyjokes
import sys

st.write(sys.version)

if st.button('Make me laugh'):
    result = pyjokes.get_joke()
    st.write(f'{result}')

I would say there still seems to be something slightly broken about this Conda setup, and it's possible that goes back to the way Conda is installed and configured in the Jupyter Docker stacks.

It would be great if you can see if you get the same results as me so far, and then we can decide how to progress, e.g. is a front end change now useful or do we still need a greater understanding of Conda in these images...

ricky-lim commented 4 years ago

Hi Dan,

Thank you for setting up the test and your useful guidance.

I tried python -m pip install pyjokes from a local env and, as you mentioned, I also observed leaking across environments. Furthermore, within the base environment I could also import pyjokes, unfortunately. If I do it the pure conda way, with conda install -c conda-forge pyjokes, no leaking is observed. conda and pip do appear to work together, but with a few unexpected surprises.

Following your instructions, I can confirm that the current conda setup works for local envs with Streamlit as well.

Regarding the singleuser Docker image, I have a question: how could Streamlit use the authenticated state of a user? Within the singleuser image, the NB_USER env var is set to jovyan and JUPYTERHUB_USER is set to the creator of the dashboard, rather than the authenticated user viewing it. What would you advise for a dashboard creator who wants to use the authenticated state, such as the username, after dashboard authentication?

Thank you in advance for your time and kind guidance.

I look forward to hearing from you.

Cheers

danlester commented 4 years ago

Great, so I think the conclusion is that it should be possible to make use of the feature we've been talking about - allowing the dashboard creator to type the name of any local (or global) conda env or path. I've separated this into a new feature request: #27

Regarding your other question (which is a great question), I've also set up a new feature request: #28

Actually, if you have any further background to add to that issue, please do - e.g. some idea of why you want the user's JupyterHub username in your dashboard and what you would do with it.