ticoneva / pystata-kernel

Jupyter kernel for Stata based on pystata
GNU General Public License v3.0

Running in a docker instance? Currently have kernel death #7

Open perllaghu opened 2 years ago

perllaghu commented 2 years ago

Has anyone got this working in a jupyter notebook in a docker image?

I'm rebuilding a Jupyter Notebook Docker image with the stata kernel installed. It worked fine with Stata 16 (I was using stata_kernel) and I thought all was well with Stata 17 - however I recently discovered I could not get that kernel to display images.

I've switched to pystata-kernel, and now I'm just getting Kernel Restarting errors for the simplest things.

(I should note, at this point, that my notebook-servers are built to run in a Kubernetes cloud, with the user's home directory mounted at /home/jovyan - so anything built into /home/jovyan gets lost.)

Dockerfile

FROM alpine:3.15.0

WORKDIR /tmp
RUN wget -nv http://some.internal.file.server/files/Stata17Linux64.tar.gz

#### step 1
FROM jupyter/minimal-notebook:2022-05-01
USER root
RUN apt-get update \
  && apt-get install -yq --no-install-recommends \
    expect \
    libncurses5 \
    libtinfo5 \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /tmp
COPY --from=0 /tmp/Stata17Linux64.tar.gz .

RUN mkdir statafiles \
    && cd statafiles \
    && tar -zxf /tmp/Stata17Linux64.tar.gz
COPY stata-notebook/install-scripts/. /usr/local/bin/.
RUN mkdir /usr/local/stata -p
WORKDIR /usr/local/stata

ENV STATA_NUMBER="1234"
ENV STATA_CODE="abcd e\$gh"
ENV STATA_AUTHORIZATION="foo"

# Install Stata
RUN /usr/local/bin/interact_stata.sh
RUN /usr/local/bin/interact_stinit.sh
ENV PATH=/usr/local/stata:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# Install stata kernel
USER $NB_USER
RUN mkdir -p $HOME/.pystata-kernel.conf \
    && chown -R $NB_USER $HOME/.pystata-kernel.conf
RUN pip install pystata-kernel \
    && python -m pystata-kernel.install
USER root
RUN mkdir -p /usr/local/share/jupyter/kernels \
    && mv $HOME/.local/share/jupyter/kernels/pystata /usr/local/share/jupyter/kernels

# Remove temporary stata files
RUN rm -rf /tmp/statafiles && rm /tmp/Stata17Linux64.tar.gz
RUN chown -R $NB_UID:$NB_GID $HOME

COPY docker-entrypoint.sh /usr/local/bin/

RUN find $CONDA_DIR -not -user $NB_USER -exec chown $NB_UID:$NB_GID {} \;
WORKDIR $HOME
USER $NB_USER

The docker-entrypoint.sh is there so I can create the config file in a user's home directory:

tee ~/.stata_kernel.conf << END
[stata_kernel]

# Path to stata executable. If you type this in your terminal, it should
# start the Stata console
stata_path = /usr/local/stata/stata-se

# Directory to hold temporary images and log files
cache_directory = ~/.stata_kernel_cache

# Whether autocompletion suggestions should include the closing symbol
autocomplete_closing_symbol = False

# # Extension and format for images
# graph_format = svg

# Scaling factor for graphs
graph_scale = 1

# List of user-created keywords that produce graphs.
# Should be comma-delimited.
user_graph_keywords = coefplot,vioplot
END

exec "$@"

Everything builds fine, the notebook-server starts fine, and a notebook-document starts for the stata kernel.

The test

I have tried in both the Classic and Lab interfaces - both exhibit the same issue

Even running the simplest code - display "Hello, world!" - I get a Kernel Restarting error popup (The kernel for Untitled.ipynb appears to have died. It will restart automatically.)

Log file

This is what I get in my docker log-file:

[I 2022-08-24 13:59:17.186 LabApp] Build is up to date
[I 2022-08-24 13:59:22.669 ServerApp] Kernel started: 7ccbc29e-b51f-4496-939f-8bc0eed9b04b
[I 2022-08-24 14:00:05.390 ServerApp] Creating new notebook in 
[I 2022-08-24 14:00:05.448 ServerApp] Writing notebook-signing key to /home/jovyan/.local/share/jupyter/notebook_secret
[I 2022-08-24 14:00:05.791 ServerApp] Kernel started: 693f97e5-3819-4656-b3b5-d3890bd5ab61
[I 2022-08-24 14:00:17.780 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), new random ports
[W 2022-08-24 14:00:17.780 ServerApp] kernel 693f97e5-3819-4656-b3b5-d3890bd5ab61 restarted
[I 2022-08-24 14:00:17.783 ServerApp] Starting buffering for 693f97e5-3819-4656-b3b5-d3890bd5ab61:c89c684f-b42b-4234-8851-8f7b87d1aa32
[I 2022-08-24 14:00:17.810 ServerApp] Restoring connection for 693f97e5-3819-4656-b3b5-d3890bd5ab61:c89c684f-b42b-4234-8851-8f7b87d1aa32

Looking for help/advice on making this work...

ideabucket commented 2 years ago

Yes, I have it working fine in Docker, although I just run the container locally. Here's my Dockerfile if you want to compare.

ideabucket commented 2 years ago

Actually… are you creating a $HOME/.pystata-kernel.conf at any point? Pystata-kernel doesn't read .stata_kernel.conf, and doesn't use the same config directives.
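
A quick way to check this from inside the container is sketched below. It only inspects what exists at the path and lists any section names it finds; `inspect_conf` is a hypothetical helper name, and the sketch makes no assumption about which sections or keys pystata-kernel expects. One case it would flag: `mkdir -p $HOME/.pystata-kernel.conf`, as in the Dockerfile above, creates a directory at that path rather than a config file.

```python
import configparser
from pathlib import Path


def inspect_conf(path):
    """Describe what exists at a config path (hypothetical helper)."""
    p = Path(path).expanduser()
    if not p.exists():
        return {"status": "missing", "sections": []}
    if p.is_dir():
        # For example, the result of `mkdir -p ~/.pystata-kernel.conf`.
        return {"status": "directory", "sections": []}
    parser = configparser.ConfigParser()
    try:
        parser.read(p)
    except configparser.Error:
        # The file exists but is not valid INI syntax.
        return {"status": "unparseable", "sections": []}
    return {"status": "file", "sections": parser.sections()}
```

Calling `inspect_conf("~/.pystata-kernel.conf")` in a terminal inside the running notebook server should show whether the file survives the home-directory mount.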

perllaghu commented 2 years ago

Apologies for the silence - I had to jump to another project for a while.

Looking at your Dockerfile, I'm not getting a running build yet.... however I can see a future problem for me: I use docker-compose, which doesn't use secrets.... which dataeditors/stata17 needs.

I've managed to build your image, but still getting the same fault.... getting a colleague to independently check my work.

Question: what happens if the stata.lic file is invalid?

ideabucket commented 2 years ago

You can use secrets in docker-compose. See sample here. I've been using docker compose to run that image for a while but I'd forgotten to push to github.

If there's no valid stata.lic, stata will refuse to run--but as far as I can tell it always exits with rc 0, so if it's being run non-interactively it won't look like an error.
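
Because the exit code is not informative here, a non-interactive check has to look at the output instead. A minimal sketch, assuming the caller supplies both the command and a marker string that only appears when the command really ran; `ran_with_marker` is a hypothetical name, and the demonstration commands use the Python interpreter as a stand-in, not Stata.

```python
import subprocess
import sys


def ran_with_marker(cmd, marker, timeout=60):
    """Return True only if `marker` appears in the command's output.

    The return code is deliberately ignored, since a program that
    refuses to run may still exit with 0.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        timeout=timeout,
    )
    return marker in result.stdout


# Stand-in commands: both exit with 0, but only one prints the marker.
OK_CMD = [sys.executable, "-c", "print('Hello, world!')"]
SILENT_CMD = [sys.executable, "-c", "pass"]
```

If the program writes its results to a log file rather than to standard output, the same substring check would be applied to the log's contents.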

Also, I'm just using dataeditors/stata17 as a source of the binaries because the official Stata download site requires credentials. It's not a runtime dependency. If you'd prefer to (a) use a stata installation tarball cached locally and/or (b) copy stata.lic into the image, have a look at the Dockerfile as of this commit, although at that point I wasn't using docker-compose.

perllaghu commented 2 years ago

Ah... excellent - thanks... definitely getting further now.

Current block is pystata (well, the lack of...)

Run in a jupyterlab console:

(base) jovyan@7ac94155e087:~$ pip list | grep pystata
pystata-kernel                0.1.21
(base) jovyan@7ac94155e087:~$ pip install pystata
Collecting pystata
  Downloading pystata-0.0.1-py3-none-any.whl (21 kB)
Collecting pandas
  Downloading pandas-1.4.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.6/11.6 MB 3.5 MB/s eta 0:00:00
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->pystata) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/conda/lib/python3.10/site-packages (from pandas->pystata) (2.8.2)
Collecting numpy>=1.21.0
  Downloading numpy-1.23.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.0/17.0 MB 2.4 MB/s eta 0:00:00
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas->pystata) (1.16.0)
Installing collected packages: numpy, pandas, pystata
Successfully installed numpy-1.23.2 pandas-1.4.4 pystata-0.0.1
(base) jovyan@7ac94155e087:~$ pip list | grep pystata
pystata                       0.0.1
pystata-kernel                0.1.21
(base) jovyan@7ac94155e087:~$ python
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pystata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pystata'
>>> 

.... which, understandably, kills the pystata-kernel launch_stata (et al)

ideabucket commented 2 years ago

The pystata on PyPI is not Stata's pystata, which is shipped inside the stata install. (You'll find it in stata_home_directory/utilities/pystata/.) You shouldn't need to install it manually.

Did you follow the full installation instructions, including python -m pystata-kernel.install and creating a .conf file? For me, this is all that is necessary to get the kernel working—but note that it doesn't add pystata to the module search path, so you will get the error you got if you try to test it in the Python REPL.

Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append('/usr/local/stata17/utilities')
>>> import pystata
>>> pystata.config.init('mp')

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      17.0
___/   /   /___/   /   /___/       MP—Parallel Edition

 Statistics and Data Science       Copyright 1985-2021 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: Unlimited-user 4-core network, expiring  1 Apr 2023
Serial number: [redacted]
  Licensed to: [redacted]
               [redacted]

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. More than 2 billion observations are allowed; see help obs_advice.
      3. Maximum number of variables is set to 5,000; see help set_maxvar.
>>> 
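
To see which pystata, if any, an interpreter would import (the PyPI package of the same name or the copy under the Stata install), the module's origin can be inspected without importing it. A small sketch using only the standard library; `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would load from.

    Returns None if the module cannot be found (or has no single
    origin file, as with namespace packages).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin
```

A result for `module_origin("pystata")` under site-packages would point at the PyPI package; a result under the Stata install's utilities directory would point at Stata's own copy.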

ticoneva commented 2 years ago

pystata-kernel uses the same routine as stata-kernel to locate your Stata installation, then uses the same routine as the official stata_setup to find pystata. You should not need to add pystata's location to the module search path if you just want to use pystata-kernel.

If you are manually importing pystata, then you will have to do it the way @ideabucket demonstrated.
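
The manual route can be wrapped in a small helper that checks for utilities/pystata under the Stata directory before touching sys.path. This is a sketch, not pystata-kernel's own code; `add_pystata_to_path` is a hypothetical name and the directory passed in is whatever your install uses.

```python
import os
import sys


def add_pystata_to_path(stata_dir):
    """Append <stata_dir>/utilities to sys.path if it holds a pystata directory.

    Returns True if the path is usable, False if utilities/pystata is absent.
    """
    utilities = os.path.join(stata_dir, "utilities")
    if not os.path.isdir(os.path.join(utilities, "pystata")):
        return False
    if utilities not in sys.path:
        sys.path.append(utilities)
    return True
```

Once `add_pystata_to_path('/usr/local/stata17')` returns True, `import pystata` followed by `pystata.config.init('mp')` would proceed as in the session above; the path and edition are the ones from that session. A return value of False means the Stata-shipped package is not where it is expected.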

As school has started, I am sorry I cannot be of more help on this topic as it is not my top priority right now.

perllaghu commented 2 years ago

OK.... I've solved it - it required bits of knowledge from various comments above.

For some reason, I wasn't getting the pystata files in .../utilities/pystata/ - which threw me down the pythonic pystata route.

I've got an image that does the install as I had it, but in a first-stage image - then used the rest of the method that @ideabucket uses to copy that installed Stata into my final image.

I've even managed to include the update section I was completely unaware of, to improve my final Notebook Server.

Now I need to deploy and test that the nbgrader stuff is still working [in the classic UI] :)