mmh352 / ou-container-builder


Demo config - OpenRefine #11

Closed psychemedia closed 3 years ago

psychemedia commented 3 years ago

I just had a go at creating an OpenRefine web app.

TL;DR: it doesn't work without manual intervention (unless I'm missing something obvious / all the bits aren't built yet...)

(Broken) config file is:

module:
  code: TM351-openrefine
  presentation: 21J
type: web-app
packages:
  apt:
   - wget
   - openjdk-11-jdk
scripts:
  - inline:
    - wget --no-check-certificate https://github.com/OpenRefine/OpenRefine/releases/download/3.3/openrefine-linux-3.3.tar.gz && mkdir /var/openrefine && tar -xzf openrefine-linux-3.3.tar.gz --directory /var/openrefine  && rm openrefine-linux-3.3.tar.gz && mkdir -p /home/ou-user/openrefine/TM351-openrefine-21J
web_app:
  cmdline: /var/openrefine/openrefine-3.3/refine -i 0.0.0.0 -p 3333 -d /home/ou-user/TM351-openrefine-21J
  port: 3333

Note that the set-up is:

To get things to install, I had to hack the Dockerfile because there is a snafu with the Java package installation that I wasted a chunk of time trying to track down:

# https://github.com/geerlingguy/ansible-role-java/issues/64#issuecomment-393299088
RUN mkdir -p /usr/share/man/man1 &&  apt-get install -y wget openjdk-11-jdk

I guess I could have used the inline script to explicitly do the apt-get after creating the required directory. But that then starts to split the same concern (package installation) across separate RUN commands.

I note that the scripts run as root (which I wasn't originally assuming; e.g. in repo2docker, postBuild scripts run under the $NB_USER user account). Might it also be useful to offer a user value (default: root) that allows you to run a script as a specific user? Setting the user would then wrap the script between USER {user} and USER root statements in the generated Dockerfile.
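As a sketch (the `user` key here is hypothetical, not part of the current config schema):

```yaml
scripts:
  - inline:
      - touch /home/ou-user/.setup-done
    user: ou-user  # hypothetical key; default would be root
```

which might then generate something like:

```dockerfile
USER ou-user
RUN touch /home/ou-user/.setup-done
USER root
```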

The port used by the service could use an automatically assigned {port}, but I was trying to minimise the potential for brokenness and simplify debugging.

Issues:

[screenshot]

Note that no token is requested at this point — the page is rendered without a token request.

I was working under the assumption that the web-app declaration would generate and register proxied services; typically, by calling their URL, the service should start if it isn't running, or be connected to if it is running.

Under that assumption, the homepage should probably have links to whatever proxied services are installed as web-app types. This essentially recreates the additional services listing that is added to the notebook homepage New menu when you install and register jupyter-server proxy elements in a traditional way.

If I try to go to the proxied port explicitly (http://localhost:8823/proxy/3333/; OpenRefine requires the trailing slash or the styling breaks), I get a Jupyter server token challenge. (If you run the container with a -e JUPYTER_TOKEN=letmein switch, you can force the token to a pre-specified default value, e.g. letmein.) This makes me think that for standalone running it might make sense to allow an explicitly declared default token, or to revert to a built-in default (I'm not sure users ever benefit from having to hunt out the random token if their browser isn't automatically opened, because finding the token is a faff).
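For example (image name and port mapping are illustrative, not from the actual demo):

```shell
# Run with a known token instead of a random one
docker run --rm -p 8823:8888 -e JUPYTER_TOKEN=letmein tm351-openrefine
# then visit http://localhost:8823/proxy/3333/?token=letmein
```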

Having entered the token, I get a 500 error, because there is nothing running.

[screenshot]

If I ssh into the running container, I can start the service running manually:

/var/openrefine/openrefine-3.3/refine -p 3333 -i 0.0.0.0  -d /home/ou-user/openrefine

And things seem to run and be accessible at least:

[screenshot]

If there is a single service defined, I might guess that the application is running on http://localhost:PORT/web-app, but there is nothing there. (It's not clear how that assumption would play out if multiple apps were defined.)

Ideally, I would expect to be able to reference the application by some sort of name (eg http://localhost:PORT/openrefine ). The config might then be of the form:

web_app:
  name: openrefine
  cmdline: /var/openrefine/openrefine-3.3/refine -i 0.0.0.0 -p {port} -d /home/ou-user/TM351-openrefine-21J

Something else I was expecting was to see a service definition in the automatically generated files, eg a traitlet definition for the service included in an automatically created jupyter_notebook_config.py file or whatever file the server-proxy could pick up on.

The following fragment is taken from a traditional jupyter-server-proxy setup; many of the fields could be directly added to the web-app definition; the dictionary key defines the service name to be used in the URL path:

# Traitlet configuration fragment for jupyter_notebook_config.py

c.ServerProxy.servers = {
    'openrefine': {
        'command': ['/var/openrefine/openrefine-3.3/refine', '-p', '{port}', '-d', '/home/ou-user/{MODULE_CODE}-{MODULE_PRESENTATION}'],
        'port': 3333,
        'timeout': 120,
        'launcher_entry': {
            'enabled': True,
            # The icon_path is not part of the openrefine distro but should be added there?
            'icon_path': '/home/jovyan/.jupyter/open-refine-logo.svg',
            'title': 'OpenRefine',
        },
    },
}

The (optional) launcher_entry elements could be reused to help mark up a "service launcher" page on the localhost:PORT landing page (which would have to use some sort of customised template or a customised override of the default landing page).
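To make that concrete, here's a minimal Python sketch (my own, not builder code) of how the launcher_entry metadata could drive a list of landing-page links, using jupyter-server-proxy's convention that a named server is proxied at /&lt;name&gt;/:

```python
# Sketch (not builder code): turn jupyter-server-proxy style definitions
# into landing-page links. Named servers are proxied at /<name>/.
servers = {
    'openrefine': {
        'command': ['/var/openrefine/openrefine-3.3/refine', '-p', '{port}'],
        'port': 3333,
        'launcher_entry': {'enabled': True, 'title': 'OpenRefine'},
    },
}

def launcher_links(servers):
    links = []
    for name, spec in servers.items():
        entry = spec.get('launcher_entry', {})
        if entry.get('enabled'):
            # Trailing slash matters for apps (like OpenRefine) that
            # resolve assets relative to the page URL.
            links.append(f'<a href="/{name}/">{entry.get("title", name)}</a>')
    return links

print(launcher_links(servers))
# → ['<a href="/openrefine/">OpenRefine</a>']
```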

mmh352 commented 3 years ago

Ok. The manpages issue is a debian-buster-slim issue, because the openjdk package makes some assumptions without adding the necessary dependencies. I've added a "hacks" section to the config file that hides the processes needed to set that up. I haven't released that yet, but I'll let you know when it is available.

The second problem is that the custom web application functionality is not fully implemented yet. Working on that now.

mmh352 commented 3 years ago

The missing functionality is now fully implemented (d070f9d). I've also added a demos/openrefine demo that is based upon the config you posted. I've had to modify the config structure a bit, but now it nicely builds the openrefine container.

psychemedia commented 3 years ago

Good stuff :-)

Re: your internal hack to cope with openjdk: I'm a bit wary about what that hides. I agree it's cleaner, but it means that you partly have to guess what's inside the base container. (Which makes me wonder: what's the policy re: base container(s)? For some applications it may make more sense to use a different base container, where one exists and the installation is otherwise horrible and/or difficult.)

The command-line statement is an obvious source of syntax errors, and it's also hard to read and a faff to declare:

web_apps:
  - path: openrefine
    cmdline:
      - /var/openrefine/openrefine-3.3/refine
      - -i
      - 0.0.0.0
      - -p
      - "{port}"
      - -d
      - /home/ou-user/OpenRefine

Cleaner might be to allow a single long line and then split it into tokens on the space character?
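Splitting on bare spaces would break quoted arguments, though; in Python, shlex.split handles shell-style quoting, e.g.:

```python
import shlex

# Tokenise a single cmdline string the way a shell would
cmdline = '/var/openrefine/openrefine-3.3/refine -i 0.0.0.0 -p "{port}" -d /home/ou-user/OpenRefine'
print(shlex.split(cmdline))
# → ['/var/openrefine/openrefine-3.3/refine', '-i', '0.0.0.0', '-p', '{port}', '-d', '/home/ou-user/OpenRefine']
```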

With the scripts section:

scripts:
  - inline:
    - wget --no-check-certificate https://github.com/OpenRefine/OpenRefine/releases/download/3.3/openrefine-linux-3.3.tar.gz
    - mkdir /var/openrefine
    - tar -xzf openrefine-linux-3.3.tar.gz --directory /var/openrefine
    - rm openrefine-linux-3.3.tar.gz

does this get compiled into a single layer (I haven't checked the generated Dockerfile)?

What's the rationale for one line per command? Clarity?
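If the commands are folded into one layer, the generated Dockerfile would presumably read something like (a guess on my part; I haven't checked):

```dockerfile
RUN wget --no-check-certificate https://github.com/OpenRefine/OpenRefine/releases/download/3.3/openrefine-linux-3.3.tar.gz \
    && mkdir /var/openrefine \
    && tar -xzf openrefine-linux-3.3.tar.gz --directory /var/openrefine \
    && rm openrefine-linux-3.3.tar.gz
```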

Re: the directory that gets mounted: ideally this would be set somewhere inside $HOME where it can persist (e.g. /home/ou-user/$CODE-$PRESENTATION/openrefine). I wasn't sure whether you had a particular convention in mind for that, or how the templating might best support it?

psychemedia commented 3 years ago

Just in passing, I also note that in my original Dockerfiles the version (3.3) was assigned to a VERSION variable and referenced as such. The current config doesn't support that, but I know from experience that if you leave version numbers in paths as literals, it's all too easy to miss one when you update the version!
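In Dockerfile terms, something like (a sketch of what templated versioning could look like, not a feature of the builder):

```dockerfile
# Declare the version once and reference it everywhere
ARG OPENREFINE_VERSION=3.3
RUN wget --no-check-certificate https://github.com/OpenRefine/OpenRefine/releases/download/${OPENREFINE_VERSION}/openrefine-linux-${OPENREFINE_VERSION}.tar.gz
```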

mmh352 commented 3 years ago

Could I please ask you to split these into new issues, ideally one problem per issue.