plasmabio / plasma

Plasma is an e-learning Jupyter-based platform for data analysis
https://docs.plasmabio.org
BSD 3-Clause "New" or "Revised" License

Docker container crashes after git jlab shell error #191

Closed by jvail 2 years ago

jvail commented 3 years ago

Hi,

We are using plasmabio on our server, but I have an odd problem and don't know whether it is an issue with our setup or with Plasma. We are building images from git repositories, and when I try to pull/push changes while the git credentials are not properly set, the whole server crashes (screenshot attached). Every user needs to set up their own credentials after building the image. But we use the JupyterLab git extension, which initially uses the wrong credentials and then immediately crashes the server, leaving no chance to update the credentials via the shell. I'd be grateful for a hint about what may be going wrong here. Thank you, Jan

[screenshot: jlab]

pierrepo commented 3 years ago

Hello @jvail. Thank you for reporting what looks like a bug. I installed the plasmabio-template-python environment on our own server and obtained a similar result: [screenshot]

pierrepo commented 3 years ago

When creating an environment from https://github.com/plasmabio/template-python and then running this env on your Plasma server, you should have a .git directory in /home/user-name/plasmabio-template-python. This .git directory is useless since it is inherited from the original repo (https://github.com/plasmabio/template-python).

I managed to run git (and git push) properly by doing the following:

jvail commented 3 years ago

Thank you @pierrepo. I may have misled you with my description and example. The git error is not so much an issue, but I don't understand why it crashes the server.

Edit: Sorry, I had not seen the answer above! Please let me know if you have an idea for a workaround.

pierrepo commented 3 years ago

> The git error is not so much an issue, but I don't understand why it crashes the server.

Neither do I :cry: This bug is really weird.

> Please let me know if you have an idea for a workaround.

Could you give me more details on what exactly you would like to do?

jvail commented 3 years ago

> > The git error is not so much an issue, but I don't understand why it crashes the server.

> Neither do I :cry: This bug is really weird.

Yes, it is. I tried a few exit codes in the JupyterLab terminal to reproduce it without git, but with no success. To test whether it happens only with an "unsane" .git, I made another checkout of a git repo that cannot be broken:

[screenshot: jlab2]

> > Please let me know if you have an idea for a workaround.

> Could you give me more details on what exactly you would like to do?

I need to be able to sync the server/image with the repository without the need to remove and rebuild it. And I'd like to use the git jupyterlab plugin because some users are not familiar with the terminal. And with this plugin the server crashes immediately if I click on the git plugin tab.

P.S.: Yes, the proper way is to set up the SSH credentials (as you mentioned), so the git plugin won't work initially, and that is what I expect. But the server crash is weird and hard to handle.

jvail commented 3 years ago

I have tried to figure out what's going on by going through the hub logs (sudo journalctl -u jupyterhub -f, https://tljh.jupyter.org/en/latest/troubleshooting/logs.html) and attaching a shell to the running Jupyter server: it seems Jupyter itself does not crash, but the Docker container does. Not sure if that helps. I have therefore renamed the issue.

pierrepo commented 3 years ago

Many thanks for your investigations! I'm pretty sure it will help. Could you please share here the relevant logs showing that the container is crashing?

jvail commented 3 years ago

Hi @pierrepo, sure. I had to remove a few lines containing domains/IPs because it is a company server, but those did not seem to be relevant anyway (mostly messages from traefik). These are the logs that appeared right after "git push" and the failed auth:

Nov 09 06:43:05 jupyterhub traefik[2796]: time="2021-11-09T06:43:05Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF"
Nov 09 06:43:05 jupyterhub dockerd[922]: time="2021-11-09T06:43:05.147290273Z" level=info msg="ignoring event" container=6f36d03e2f1f9e41f17b9669a34fc192ca2d6a268f071c34ef5111e0d4989852 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 09 06:43:05 jupyterhub containerd[840]: time="2021-11-09T06:43:05.148527936Z" level=info msg="shim disconnected" id=6f36d03e2f1f9e41f17b9669a34fc192ca2d6a268f071c34ef5111e0d4989852
Nov 09 06:43:05 jupyterhub containerd[840]: time="2021-11-09T06:43:05.148858975Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"
Nov 09 06:43:05 jupyterhub systemd-networkd[791]: vethed16e29: Lost carrier
Nov 09 06:43:05 jupyterhub kernel: docker0: port 1(vethed16e29) entered disabled state
Nov 09 06:43:05 jupyterhub kernel: veth377bc5a: renamed from eth0
Nov 09 06:43:05 jupyterhub networkd-dispatcher[819]: WARNING:Unknown index 193 seen, reloading interface list
Nov 09 06:43:05 jupyterhub systemd-udevd[91431]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov 09 06:43:05 jupyterhub systemd-udevd[91431]: Using default interface naming scheme 'v245'.
Nov 09 06:43:05 jupyterhub kernel: docker0: port 1(vethed16e29) entered disabled state
Nov 09 06:43:05 jupyterhub systemd-networkd[791]: vethed16e29: Link DOWN
Nov 09 06:43:05 jupyterhub kernel: device vethed16e29 left promiscuous mode
Nov 09 06:43:05 jupyterhub kernel: docker0: port 1(vethed16e29) entered disabled state
Nov 09 06:43:05 jupyterhub systemd-networkd[791]: rtnl: received neighbor for link '194' we don't know about, ignoring.
Nov 09 06:43:05 jupyterhub systemd-networkd[791]: rtnl: received neighbor for link '194' we don't know about, ignoring.
Nov 09 06:43:05 jupyterhub systemd[1374]: run-docker-netns-7c2d5a9e672f.mount: Succeeded.
Nov 09 06:43:05 jupyterhub systemd[89736]: run-docker-netns-7c2d5a9e672f.mount: Succeeded.
Nov 09 06:43:05 jupyterhub systemd[1]: run-docker-netns-7c2d5a9e672f.mount: Succeeded.
Nov 09 06:43:05 jupyterhub systemd[89736]: var-lib-docker-overlay2-14f71cf9f190e376757c41ebbbe41a92ce2ff61e5be4310c2590c011d1a10f32-merged.mount: Succeeded.
Nov 09 06:43:05 jupyterhub systemd[1]: var-lib-docker-overlay2-14f71cf9f190e376757c41ebbbe41a92ce2ff61e5be4310c2590c011d1a10f32-merged.mount: Succeeded.
Nov 09 06:43:05 jupyterhub systemd[1374]: var-lib-docker-overlay2-14f71cf9f190e376757c41ebbbe41a92ce2ff61e5be4310c2590c011d1a10f32-merged.mount: Succeeded.
Nov 09 06:43:06 jupyterhub systemd-networkd[791]: docker0: Lost carrier
Nov 09 06:43:06 jupyterhub systemd-udevd[91431]: veth377bc5a: Failed to get link config: No such device
pierrepo commented 3 years ago

Many thanks @jvail :pray:

jvail commented 2 years ago

For now my workaround is to set up the SSH key and then change the git remote URL to SSH in the postBuild script. Then the jupyterlab-git extension does not crash the server. An idea: it would be nice to also support SSH in the environment creation dialog.
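A minimal sketch of the remote URL switch described above. It is not the exact script used here: the helper name to_ssh_url is hypothetical, and it assumes an HTTPS remote named origin on a host that accepts git@host:owner/repo.git URLs.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an HTTPS remote URL into its SSH form.
# https://host/owner/repo(.git) -> git@host:owner/repo.git
to_ssh_url() {
    url="${1%.git}"          # drop a trailing .git if present
    rest="${url#https://}"   # host/owner/repo
    host="${rest%%/*}"       # everything before the first slash
    path="${rest#*/}"        # everything after the first slash
    printf 'git@%s:%s.git\n' "$host" "$path"
}

# Possible use in postBuild, run inside the cloned repository
# (assumption: the remote is called "origin"):
# git remote set-url origin "$(to_ssh_url "$(git remote get-url origin)")"
```

The SSH key itself still has to be set up separately for each user; this only changes which URL git and the jupyterlab-git extension talk to.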

jtpio commented 2 years ago

Thanks @jvail and @pierrepo for investigating this.

@jvail are you able to try without the git extension installed? Just to double check whether the issue is related to that particular extension.

For the logs you might want to check the user server logs, since this might be where the error is originating from. Maybe the git server extension is causing the server to crash. Something like docker logs <id-of-the-user-container> should help get more info.
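A rough sketch of that check, assuming the container name or ID is known. The function name container_postmortem and the DOCKER variable are illustrative only; docker logs and docker inspect are the standard Docker CLI commands.

```shell
#!/bin/sh
# DOCKER can be overridden (e.g. to "echo") for a dry run.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: show what a stopped user container left behind.
container_postmortem() {
    container="$1"
    # Last lines written by the Jupyter server inside the container.
    "$DOCKER" logs --tail 50 "$container"
    # Exit code and OOM flag recorded by Docker when the container stopped.
    "$DOCKER" inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' "$container"
}

# Example with a hypothetical container name:
# container_postmortem jupyter-alice
```

If the container has already been removed, both commands will only report that it cannot be found, so this needs to run while the stopped container still exists.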

jvail commented 2 years ago

Hi @jtpio, here https://github.com/plasmabio/plasma/issues/191#issuecomment-960896735 I made a test only using the plasmabio-template-python repo. No git extension there.

Whenever there is a git fatal error the server becomes unavailable.

jtpio commented 2 years ago

Ah thanks @jvail.

I could reproduce this on a fresh install with the same template-python repo.

[screenshot]

pierrepo commented 2 years ago

Hello. Indeed, it's very reproducible. Of course, pushing something to the original repo of template-python does not make sense since most users do not have proper rights to do such a thing. This should lead to a simple git error inside the terminal. For some weird reason, it crashes the server.

jvail commented 2 years ago

But - to mention the positive side as well - apart from this error plasma is great and works very well for us!

jtpio commented 2 years ago

The server logs don't provide much information.

Out of curiosity I tried with tljh-repo2docker only and this does not seem to be an issue.

So maybe this is related to the custom entrypoint in Plasma: https://github.com/plasmabio/plasma/blob/master/tljh-plasma/tljh_plasma/entrypoint/entrypoint.sh

jvail commented 2 years ago

> So maybe this is related to the custom entrypoint

@jtpio Could it be the set -e part? I was unable to get the dev env working so could not test it.
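For reference, a small demo of what set -e does in general. This is a hypothetical standalone script, not the Plasma entrypoint, and it does not show that set -e is the cause here; it only shows that a shell running with errexit stops at the first failing command.

```shell
#!/bin/sh
# Child shell without errexit: the failing command is ignored and the shell carries on.
without=$(sh -c 'false; echo "still running"')

# Child shell with errexit: it exits at the first failing command,
# so the echo is never reached. "|| true" keeps this demo script itself alive.
with=$(sh -c 'set -e; false; echo "still running"' || true)

echo "without set -e: '${without}'"   # prints: without set -e: 'still running'
echo "with set -e:    '${with}'"      # prints: with set -e:    ''
```

If a shell with errexit enabled ends up being the process that keeps the container alive, a failing git command could in principle end it in the same way, but that would need to be checked against the actual entrypoint.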