unkcpz closed this issue 1 year ago.
Is this reproducible? If so, to find out whether the issue is with the template, I suggest deploying another app (e.g. on the dev machine) with a standard template. If it is the template, check what changed (maybe newer versions of voila require changes to the template?). If it is not the template, simplify the code step by step to see when it stops timing out.
@giovannipizzi thanks a lot for the suggestions.
Is this reproducible?
I wouldn't say it is fully reproducible. It happens more often when I use fewer pre-heated kernels. On the dev server the load is low, so it rarely happens. If there is a tool that deliberately sends requests to the service and forces it to open more kernels, we can give it a try. But I still do not fully understand how voila allocates kernels. Is it based on IP, or does it open a new kernel every time a user opens a browser session? @dou-du Do you know how it works under the hood?
I think it would also be useful to have a tool that can run a stress test against any dokku app. @eimrek Would it be possible to do this the way the monitor server does now?
Hi, could this be related to https://github.com/gliderlabs/herokuish/issues/659?
This fix might also be related:
Edit: regarding the stress test, yes, in principle we could use something similar to what we use in the monitor server to access the app many times: https://github.com/materialscloud-org/openstack-ansible/blob/master/roles/mc_monitor_server/files/check_httpjs.py
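As a stand-in for the monitor-server script (which is not reproduced here), a minimal Python sketch of such a stress test fires concurrent GET requests at an app and counts failures and slow responses. It assumes that a plain GET of the app page is enough to make voila render the notebook and claim a kernel; the URL and the default numbers are placeholders, not values from the deployment, and the fetch parameter exists only so the logic can be tested offline.

```python
"""Minimal concurrent load test for a web app (sketch; not the monitor server's check_httpjs.py)."""
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def http_get(url: str, timeout: float) -> int:
    """GET `url` and return the HTTP status; raises on timeouts, HTTP errors, refused connections."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # read the whole body so slow page renders count toward the elapsed time
        return resp.status


def stress_test(
    url: str,
    n_requests: int = 20,
    concurrency: int = 5,
    timeout: float = 60.0,
    fetch: Callable[[str, float], int] = http_get,
) -> Dict[str, object]:
    """Send `n_requests` GETs to `url`, at most `concurrency` in flight at once, and summarize."""

    def one_request(_: int) -> Dict[str, object]:
        start = time.monotonic()
        status: Optional[int] = None
        error: Optional[str] = None
        try:
            status = fetch(url, timeout)
        except Exception as exc:  # any failure (timeout, 5xx, refused) counts as failed
            error = repr(exc)
        return {"status": status, "error": error, "seconds": time.monotonic() - start}

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results: List[Dict[str, object]] = list(pool.map(one_request, range(n_requests)))

    n_ok = sum(1 for r in results if r["status"] == 200)
    return {
        "total": n_requests,
        "ok": n_ok,
        "failed": n_requests - n_ok,
        "max_seconds": max((r["seconds"] for r in results), default=0.0),
        "errors": [r["error"] for r in results if r["error"]],
    }


if __name__ == "__main__":
    # Placeholder URL: replace with the dokku app under test.
    print(stress_test("https://optimadeclient.example.org/", n_requests=30, concurrency=10))
```

Raising concurrency above the number of pre-heated kernels is one way to try to provoke the timeouts described above, assuming kernel exhaustion is the trigger.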
@eimrek thanks a lot! I think they are related. The change in fix-dockerfile.yml made by @ltalirz was to change the permission of /dev/random to 640 to make it inaccessible. However, neither the /dev/random on the host nor the one in the optimadeclient container has permission 444. I am not quite sure what this gliderlabs/herokuish image is used for, but there is a newer one with the tag latest, and when I go into that image, /dev/random is 444.
Maybe @dou-du built the image by hand at some point? I remember you mentioned that you had updated the version. Can you confirm?
To fix this, I think I first need to figure out what this gliderlabs/herokuish image is used for and check whether /dev/random is modified there to work around the issue. If that is the case, we can tear down all the apps, delete the gliderlabs/herokuish images, and redeploy the server by running the whole ansible role. Any ideas or comments? @giovannipizzi @eimrek
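To compare the permissions quoted above (640 vs 444) the same way on the host and inside a container, a small helper can print the permission bits in octal. This is only a sketch, roughly equivalent to GNU stat -c %a /dev/random; the function names are illustrative and not part of the ansible role.

```python
import os
import stat


def octal_mode(path: str) -> str:
    """Permission bits of `path` as an octal string, e.g. '444' or '640'."""
    return format(stat.S_IMODE(os.stat(path).st_mode) & 0o777, "o")


def describe(path: str) -> str:
    """Octal mode, ls-style mode string, and whether the *current* user may read `path`."""
    st = os.stat(path)
    readable = os.access(path, os.R_OK)
    return f"{path}: {octal_mode(path)} ({stat.filemode(st.st_mode)}), readable={readable}"


if __name__ == "__main__":
    for p in ("/dev/random", "/dev/urandom"):
        if os.path.exists(p):
            print(describe(p))
```

Running it once on the host and once inside the app container shows whether the two really differ, and the readable flag shows whether the app's user can still open /dev/random.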
We can now use Python 3.8 for optimade-client, which means dokku or herokuish has been updated (I need to spend some time understanding how these pieces work under the hood, so sorry for the vague terminology), as mentioned in https://github.com/materialscloud-org/tools-optimade-client/pull/42
I updated the version number in ansible.
Indeed, just start from the latest version of the image and apply the fix to prevent the "stuck due to lack of entropy" issue.
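The "stuck due to lack of entropy" condition can be checked directly. A minimal sketch, assuming a Linux host: read the kernel's entropy estimate and ask select() whether /dev/random is ready, without actually reading from it. On Linux 5.6 and later, /dev/random only blocks until the pool is first initialized, so this mostly matters on older host kernels. The function names are illustrative.

```python
import os
import select
from pathlib import Path

# Linux-specific: the kernel's current entropy estimate, in bits. Containers share the
# host kernel, so inside a container this should reflect the host.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_avail(path: Path = ENTROPY_AVAIL) -> int:
    """Return the kernel's available-entropy estimate (bits)."""
    return int(path.read_text().strip())


def random_would_block(path: str = "/dev/random", timeout: float = 0.0) -> bool:
    """True if `path` is not ready for reading within `timeout` seconds.

    Opens the file non-blocking and asks select() whether it is readable, so the
    check itself never hangs. Raises PermissionError if the current user may not
    open `path` (e.g. after /dev/random has been made unreadable).
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        readable, _, _ = select.select([fd], [], [], timeout)
        return not readable
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("entropy_avail:", read_entropy_avail())
    print("/dev/random would block:", random_would_block())
```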
Thanks @ltalirz. Could you elaborate a bit on how the image gliderlabs/herokuish:latest is used on the dokku server? I used docker run -it gliderlabs/herokuish:latest to go into the container and found that /start is the modified version with the change you applied. But the /dev/random permission has not changed. Is that because the /start script (the entrypoint) is not executed, or because the permission of /dev/random cannot be changed?
If you just run the image with the entry point bash, then the permission won't be changed.
herokuish is built on top of the Heroku Docker image and basically does what Heroku does (detects what kind of app the user is trying to run, installs dependencies, etc.).
If you just run the image with the entry point bash, then the permission won't be changed.
Then how do I test whether this change is taking effect? I just cannot find where the image built by the ansible task https://github.com/materialscloud-org/openstack-ansible/blob/master/roles/mc_dokku/tasks/fix-dockerfile.yml is used on the server.
Then how do I test if this change is taking effect?
Well, if you exec into a container started by dokku, you will see.
I just can not find where the image built by ansible task https://github.com/materialscloud-org/openstack-ansible/blob/master/roles/mc_dokku/tasks/fix-dockerfile.yml is used in the server.
Well, dokku should be using this image (you may need to look at the dokku source code).
In any case, you can simply look at the containers running in one of the dokku instances and see whether they are using the image.
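Following that suggestion, a small Python wrapper around the docker CLI can list the containers dokku started for an app and check /dev/random in each. It assumes that the com.dokku.app-name label seen in the inspect output in this thread is present on the running containers and that the image ships GNU stat; the runner parameter is only there so the command handling can be tested without Docker.

```python
import subprocess
from typing import Callable, List, Tuple

Runner = Callable[[List[str]], str]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def app_containers(app: str, runner: Runner = run) -> List[Tuple[str, str]]:
    """(container ID, image) pairs for running containers labelled com.dokku.app-name=<app>."""
    out = runner([
        "docker", "ps",
        "--filter", f"label=com.dokku.app-name={app}",
        "--format", "{{.ID}} {{.Image}}",
    ])
    pairs = []
    for line in out.splitlines():
        if line.strip():
            cid, image = line.split(maxsplit=1)
            pairs.append((cid, image))
    return pairs


def random_mode(container_id: str, runner: Runner = run) -> str:
    """Octal permissions of /dev/random inside a running container (GNU stat assumed)."""
    return runner(["docker", "exec", container_id, "stat", "-c", "%a", "/dev/random"]).strip()


if __name__ == "__main__":
    for cid, image in app_containers("optimadeclient"):
        print(cid, image, random_mode(cid))
```

For each image listed, docker history <image> shows the layers it was built from, which is one way to check whether it derives from the patched herokuish image.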
@ltalirz thanks and sorry for the late reply, I was on vacation last week.
Well, if you exec into a container started by dokku, you will see.
I did this, and the container definitely does not have /dev/random set to the correct permission.
In any case, you can simply look at the containers running in one of the dokku instances and see whether they are using the image
I used docker inspect to trace the image of the optimadeclient container and then inspected that image. What I found: 1) the sha256 ID of the Parent image cannot be found in the docker ps list, nor on herokuish's Docker Hub page https://hub.docker.com/r/gliderlabs/herokuish/tags?page=1. 2) There is some label information that may be useful for debugging; I attached it below.
"Env": [
"CACHE_PATH=/cache",
"USER=herokuishuser",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"STACK=heroku-20",
"DEBIAN_FRONTEND=noninteractive"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"LABEL org.label-schema.vendor=dokku"
],
"Image": "sha256:832c50abf474930c7cee20461729e54fc30280bf55f9b0f3a40136637a2bbfe1",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": {
"com.dokku.app-name": "optimadeclient",
"com.dokku.image-stage": "release",
"com.gliderlabs.herokuish/stack": "heroku-20",
"dokku": "",
"org.label-schema.schema-version": "1.0",
"org.label-schema.vendor": "dokku"
}
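To compare these fields across images (for example the running app image versus gliderlabs/herokuish:latest), a short sketch can pull them out of docker image inspect output. It assumes the usual layout, a JSON array whose objects carry Id, Parent and a Config section holding Env, Image and Labels (the excerpt above appears to come from that Config section); the summary keys and the script name in the usage comment are arbitrary choices for this example.

```python
import json
import sys
from typing import Any, Dict


def summarize_inspect(inspect_json: str) -> Dict[str, Any]:
    """Extract the fields discussed in this thread from `docker image inspect` output."""
    info = json.loads(inspect_json)[0]  # one object per inspected image; take the first
    config = info.get("Config") or {}
    labels = config.get("Labels") or {}
    env = dict(item.split("=", 1) for item in (config.get("Env") or []) if "=" in item)
    return {
        "id": info.get("Id"),
        "parent": info.get("Parent"),  # may be empty, e.g. for images pulled from a registry
        "config_image": config.get("Image"),
        "app": labels.get("com.dokku.app-name"),
        "stage": labels.get("com.dokku.image-stage"),
        "stack_label": labels.get("com.gliderlabs.herokuish/stack"),
        "stack_env": env.get("STACK"),
    }


if __name__ == "__main__":
    # Usage (hypothetical script name): docker image inspect <image> | python summarize_inspect.py
    print(json.dumps(summarize_inspect(sys.stdin.read()), indent=2))
```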
I suspect the new version of dokku uses a new tag that points to the image. I am also curious what heroku-20 means. Do you have any idea what to look at next?
Hi @dou-du,
I got the following error message when I found optimadeclient stuck while loading. There is template-related information in the traceback.