Oliver-Sellwood opened this issue 3 years ago
Thanks @Oliver-Sellwood!
We have fixed a few issues that could explain what you are observing. Have you built Sysbox from source or are you running the latest release (v0.3.0)? If you are running v0.3.0, please build Sysbox from source (very easy process as it is fully containerized) and try again.
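Roughly, the containerized build looks like this; treat it as a sketch and check the Sysbox developer guide for the exact make targets:

git clone --recursive https://github.com/nestybox/sysbox.git
cd sysbox
make sysbox         # builds the Sysbox components inside a build container (target name per the developer guide)
sudo make install   # installs the resulting binaries on the host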
Also, keep in mind that we are in the testing phase of the next Sysbox-CE release (v0.4.0) which is expected to come out within the next few hours / days.
Yea, I'm on 0.3.0, I don't have time to test from source right now. I will update when I've had a chance (or just install 0.4.0 if it is already out when I get around).

> within the next few hours / days.

Nice, gratz!
@Oliver-Sellwood, just to let you know that Sysbox v0.4.0 is available now.
Thanks :)
Now I get these:
Starting es03 ... error
Starting es02 ... error

ERROR: for es03 Cannot start service es03: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERROR: for es02 Cannot start service es02: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERROR: for es01 Cannot start service es01: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERROR: for es03 Cannot start service es03: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERROR: for es02 Cannot start service es02: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERROR: Encountered errors while bringing up the project.
and in the dockerd logs:
INFO[2021-07-18T01:07:40.055740278Z] shim disconnected id=2c2d1a32e0edceca97c54b09980b9f6af699d73799d4610af9d2c4ca826b1841
ERRO[2021-07-18T01:07:40.055868856Z] copy shim log error="read /proc/self/fd/14: file already closed"
ERRO[2021-07-18T01:07:40.058000286Z] stream copy error: reading from a closed fifo
ERRO[2021-07-18T01:07:40.058056997Z] stream copy error: reading from a closed fifo
INFO[2021-07-18T01:07:40.302366347Z] shim disconnected id=d14b01fa53a0a5882418cce7a4c804db945453dc10862d02f7df19a7ce332eb8
ERRO[2021-07-18T01:07:40.302423967Z] copy shim log error="read /proc/self/fd/18: file already closed"
ERRO[2021-07-18T01:07:40.311180231Z] stream copy error: reading from a closed fifo
ERRO[2021-07-18T01:07:40.311201742Z] stream copy error: reading from a closed fifo
INFO[2021-07-18T01:07:40.324172533Z] shim disconnected id=b96fbd96eb5538ad1c3cb6a815d1b31a25b77cc234c345fdac37bee2cecbefd1
ERRO[2021-07-18T01:07:40.324311518Z] copy shim log error="read /proc/self/fd/12: file already closed"
ERRO[2021-07-18T01:07:40.326996932Z] stream copy error: reading from a closed fifo
ERRO[2021-07-18T01:07:40.327002240Z] stream copy error: reading from a closed fifo
ERRO[2021-07-18T01:07:40.537804544Z] 2c2d1a32e0edceca97c54b09980b9f6af699d73799d4610af9d2c4ca826b1841 cleanup: failed to delete container from containerd: no such container
ERRO[2021-07-18T01:07:40.537881021Z] Handler for POST /v1.41/containers/2c2d1a32e0edceca97c54b09980b9f6af699d73799d4610af9d2c4ca826b1841/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERRO[2021-07-18T01:07:40.684948301Z] b96fbd96eb5538ad1c3cb6a815d1b31a25b77cc234c345fdac37bee2cecbefd1 cleanup: failed to delete container from containerd: no such container
ERRO[2021-07-18T01:07:40.685018981Z] Handler for POST /v1.41/containers/b96fbd96eb5538ad1c3cb6a815d1b31a25b77cc234c345fdac37bee2cecbefd1/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
ERRO[2021-07-18T01:07:40.789741377Z] d14b01fa53a0a5882418cce7a4c804db945453dc10862d02f7df19a7ce332eb8 cleanup: failed to delete container from containerd: no such container
ERRO[2021-07-18T01:07:40.789815199Z] Handler for POST /v1.41/containers/d14b01fa53a0a5882418cce7a4c804db945453dc10862d02f7df19a7ce332eb8/start returned error: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:446: setting rlimits for ready process caused: error setting rlimit type 8: operation not permitted: unknown
I also tried with increased memlock limits, but still no luck, i.e.:
docker run -dit -v `pwd`/test:/opt --ulimit memlock=8192000000:8192000000 golang bash
Env setup (inside the container)
curl -fsSL https://get.docker.com |sh ## yolo
curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
cd /opt
/usr/local/bin/docker-compose up
and the docker-compose.yml is:
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.3
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.3
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.3
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic
  kib01:
    image: docker.elastic.co/kibana/kibana:7.13.3
    container_name: kib01
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: http://es01:9200
      ELASTICSEARCH_HOSTS: '["http://es01:9200","http://es02:9200","http://es03:9200"]'
    networks:
      - elastic

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge
@Oliver-Sellwood, this time you ran into something very different from before. I was able to reproduce the issue following your description, which was very helpful btw ...
The bad news is that we are dealing with two different issues. The good news is that I believe I have potential fixes for both of them in mind. Also, if you're willing to make a couple of minor adjustments, I have a simple workaround that you can apply (see further below).
Both issues are a consequence of the special requirements imposed by Elasticsearch during initialization. As you can see below, during problem reproduction, Elastic complains about not having enough resources allocated for its operation:
es02 | ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
es02 | bootstrap check failure [1] of [2]: memory locking requested for elasticsearch process but memory is not locked
es02 | bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
es02 | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/es-docker-cluster.log
es02 | {"type": "server", "timestamp": "2021-07-18T06:14:20,669Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "es-docker-cluster", "node.name": "es02", "message": "stopping ..." }
es01 | ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
es01 | bootstrap check failure [1] of [2]: memory locking requested for elasticsearch process but memory is not locked
es01 | bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
es01 | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/es-docker-cluster.log
es01 | {"type": "server", "timestamp": "2021-07-18T06:14:20,690Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "es-docker-cluster", "node.name": "es01", "message": "stopping ..." }
es03 | ERROR: [2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch.
es03 | bootstrap check failure [1] of [2]: memory locking requested for elasticsearch process but memory is not locked
es03 | bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
es03 | ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/es-docker-cluster.log
The problem is that, as can be seen above, none of the system-resource adjustments (limit increments) attempted during the Elastic initialization process succeeded. There are two separate issues that need resolution here (a quick illustration of both follows the list):
1. We should allow root users within a sys-container to modify this node: /proc/sys/vm/max_map_count, which corresponds to the sysctl observed above (vm.max_map_count). Right now this resource is only readable, hence the error message above.
2. EPERM during setrlimit() execution: for this second issue I see two approaches that we could explore. We can either offer full flexibility by allowing execution of the setrlimit() syscall through seccomp-based trapping logic, OR offer a more constrained (but perhaps more secure) approach in which the user must define ahead of time the limits to be associated with all the processes within a sys-container (e.g. through a config file); obviously, these limits would need to be large enough to satisfy all the app requirements within the sys-container.
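From a shell inside the sys container, the two failures look roughly like this (hypothetical session; exact error text may differ):

# (1) the vm.max_map_count node is exposed read-only inside the sys container,
#     so this write is rejected:
echo 262144 > /proc/sys/vm/max_map_count

# (2) raising a hard limit requires CAP_SYS_RESOURCE in the init user-namespace,
#     so this fails with EPERM from within the sys container:
prlimit --pid $$ --memlock=8192000000:8192000000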
Notice that currently only soft-limits can be modified within a sys-container; apparently, hard-limits cannot be raised from within a non-init user-namespace:
root@test-1:/# cat /proc/self/limits
Limit Soft Limit Hard Limit Units
...
Max pending signals 31614 31614 signals
...
root@test-1:/#
root@test-1:/# ulimit -a
...
pending signals (-i) 31614
...
root@test-1:/# ulimit -i 31615
bash: ulimit: pending signals: cannot modify limit: Operation not permitted
root@test-1:/# ulimit -i 31614
root@test-1:/#
For now, I would suggest applying this simple workaround:

1. Set the vm.max_map_count sysctl at the host level:

$ sudo sysctl -w vm.max_map_count=262144
vm.max_map_count = 262144

2. Launch the sys container with an increased memlock ulimit:

$ docker run --runtime=sysbox-runc -it -v `pwd`/test:/opt --ulimit memlock=8192000000:8192000000 golang bash
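If you want the vm.max_map_count setting to survive host reboots, something along these lines should also work (the drop-in file name here is just an example):

# persist the sysctl via a drop-in file and reload all sysctl config
echo "vm.max_map_count=262144" | sudo tee /etc/sysctl.d/99-elastic.conf
sudo sysctl --system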
You should be able to bring up all the containers that are part of the docker-compose stack now.
I was having a similar issue using nestybox/gitlab-runner-docker Dockerfile when building locally with newer docker and gitlab-runner versions. Installing sysbox-ce 0.4.0 fixed my problem. Awesome job!
Will adjust the title of this issue to reflect the new problem discovered by @Oliver-Sellwood. The original issue (dockerd freezes) was already tracked and fixed a few weeks ago as part of various other issues.
Note: This problem was not caught by Sysbox's extended test suites (where an Elastic-Search testcase is already present), due to the absence in that case of the -e "bootstrap.memory_lock=true" --ulimit memlock=-1:-1 attributes, which eventually end up triggering the setrlimit() syscall.
I ran into a similar issue with the following docker-compose:
dns01:
  <<: *service
  hostname: xyz
  container_name: xyz
  image: devenv:latest
  ulimits:
    nofile:
      soft: 3009550
      hard: 3009550
  networks:
    boulder:
      ipv4_address: 10.4.12.20
Error response from daemon: OCI runtime create failed: container_linux.go:393: starting container process caused: process_linux.go:607: container init caused: process_linux.go:465: setting rlimits for ready process caused: error setting rlimit type 7: operation not permitted: unknown
I don't want to derail the conversation here, but for folks running into issues with Elasticsearch's insistence on memlock, you can disable this check by setting the environment variable bootstrap.memory_lock=false in your Elasticsearch container. You will want to make sure you have swap disabled.
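For instance, a minimal sketch of a single-node run with memory locking disabled (names and ports here are illustrative; the same environment entry can go in the compose file instead):

docker run -d --name es-single \
  -e "discovery.type=single-node" \
  -e "bootstrap.memory_lock=false" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -p 9200:9200 \
  docker.elastic.co/elasticsearch/elasticsearch:7.13.3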
Hi, Great project!
I am having issues where, for some containers, the dockerd inside a container will become unresponsive; restarting the daemon once it has reached this state will print the following and then nothing more.
The only solution I have found to get the daemon responsive again is to restart the container, but the issue pops up fairly frequently (i.e within 3 or 4 container launches).
I am experiencing this most recently in the golang image but have seen it in others. Not sure where to look for more information to help debug, sorry.