kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

cassandra initialization hangs #537

Closed egernst closed 3 years ago

egernst commented 6 years ago

Once Cassandra has started, it should be listening on port 9042 for cqlsh connections. When booting with Kata, this doesn't happen because initialization hangs at:

WARN  [main] 2018-07-31 03:12:09,728 StartupChecks.java:311 - Maximum number of memory map areas per process (vm.max_map_count) 65530 is too low, recommended value: 1048575, you can change it with sysctl.
WARN  [main] 2018-07-31 03:12:09,946 StartupChecks.java:332 - Directory /var/lib/cassandra/data doesn't exist
WARN  [main] 2018-07-31 03:12:10,025 StartupChecks.java:332 - Directory /var/lib/cassandra/commitlog doesn't exist
WARN  [main] 2018-07-31 03:12:10,040 StartupChecks.java:332 - Directory /var/lib/cassandra/saved_caches doesn't exist
WARN  [main] 2018-07-31 03:12:10,058 StartupChecks.java:332 - Directory /var/lib/cassandra/hints doesn't exist
INFO  [main] 2018-07-31 03:12:11,052 QueryProcessor.java:116 - Initialized prepared statement caches with 10 MB (native) and 10 MB (Thrift)
INFO  [main] 2018-07-31 03:12:26,213 ColumnFamilyStore.java:411 - Initializing system.IndexInfo

This can be observed at /var/log/cassandra/debug.log

To reproduce, run docker run --name cassandra -p 9042:9042 -p 9160:9160 -d cassandra, then either try to connect via cqlsh, check whether port 9042 is in use in the container, or check the logs at /var/log/cassandra/debug.log.
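
The port check can also be scripted. The following is a minimal sketch, not part of the original report: the helper name port_is_open and the 3 second default timeout are assumptions chosen for illustration. It only tests whether a TCP connection to the port succeeds, which is a weaker check than a working cqlsh session.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; the name and default timeout
    are not taken from the issue.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("127.0.0.1", 9042) run on the Docker host (assuming the -p 9042:9042 mapping above) would be expected to return False while initialization is hung.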

For testing cqlsh:

server container (failing today): docker run --name cassandra -p 9042:9042 -p 9160:9160 -d cassandra

client:

docker run -it cassandra:latest  bash -c 'cqlsh <host ip for system running the server>'

Connected to Test Cluster at <ip for host system running the server>:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

egernst commented 6 years ago

/cc @bergwolf @amshinde

jodh-intel commented 6 years ago

Looks like we also need to:

bergwolf commented 6 years ago

The cassandra image declares VOLUME [/var/lib/cassandra]. The system.log shows:

WARN  [main] 2018-07-31 03:28:03,128 StartupChecks.java:332 - Directory /var/lib/cassandra/data doesn't exist
WARN  [main] 2018-07-31 03:28:03,214 StartupChecks.java:332 - Directory /var/lib/cassandra/commitlog doesn't exist
WARN  [main] 2018-07-31 03:28:03,229 StartupChecks.java:332 - Directory /var/lib/cassandra/saved_caches doesn't exist
WARN  [main] 2018-07-31 03:28:03,247 StartupChecks.java:332 - Directory /var/lib/cassandra/hints doesn't exist

But if we exec into the container, we can see that these directories do exist.

grahamwhaley commented 6 years ago

An ls -la of those dirs and a mount within the system might provide some more clues. My gut of course points at 9p, hence we should check the mounts to see where those paths point. My ultimate route for such things is then an strace on the binary that is complaining to see if we can pick out the exact syscall and matching error.
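
As a rough sketch of the mount check, the snippet below finds which mount point and filesystem type back a given path by parsing text in the /proc/mounts format. The helper name backing_mount is hypothetical, and the sketch assumes mount points without escaped characters (/proc/mounts encodes spaces as \040, which is not handled here).

```python
from typing import Tuple


def backing_mount(path: str, mounts_text: str) -> Tuple[str, str]:
    """Return (mount_point, fs_type) of the most specific mount containing path.

    mounts_text is expected in /proc/mounts format:
    "<device> <mount_point> <fs_type> <options> <dump> <pass>" per line.
    Hypothetical helper for illustration only.
    """
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        # Match the mount point itself or anything below it, so that
        # "/var/lib/cassandrax" does not match "/var/lib/cassandra".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Longest mount point wins; on ties the later entry wins,
            # since later mounts shadow earlier ones.
            if len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best
```

Inside the container this could be called as backing_mount("/var/lib/cassandra/data", open("/proc/mounts").read()). A result whose filesystem type is 9p would support the suspicion above, though it would not by itself identify the failing syscall.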

egernst commented 6 years ago

I tested and saw failure when backed by devicemapper. Will seek to get more details.

@jodh-intel - I am not sure if that’ll catch it. The container comes up without error. It’s when you investigate or try to connect to it that a failure occurs.

jodh-intel commented 6 years ago

@egernst - sure, I meant update that test to perform a more thorough cassandra test.

egernst commented 6 years ago

@jodh-intel agreed - definitely necessary. First we need to make it pass locally I guess.

Looked at it with @amshinde - by default Docker creates a volume for /var/lib/cassandra, which will use 9pfs. We are now working on guidance for passing in the database as a block device and verifying that this works. The outcome of this issue could be a .md describing cassandra + kata.

aayush-ag21 commented 4 years ago

I faced a similar issue when I tried to map the Cassandra data to an external volume. I am using the Windows toolkit. It works fine after switching to an internal volume.