Open kos-team opened 2 weeks ago
Those errors look like something is wrong in your Kubernetes environment. "Invalid argument" is returned when the filesystem is unable to do something (in this case, allocate a segment on the disk). This isn't directly related to cass-operator or management-api, as these functions are dependent on your StorageClass / CSI driver / Kubernetes / Linux / filesystem / etc.
Perhaps something as simple as running out of disk space or a defective disk?
I tested 5.0.2 on multiple systems and they all worked fine.
After some debugging, we found that the root cause is the file system we are running on.
We reproduced it on a Kind Kubernetes cluster with the default local-storage CSI driver.
The host OS is a Linux system, but we were running everything on a tmpfs filesystem.
When we switched the Kind cluster to a normal ext4 file system, 5.0.2 worked fine.
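To confirm which filesystem backs a given data directory, one option is to look the path up in /proc/mounts. The following is a minimal Python sketch, not part of cass-operator or Cassandra. The function name `filesystem_type` and the sample paths are made up for illustration, and symlinks in the path are not resolved.

```python
import posixpath


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, passed in as a string so
    the lookup can be tried without touching the host.
    Returns None if no mount matches.
    """
    target = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent of it.
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            # Prefer the longest mount point; on ties the later entry wins,
            # which matches a newer mount stacked over an older one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical excerpt of /proc/mounts, for illustration only.
SAMPLE_MOUNTS = """\
overlay / overlay rw 0 0
tmpfs /mnt/kind tmpfs rw 0 0
/dev/sda1 /mnt/disk ext4 rw 0 0
"""

print(filesystem_type("/mnt/kind/pv1/commitlog", SAMPLE_MOUNTS))
```

On a real node, the second argument would be the result of reading /proc/mounts, and the first the directory that holds the commit log.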
We are curious what changed in Cassandra 5.0.2 that made it incompatible with the tmpfs file system.
I do not know, but I can make a guess. In 5.0, they introduced DIRECT_IO as the default access type for the commit log, used instead of mmap whenever direct I/O is available on the target disk.
I don't think the logic works correctly for tmpfs in this case, as it only checks the available blockSize by creating a stub file. tmpfs probably returns a value in the accepted range (> 0), but tmpfs itself does not support DIRECT_IO, so the real writes would fail when using that method.
As far as I understand tmpfs, its data already lives in the page cache, and DIRECT_IO means bypassing the page cache, so I wonder where such a write would end up.
You might get tmpfs working if you manually set the commit log disk access mode to mmap or standard (with performance caveats, of course).
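The guess above (a block size is reported, but direct writes are rejected) can be checked from outside Cassandra by attempting a direct write in the target directory. This is a rough Python sketch, assuming Linux. It is not the check Cassandra performs, `supports_direct_io` is a made-up name, and the 4096-byte write size is an assumption. Results may also differ between kernel versions.

```python
import errno
import mmap
import os
import tempfile


def supports_direct_io(directory):
    """Return True if a 4096-byte O_DIRECT write succeeds in `directory`."""
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        return False  # this platform has no O_DIRECT flag

    # Create the probe file with a normal (buffered) open first.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        fd = os.open(path, os.O_WRONLY | o_direct)
        try:
            # An anonymous mmap is page-aligned, which O_DIRECT writes need.
            buf = mmap.mmap(-1, 4096)
            try:
                os.write(fd, buf)
            finally:
                buf.close()
        finally:
            os.close(fd)
        return True
    except OSError as exc:
        # Filesystems without direct I/O typically answer "Invalid argument".
        if exc.errno in (errno.EINVAL, errno.EOPNOTSUPP):
            return False
        raise
    finally:
        os.unlink(path)  # always remove the probe file


print(supports_direct_io(tempfile.gettempdir()))
```

If this returns False for the directory backing the commit log, that fits the "Invalid argument" failure, and falling back to mmap or standard access is the thing to try.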
What happened?
The latest cass-operator, version 1.22.4, cannot deploy Cassandra version 5.0.2 correctly. According to the https://github.com/k8ssandra/management-api-for-apache-cassandra repo, 5.0.2 is supported. The Cassandra process crashes with the error message:
ERROR [COMMIT-LOG-ALLOCATOR] 2024-11-07 21:35:29,362 JVMStabilityInspector.java:201 - Exiting due to error while processing commit log during initialization.
What did you expect to happen?
cass-operator should be able to deploy Cassandra with 5.0.2.
How can we reproduce it (as minimally and precisely as possible)?
This bug can be reproduced by first deploying the cass-operator.
Deploy this CR with the serverVersion set to 5.0.2:
cass-operator version
1.22.4
Kubernetes version
1.29.1
Method of installation
Helm
Anything else we need to know?
Error log from the server-system-logger container, which is the log from Cassandra itself:
Issue is synchronized with this Jira Story by Unito. Issue Number: CASS-77