moka-rs / moka

A high performance concurrent caching library for Rust

Memory corruption observed when using Moka v0.9.6 #281

Closed tatsuya6502 closed 1 year ago

tatsuya6502 commented 1 year ago

Split from MaterializeInc/database-issues#93.

Note: I started looking into Moka because recently we tried using it in https://github.com/MaterializeInc/materialize/pull/19614, but had to revert it because we were observing memory corruption.

I have been trying to reproduce the issue using mokabench, but no luck so far: https://github.com/moka-rs/moka/issues/279#issuecomment-1596820321

CC: @ParkMyCar

tatsuya6502 commented 1 year ago

Details of the issue: https://github.com/MaterializeInc/database-issues/issues/5879

It seems it would be easy to reproduce if we can run an end-to-end integration test called "Long Zippy w/ user tables".

> This happened twice in a row in long Zippy w/ user tables test: https://buildkite.com/materialize/release-qualification/builds/197#018883bf-5086-4a43-a8b4-ede1ef192de5

tatsuya6502 commented 1 year ago

https://github.com/MaterializeInc/materialize/blob/bfbebe04a7d181f1bac91f01785309a52cf2de34/ci/release-qualification/pipeline.yml#L99-L108

  - id: zippy-user-tables-large
    label: "Long Zippy w/ user tables"
    timeout_in_minutes: 2880
    agents:
      queue: linux-x86_64-large
    artifact_paths: junit_*.xml
    plugins:
      - ./ci/plugins/mzcompose:
          composition: zippy
          args: [--scenario=UserTablesLarge, --actions=200000]
tatsuya6502 commented 1 year ago

I ran a short version of the "Zippy w/ user tables" test on Linux x86_64. I used the same scenario but a smaller number of actions: 2,000 instead of 200,000.

I could not reproduce the issue. I will try a longer run.

Environment:

Terminal Log:

$ git clone git@github.com:MaterializeInc/materialize.git
$ cd materialize

$ git branch moka-segfault 07386e427f
$ git checkout moka-segfault

$ rg moka
Cargo.lock
2903: "moka",
3241:name = "moka"
4380: "moka",
8629: "moka",

src/persist-client/Cargo.toml
41:moka = { version = "0.9.6", default-features = false, features = ["sync"] }

src/persist-client/src/internal/cache.rs
16:use moka::notification::RemovalCause;
17:use moka::sync::Cache;
100:        // This could maybe use moka's async cache to unify any concurrent

src/workspace-hack/Cargo.toml
63:moka = { version = "0.9.6", default-features = false, features = ["atomic64", "sync"] }
164:moka = { version = "0.9.6", default-features = false, features = ["atomic64", "sync"] }
$ ./bin/mzcompose --find zippy run default --scenario UserTablesLarge --actions 2000
==> Collecting mzbuild images
materialize/ubuntu-base:mzbuild-WBP7JFLWFQYUIHLXR5JSOAJ74VVKI6O7
materialize/clusterd:mzbuild-Q77JA6BIOATFUZAWANSIJFOCUXV3GOGX
materialize/materialized:mzbuild-IIUOBQEI7JVK3YSGYYV6FP3EP3JGFF5I
materialize/test-certs:mzbuild-6JWZ2MGLSYOCL3MLHIEIJMMLQLEOLCNY
materialize/postgres:mzbuild-ENTVWBFMO762DEEKAMURDMVMQ52GIO63
materialize/testdrive:mzbuild-ANAR5HEUWWDMTQUJ3ZA7XJWEY64H2ZTB
...
==> Running test case workflow-default
==> Running workflow default
...
[+] Running 3/3
 ✔ Container zippy-cockroach-1     Healthy
 ✔ Container zippy-minio-1         Healthy
 ✔ Container zippy-materialized-1  Healthy
> ALTER SYSTEM SET enable_unmanaged_cluster_replicas = true;
> CREATE CLUSTER storaged REPLICAS (r2 (
                STORAGECTL ADDRESSES ['storaged:2100'],
                STORAGE ADDRESSES ['storaged:2103'],
                COMPUTECTL ADDRESSES ['storaged:2101'],
                COMPUTE ADDRESSES ['storaged:2102'],
                WORKERS 4
            ))
[+] Stopping 1/1
 ✔ Container zippy-materialized-1  Stopped
Going to remove zippy-materialized-1
[+] Removing 1/0
 ✔ Container zippy-materialized-1  Removed
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-testdrive-1  Healthy
Generating test...
Running test...
--- #1: KafkaStart
[+] Building 0.0s (0/0)
[+] Running 2/2
 ✔ Container zippy-zookeeper-1  Healthy
 ✔ Container zippy-kafka-1      Healthy
--- #2: CockroachStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-cockroach-1  Healthy
--- #3: MinioStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-minio-1  Healthy
--- #4: MzStart
[+] Building 0.0s (0/0)
[+] Running 3/3
 ✔ Container zippy-cockroach-1     Healthy
 ✔ Container zippy-minio-1         Healthy
 ✔ Container zippy-materialized-1  Healthy
--- #5: StoragedStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-storaged-1  Healthy
--- #5: CreateTable
> CREATE TABLE table_0 (f1 INTEGER);
rows match; continuing at ts 1687305573.2167296
> INSERT INTO table_0 VALUES (0);
rows match; continuing at ts 1687305573.2269592

...

> INSERT INTO table_0 SELECT * FROM generate_series(568924, 572569);
rows match; continuing at ts 1687306589.1175394
==> mzcompose: test case workflow-default succeeded
tatsuya6502 commented 1 year ago

There are almost 300 commits between when moka was added and when it was removed: https://github.com/MaterializeInc/materialize/compare/1bcd1c4b89...07386e427f

I need to know which commit last failed.

tatsuya6502 commented 1 year ago

> There are almost 300 commits between when moka was added and when it was removed: https://github.com/MaterializeInc/materialize/compare/1bcd1c4b89...07386e427f
>
> I need to know which commit last failed.

The following commits were mentioned in https://github.com/MaterializeInc/database-issues/issues/5879.

tatsuya6502 commented 1 year ago

Ran the test (with 5,000 actions) several times on my home PC, but I have not reproduced the issue.

Environment:

  • Docker (x86_64)
  • Intel Core i7-12700F
    • 8 performance cores (2 hyper-threads each)
    • 4 efficiency cores

My PC (Windows 11 + WSL2) does not have enough RAM to run a longer Zippy test. It has only 16GB of RAM, and Docker containers like Kafka sometimes get killed while the test is running. I will expand the RAM to 32GB.

I hope running a longer test will help to reproduce the issue. But another aspect could be the number of available cores/hyper-threads. My PC has 12 cores and 20 threads, but their EC2 instance has only 8 vCPUs:

I will probably adjust the number of hyper-threads available to Docker:

Also, I will add some tracing logs to both Moka and Materialize to get a better understanding of the access pattern.

0xdeafbeef commented 1 year ago

@tatsuya6502 I have 32 threads / 128GB RAM and can leave the test running overnight if that would help

tatsuya6502 commented 1 year ago

@0xdeafbeef Thanks for the offer. Wow, you have a great machine!

I think I am going to be okay. I already ordered the RAM modules and am waiting for them to arrive. I think 32GB should be enough for now.

tatsuya6502 commented 1 year ago

FYI, I got the RAM modules for my PC and confirmed it can run the test with at least 50,000 actions. I allocated 24GiB of memory and only 8 logical CPU cores to WSL2.

The reason I did not try 200,000 actions is that it would take a long time (estimated ~30 hours); 50,000 actions took 7 hours and 14 minutes. The memory utilization was fine, so I had a feeling it could run 200,000 actions without issue.

Also, before I got the RAM modules, I was able to run 80,000 actions. I found that reducing the number of logical CPU cores for WSL2/Docker reduces the memory utilization of the test, so I allocated only 8 cores (instead of 20) and 10GiB of memory (instead of 8GiB) to WSL2, and it was able to run 80,000 actions.

tatsuya6502 commented 1 year ago

I talked to a Materialize engineer. They will try to reproduce the issue using an EC2 instance and give me the full set of instructions.

https://github.com/MaterializeInc/materialize/discussions/20136#discussioncomment-6274272

> Hopefully someone from the QA team has time to attempt the repro early next week and give you the full set of instructions.

tatsuya6502 commented 1 year ago

With some help from Materialize engineers, I made some progress: I reproduced the segmentation fault with Moka v0.9.6 in Materialize and got a core dump to analyze.

NOTES: I had to use an Amazon EC2 c6a.12xlarge instance, which has 48 vCPUs, because my home PC (20 hyper-threads) could take a few days to reproduce the issue with the current test (details). Also, it was a bit tricky to get a core dump in a Docker container, so it took me some time to figure that out. (A modified Materialize test harness for getting core dumps)

Here is the backtrace of the crashed thread:

[backtrace screenshot: moka-segfault-2023-07-01]

It crashed in the Deque::move_to_back method (src/common/deque.rs#L202), which was called by the base_cache::Inner::handle_upsert method (src/sync_base/base_cache.rs#L1414).

Materialize uses Moka v0.9.6, and this problem may have been fixed by #259 in v0.11.0.

I will back-port the fix to v0.9.x, and run the same test to see if the problem goes away.

Idea of steps to reproduce

If this is related to #259, the reproduction steps could be the following (a code sketch of these steps appears after the list). However, the end result depends largely on how the compiler optimizes reads through an immutable reference (&) to an LRU double-ended queue (deque).

  1. Create a cache with weighted capacity 6.

  2. Stop the periodic sync.

  3. Get non-existing key X once. (popularity = 1)

  4. Get non-existing key Y once. (popularity = 1)

  5. Insert key X with weight 2.

  6. Insert key Y with weight 2.

  7. Call sync:

    • Both X and Y are admitted.
    • Now the LRU deque has: (front) X, Y (back)
  8. Get non-existing key V three times. (popularity = 3)

  9. Insert V with weight 3.

  10. Invalidate Y.

    • This removes Y from the concurrent hash table (cht), but removing its node from the deque is delayed to a future sync.
  11. Call sync. The following will happen, and the last one can cause a segmentation fault:

    • X and Y are selected as the victims (entries to evict), and V is marked as admitted.
      • weights: X + Y = 4, V = 3
      • popularity: X + Y = 2, V = 3
    • X is removed from the cht, and its deque node is unlinked and dropped.
      • Unlinking the X node changes the Y node's prev pointer from Some(NonNull(X)) to None.
    • Because of step 10, Y is no longer in the cht, so Y's deque node is added to the skipped_nodes vector.
    • V's deque node is created and pushed to the back of the deque.
    • The nodes in skipped_nodes are processed (there is only the Y node): the Y node is moved to the back of the deque.
      • Deque's move_to_back method checks Y's prev pointer, which is now None.
      • However, because of the bug (fixed by #259 for v0.11.0), the optimized machine instructions may use a stale value of the prev pointer from a CPU register (still pointing to X), instead of reloading the actual value, None, from memory.
      • X was already dropped, so trying to access it (via *prev.as_ptr() in the deque.rs excerpt at the end of this comment) causes the segmentation fault.
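
For concreteness, here is a minimal sketch of the steps above against Moka v0.9.x's public sync cache API (Cache and ConcurrentCacheExt::sync). It assumes step 2 holds, i.e. the periodic housekeeper does not run between the explicit sync() calls, and even then the final sync only *may* segfault, depending on the compiler optimizations discussed above.

```rust
use moka::sync::{Cache, ConcurrentCacheExt};

fn main() {
    // Step 1: a cache with weighted capacity 6. The weigher reports the
    // value itself as the entry's weight.
    let cache: Cache<char, u32> = Cache::builder()
        .max_capacity(6)
        .weigher(|_key, &weight| weight)
        .build();

    // Step 2 (assumed): the periodic sync does not fire on its own
    // between the explicit sync() calls below.

    // Steps 3-4: record one read each for keys X and Y (popularity = 1).
    cache.get(&'X');
    cache.get(&'Y');

    // Steps 5-6: insert X and Y with weight 2 each.
    cache.insert('X', 2);
    cache.insert('Y', 2);

    // Step 7: apply the pending reads/writes. Both entries are admitted,
    // and the LRU deque becomes: (front) X, Y (back).
    cache.sync();

    // Steps 8-9: make V more popular than X and Y combined (3 > 1 + 1),
    // then insert it with weight 3.
    cache.get(&'V');
    cache.get(&'V');
    cache.get(&'V');
    cache.insert('V', 3);

    // Step 10: invalidate Y. Y leaves the cht immediately, but its deque
    // node stays linked until the next sync.
    cache.invalidate(&'Y');

    // Step 11: this sync evicts X and Y (combined weight 4) to admit V
    // (weight 3), and may hit the stale prev pointer described above.
    cache.sync();
}
```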

src/common/deque.rs#L202

188: pub(crate) unsafe fn move_to_back(&mut self, mut node: NonNull<DeqNode<T>>) {
        ...
201:    match node.prev {
202:        Some(prev) if node.next.is_some() => (*prev.as_ptr()).next = node.next,
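
To see why the None might not be reloaded at line 201 above, consider the following deliberately incorrect, hypothetical sketch (not Moka's actual code). Holding a shared reference to a node across a mutation made through a raw pointer violates Rust's aliasing rules, so the optimizer is free to keep the first loaded prev value in a register:

```rust
use std::ptr::NonNull;

struct DeqNode {
    prev: Option<NonNull<DeqNode>>,
}

fn main() {
    // Two heap-allocated nodes, with Y's prev pointing to X.
    let x = Box::into_raw(Box::new(DeqNode { prev: None }));
    let y = Box::into_raw(Box::new(DeqNode { prev: NonNull::new(x) }));

    unsafe {
        // Deliberately INCORRECT: `shared` (an &DeqNode) stays alive
        // across the raw-pointer write below, violating the aliasing rules.
        let shared: &DeqNode = &*y;
        let first = shared.prev; // Some(X); may be cached in a register

        (*y).prev = None;        // unlink X from Y ...
        drop(Box::from_raw(x));  // ... and drop X

        // Because of the aliasing violation, the optimizer may reuse the
        // register holding `first` here instead of reloading None from
        // memory; dereferencing the stale Some(X) touches freed memory
        // and can segfault.
        if let Some(prev) = shared.prev {
            let _ = (*prev.as_ptr()).prev;
        }
        let _ = first;

        drop(Box::from_raw(y));
    }
}
```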
tatsuya6502 commented 1 year ago
> 10. Invalidate Y.
>     • This removes Y from the concurrent hash table (cht), but removing its node from the deque is delayed to a future sync.
> 11. Call sync. The following will happen, and the last one can cause a segmentation fault:
>     • X and Y are selected as the victims (entries to evict), and V is marked as admitted.
>       • weights: X + Y = 4, V = 3
>       • popularity: X + Y = 2, V = 3
>     • X is removed from the cht, and its deque node is unlinked and dropped.
>       • Unlinking the X node changes the Y node's prev pointer from Some(NonNull(X)) to None.
>     • Because of step 10, Y is no longer in the cht, so Y's deque node is added to the skipped_nodes vector.

I reviewed the code and realized that the above will not reproduce the problem.

tatsuya6502 commented 1 year ago

> Materialize uses Moka v0.9.6, and this problem may have been fixed by #259 in v0.11.0.
>
> I will back-port the fix to v0.9.x, and run the same test to see if the problem goes away.

I backported the fix to v0.9.x:

I am running "Long Zippy w/ user tables" on an EC2 c6a.12xlarge instance (48 vCPUs). It has been running for 14 hours without reproducing the problem. This looks promising.

I will release Moka v0.9.8 with #283 after I finish running pre-release tests. Then I will let the Materialize engineers know that the problem may have been fixed.

tatsuya6502 commented 1 year ago

I will backport #259 to Moka v0.10.x as well. I will keep this issue open until I am done with it.

tatsuya6502 commented 1 year ago

Closing as completed.

  • Moka v0.9.x
  • Moka v0.10.x