moka-rs / moka

A high performance concurrent caching library for Rust

Memory corruption observed when using Moka v0.9.6 #281

Closed tatsuya6502 closed 1 year ago

tatsuya6502 commented 1 year ago

Split from MaterializeInc/database-issues#93.

Note: I started looking into Moka because recently we tried using it in https://github.com/MaterializeInc/materialize/pull/19614, but had to revert it because we were observing memory corruption.

I have been trying to reproduce the issue using mokabench, but no luck so far: https://github.com/moka-rs/moka/issues/279#issuecomment-1596820321

CC: @ParkMyCar

tatsuya6502 commented 1 year ago

Details of the issue: https://github.com/MaterializeInc/database-issues/issues/5879

It seems it would be easy to reproduce if we can run an end-to-end integration test called "Long Zippy w/ user tables".

> This happened twice in a row in long Zippy w/ user tables test: https://buildkite.com/materialize/release-qualification/builds/197#018883bf-5086-4a43-a8b4-ede1ef192de5

tatsuya6502 commented 1 year ago

https://github.com/MaterializeInc/materialize/blob/bfbebe04a7d181f1bac91f01785309a52cf2de34/ci/release-qualification/pipeline.yml#L99-L108

  - id: zippy-user-tables-large
    label: "Long Zippy w/ user tables"
    timeout_in_minutes: 2880
    agents:
      queue: linux-x86_64-large
    artifact_paths: junit_*.xml
    plugins:
      - ./ci/plugins/mzcompose:
          composition: zippy
          args: [--scenario=UserTablesLarge, --actions=200000]
tatsuya6502 commented 1 year ago

I ran a short version of the "Zippy w/ user tables" test on Linux x86_64. I used the same scenario but a smaller number of actions: 2,000 instead of 200,000.

I could not reproduce the issue. I will try a longer run.

Environment:

Terminal Log:

$ git clone git@github.com:MaterializeInc/materialize.git
$ cd materialize

$ git branch moka-segfault 07386e427f
$ git checkout moka-segfault

$ rg moka
Cargo.lock
2903: "moka",
3241:name = "moka"
4380: "moka",
8629: "moka",

src/persist-client/Cargo.toml
41:moka = { version = "0.9.6", default-features = false, features = ["sync"] }

src/persist-client/src/internal/cache.rs
16:use moka::notification::RemovalCause;
17:use moka::sync::Cache;
100:        // This could maybe use moka's async cache to unify any concurrent

src/workspace-hack/Cargo.toml
63:moka = { version = "0.9.6", default-features = false, features = ["atomic64", "sync"] }
164:moka = { version = "0.9.6", default-features = false, features = ["atomic64", "sync"] }
$ ./bin/mzcompose --find zippy run default --scenario UserTablesLarge --actions 2000
==> Collecting mzbuild images
materialize/ubuntu-base:mzbuild-WBP7JFLWFQYUIHLXR5JSOAJ74VVKI6O7
materialize/clusterd:mzbuild-Q77JA6BIOATFUZAWANSIJFOCUXV3GOGX
materialize/materialized:mzbuild-IIUOBQEI7JVK3YSGYYV6FP3EP3JGFF5I
materialize/test-certs:mzbuild-6JWZ2MGLSYOCL3MLHIEIJMMLQLEOLCNY
materialize/postgres:mzbuild-ENTVWBFMO762DEEKAMURDMVMQ52GIO63
materialize/testdrive:mzbuild-ANAR5HEUWWDMTQUJ3ZA7XJWEY64H2ZTB
...
==> Running test case workflow-default
==> Running workflow default
...
[+] Running 3/3
 ✔ Container zippy-cockroach-1     Healthy
 ✔ Container zippy-minio-1         Healthy
 ✔ Container zippy-materialized-1  Healthy
> ALTER SYSTEM SET enable_unmanaged_cluster_replicas = true;
> CREATE CLUSTER storaged REPLICAS (r2 (
                STORAGECTL ADDRESSES ['storaged:2100'],
                STORAGE ADDRESSES ['storaged:2103'],
                COMPUTECTL ADDRESSES ['storaged:2101'],
                COMPUTE ADDRESSES ['storaged:2102'],
                WORKERS 4
            ))
[+] Stopping 1/1
 ✔ Container zippy-materialized-1  Stopped
Going to remove zippy-materialized-1
[+] Removing 1/0
 ✔ Container zippy-materialized-1  Removed
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-testdrive-1  Healthy
Generating test...
Running test...
--- #1: KafkaStart
[+] Building 0.0s (0/0)
[+] Running 2/2
 ✔ Container zippy-zookeeper-1  Healthy
 ✔ Container zippy-kafka-1      Healthy
--- #2: CockroachStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-cockroach-1  Healthy
--- #3: MinioStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-minio-1  Healthy
--- #4: MzStart
[+] Building 0.0s (0/0)
[+] Running 3/3
 ✔ Container zippy-cockroach-1     Healthy
 ✔ Container zippy-minio-1         Healthy
 ✔ Container zippy-materialized-1  Healthy
--- #5: StoragedStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-storaged-1  Healthy
--- #5: CreateTable
> CREATE TABLE table_0 (f1 INTEGER);
rows match; continuing at ts 1687305573.2167296
> INSERT INTO table_0 VALUES (0);
rows match; continuing at ts 1687305573.2269592

...

> INSERT INTO table_0 SELECT * FROM generate_series(568924, 572569);
rows match; continuing at ts 1687306589.1175394
==> mzcompose: test case workflow-default succeeded
tatsuya6502 commented 1 year ago

There are almost 300 commits between when moka was added and when it was removed: https://github.com/MaterializeInc/materialize/compare/1bcd1c4b89...07386e427f

I need to know which commit last failed.

tatsuya6502 commented 1 year ago

> There are almost 300 commits between when moka was added and when it was removed: https://github.com/MaterializeInc/materialize/compare/1bcd1c4b89...07386e427f
>
> I need to know which commit last failed.

The following commits were mentioned in https://github.com/MaterializeInc/database-issues/issues/5879.

tatsuya6502 commented 1 year ago

Ran the test (with 5,000 actions) several times on my home PC, but I have not reproduced the issue.

Environment:

  • Docker (x86_64)
  • Intel Core i7-12700F
    • 8 performance cores (2 hyper-threads each)
    • 4 efficiency cores

My PC (Windows 11 + WSL2) does not have enough RAM to run a longer Zippy test. It has only 16GB of RAM, and Docker containers like Kafka sometimes get killed while the test is running. I will expand the RAM to 32GB.

I hope running a longer test will help to reproduce the issue. But another aspect could be the number of available cores/hyper-threads. My PC has 12 cores and 20 threads, but their EC2 instance has only 8 vCPUs:

I will probably adjust the number of hyper-threads available to Docker:

Also, I will add some tracing logs to both Moka and Materialize to get a better understanding of the access pattern.

0xdeafbeef commented 1 year ago

@tatsuya6502 I have 32 threads / 128GB RAM and can leave the test running overnight if that would help

tatsuya6502 commented 1 year ago

@0xdeafbeef Thanks for the offer. Wow, you have a great machine!

I think I am going to be okay. I already ordered the RAM modules and am waiting for them to arrive. I think 32GB should be enough for now.

tatsuya6502 commented 1 year ago

FYI, I got the RAM modules for my PC and confirmed it can run the test with at least 50,000 actions. I allocated 24GiB of memory and only 8 logical CPU cores to WSL2.

The reason I did not try 200,000 actions is that it would take a long time (estimated ~30 hours); 50,000 actions took 7 hours and 14 minutes. The memory utilization was fine, so I had a feeling it could run 200,000 actions without issue.

Also, before I got the RAM modules, I was able to run 80,000 actions. I found that reducing the number of logical CPU cores for WSL2/Docker reduces the memory utilization of the test, so I allocated only 8 cores (instead of 20) and 10GiB of memory (instead of 8GiB) to WSL2, and it was able to run 80,000 actions.

tatsuya6502 commented 1 year ago

I talked to a Materialize engineer. They will try to reproduce the issue using an EC2 instance and give me the full set of instructions.

https://github.com/MaterializeInc/materialize/discussions/20136#discussioncomment-6274272

> Hopefully someone from the QA team has time to attempt the repro early next week and give you the full set of instructions.

tatsuya6502 commented 1 year ago

With some help from Materialize engineers, I made some progress: I reproduced the segmentation fault with Moka v0.9.6 in Materialize and got a core dump to analyze.

NOTES: I had to use an Amazon EC2 c6a.12xlarge instance, which has 48 vCPUs, because my home PC (20 hyper-threads) could take a few days to reproduce the issue with the current test (details). Also, it was a bit tricky to get a core dump in a Docker container, so it took me some time to figure that out. (A modified Materialize test harness for getting core dumps)

Here is the backtrace of the crashed thread:

[backtrace screenshot: moka-segfault-2023-07-01]

It crashed in the Deque::move_to_back method (src/common/deque.rs#L202), which was called by the base_cache::Inner::handle_upsert method (src/sync_base/base_cache.rs#L1414).

Materialize uses Moka v0.9.6, and this problem may have been fixed by #259 in v0.11.0.

I will back-port the fix to v0.9.x, and run the same test to see if the problem goes away.

Idea of steps to reproduce

If this is related to #259, the reproduction steps could be the following (a code sketch of these steps appears after the list). However, the end result depends largely on how the compiler optimizes reads through an immutable reference (&) to an LRU double-ended queue (deque).

  1. Create a cache with weighted capacity 6.

  2. Stop the periodic sync.

  3. Get non-existing key X once. (popularity = 1)

  4. Get non-existing key Y once. (popularity = 1)

  5. Insert key X with weight 2.

  6. Insert key Y with weight 2.

  7. Call sync:

    • Both X and Y are admitted.
    • Now the LRU deque has: (front) X, Y (back)
  8. Get non-existing key V three times. (popularity = 3)

  9. Insert V with weight 3.

  10. Invalidate Y.

    • This removes Y from the concurrent hash table (cht), but removing its node from the deque is delayed to a future sync.
  11. Call sync. The following will happen, and the last one can cause a segmentation fault:

    • X and Y are selected as the victims (entries to evict), and V is marked as admitted.
      • weights: X + Y = 4, V = 3
      • popularity: X + Y = 2, V = 3
    • X is removed from the cht, and its deque node is unlinked and dropped.
      • Unlinking the X node changes the Y node's prev pointer from Some(NonNull(X)) to None.
    • Because of step 10, Y is no longer in the cht, so Y's deque node is added to the skipped_nodes vector.
    • V's deque node is created and pushed to the back of the deque.
    • The nodes in skipped_nodes are processed (there is only the Y node): the Y node is moved to the back of the deque.
      • Deque's move_to_back method checks Y's prev pointer, which is now None.
      • However, because of the bug (fixed by #259 for v0.11.0), the optimized machine instructions may use a stale value of the prev pointer from a CPU register (still pointing to X), instead of reloading the actual value, None, from memory.
      • X was already dropped, so trying to access it (via *prev.as_ptr() in the deque.rs excerpt at the end of this comment) causes the segmentation fault.
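
For concreteness, here is a minimal sketch of the steps above against Moka v0.9.x's public sync cache API (Cache and ConcurrentCacheExt::sync). It assumes step 2 holds, i.e. the periodic housekeeper does not run between the explicit sync() calls, and even then the final sync only *may* segfault, depending on the compiler optimizations discussed above.

```rust
use moka::sync::{Cache, ConcurrentCacheExt};

fn main() {
    // Step 1: a cache with weighted capacity 6. The weigher reports the
    // value itself as the entry's weight.
    let cache: Cache<char, u32> = Cache::builder()
        .max_capacity(6)
        .weigher(|_key, &weight| weight)
        .build();

    // Step 2 (assumed): the periodic sync does not fire on its own
    // between the explicit sync() calls below.

    // Steps 3-4: record one read each for keys X and Y (popularity = 1).
    cache.get(&'X');
    cache.get(&'Y');

    // Steps 5-6: insert X and Y with weight 2 each.
    cache.insert('X', 2);
    cache.insert('Y', 2);

    // Step 7: apply the pending reads/writes. Both entries are admitted,
    // and the LRU deque becomes: (front) X, Y (back).
    cache.sync();

    // Steps 8-9: make V more popular than X and Y combined (3 > 1 + 1),
    // then insert it with weight 3.
    cache.get(&'V');
    cache.get(&'V');
    cache.get(&'V');
    cache.insert('V', 3);

    // Step 10: invalidate Y. Y leaves the cht immediately, but its deque
    // node stays linked until the next sync.
    cache.invalidate(&'Y');

    // Step 11: this sync evicts X and Y (combined weight 4) to admit V
    // (weight 3), and may hit the stale prev pointer described above.
    cache.sync();
}
```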

src/common/deque.rs#L202

188: pub(crate) unsafe fn move_to_back(&mut self, mut node: NonNull<DeqNode<T>>) {
        ...
201:    match node.prev {
202:        Some(prev) if node.next.is_some() => (*prev.as_ptr()).next = node.next,
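
To see why the None might not be reloaded at line 201 above, consider the following deliberately incorrect, hypothetical sketch (not Moka's actual code). Holding a shared reference to a node across a mutation made through a raw pointer violates Rust's aliasing rules, so the optimizer is free to keep the first loaded prev value in a register:

```rust
use std::ptr::NonNull;

struct DeqNode {
    prev: Option<NonNull<DeqNode>>,
}

fn main() {
    // Two heap-allocated nodes, with Y's prev pointing to X.
    let x = Box::into_raw(Box::new(DeqNode { prev: None }));
    let y = Box::into_raw(Box::new(DeqNode { prev: NonNull::new(x) }));

    unsafe {
        // Deliberately INCORRECT: `shared` (an &DeqNode) stays alive
        // across the raw-pointer write below, violating the aliasing rules.
        let shared: &DeqNode = &*y;
        let first = shared.prev; // Some(X); may be cached in a register

        (*y).prev = None;        // unlink X from Y ...
        drop(Box::from_raw(x));  // ... and drop X

        // Because of the aliasing violation, the optimizer may reuse the
        // register holding `first` here instead of reloading None from
        // memory; dereferencing the stale Some(X) touches freed memory
        // and can segfault.
        if let Some(prev) = shared.prev {
            let _ = (*prev.as_ptr()).prev;
        }
        let _ = first;

        drop(Box::from_raw(y));
    }
}
```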
tatsuya6502 commented 1 year ago
> 10. Invalidate Y.
>     • This removes Y from the concurrent hash table (cht), but removing its node from the deque is delayed to a future sync.
> 11. Call sync. The following will happen, and the last one can cause a segmentation fault:
>     • X and Y are selected as the victims (entries to evict), and V is marked as admitted.
>       • weights: X + Y = 4, V = 3
>       • popularity: X + Y = 2, V = 3
>     • X is removed from the cht, and its deque node is unlinked and dropped.
>       • Unlinking the X node changes the Y node's prev pointer from Some(NonNull(X)) to None.
>     • Because of step 10, Y is no longer in the cht, so Y's deque node is added to the skipped_nodes vector.

I reviewed the code and realized that the above will not reproduce the problem.

tatsuya6502 commented 1 year ago

> Materialize uses Moka v0.9.6, and this problem may have been fixed by #259 in v0.11.0.
>
> I will back-port the fix to v0.9.x, and run the same test to see if the problem goes away.

I backported the fix to v0.9.x:

I am running "Long Zippy w/ user tables" on an EC2 c6a.12xlarge instance (48 vCPUs). It has been running for 14 hours without reproducing the problem. This looks promising.

I will release Moka v0.9.8 with #283 after I finish running pre-release tests. Then I will let the Materialize engineers know that the problem may have been fixed.

tatsuya6502 commented 1 year ago

I will backport #259 to Moka v0.10.x as well. I will keep this issue open until I am done with it.

tatsuya6502 commented 1 year ago

Closing as completed.

  • Moka v0.9.x
  • Moka v0.10.x