pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0

CDC OOM when sysbench prepares 1000 tables (100000 rows each) with redo on #10255

Open fubinzh opened 10 months ago

fubinzh commented 10 months ago

### What did you do?

  1. TiDB cluster with 3 TiCDC nodes (2 CPU cores, 16 GB memory each)
  2. Create a MySQL sink changefeed with redo on
    
    bash-5.1# /cdc cli changefeed --server http://127.0.0.1:8301 query -c redo-basic-s3-partition
    {
      "upstream_id": 7309132325626639995,
      "namespace": "default",
      "id": "redo-basic-s3-partition",
      "sink_uri": "mysql://root:xxxxx@downstream.cdc-redo-related-tps-5100030-1-871:3306",
      "config": {
        "memory_quota": 1073741824,
        "case_sensitive": false,
        "enable_old_value": true,
        "force_replicate": false,
        "ignore_ineligible_table": false,
        "check_gc_safe_point": true,
        "enable_sync_point": false,
        "bdr_mode": false,
        "sync_point_interval": 600000000000,
        "sync_point_retention": 86400000000000,
        "filter": {
          "rules": [
            "*.*"
          ],
          "event_filters": null
        },
        "mounter": {
          "worker_num": 16
        },
        "sink": {
          "protocol": "",
          "schema_registry": "",
          "csv": {
            "delimiter": ",",
            "quote": "\"",
            "null": "\\N",
            "include_commit_ts": false,
            "binary_encoding_method": "base64"
          },
          "column_selectors": null,
          "transaction_atomicity": "",
          "encoder_concurrency": 16,
          "terminator": "\r\n",
          "date_separator": "day",
          "enable_partition_separator": true,
          "file_index_width": 0,
          "kafka_config": null,
          "advance_timeout": 150
        },
        "consistent": {
          "level": "eventual",
          "max_log_size": 64,
          "flush_interval": 2000,
          "meta_flush_interval": 200,
          "encoding_worker_num": 16,
          "flush_worker_num": 8,
          "storage": "s3://tmp/test-infra-redolog/redo-basic-s3-partition-6950af25-1300-41dd-97f3-01e06c879c66?access-key=xxz\u0026secret-access-key=xxx\u0026endpoint=http://xxx:9000\u0026force-path-style=true",
          "use_file_backend": false
        },
        "changefeed_error_stuck_duration": 1800000000000,
        "sql_mode": "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
      },
      "create_time": "2023-12-05 15:28:33.096",
      "start_ts": 446114067381223436,
      "resolved_ts": 446115244722356227,
      "target_ts": 0,
      "checkpoint_tso": 446115243358946105,
      "checkpoint_time": "2023-12-05 16:43:18.995",
      "sort_engine": "unified",
      "state": "failed",
      "error": null,
      "error_history": null,
      "creator_version": "v6.5.6-pr10254"
    }
  3. Run the workload

    sysbench --db-driver=mysql --mysql-host=xxx --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=1000 --table-size=100000 --create_secondary=off --debug=true --threads=10 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_write_only prepare

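For scale, here is a rough sizing sketch of the prepare step against the limits in this report. It is only a back-of-envelope estimate: the row size assumes the default sysbench `sbtest` schema (`id INT, k INT, c CHAR(120), pad CHAR(60)`), "16G" is read as 16 GiB, and encoding overhead, old values, and redo buffering are ignored. It does not diagnose the OOM.

```python
# Back-of-envelope sizing for the sysbench prepare step in this report.
TABLES = 1000             # --tables=1000
ROWS_PER_TABLE = 100_000  # --table-size=100000

# Assumption: default sysbench sbtest schema
# (id INT, k INT, c CHAR(120), pad CHAR(60)); payload bytes only.
ROW_PAYLOAD_BYTES = 4 + 4 + 120 + 60

MEMORY_QUOTA_BYTES = 1_073_741_824  # "memory_quota" in the changefeed config
NODE_MEMORY_BYTES = 16 * 1024**3    # "16G" per TiCDC node, assumed to mean GiB


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 1024**3


total_rows = TABLES * ROWS_PER_TABLE
raw_bytes = total_rows * ROW_PAYLOAD_BYTES

print(f"rows inserted:        {total_rows:,}")
print(f"raw row payload:      {gib(raw_bytes):.2f} GiB")
print(f"changefeed quota:     {gib(MEMORY_QUOTA_BYTES):.2f} GiB")
print(f"memory per CDC node:  {gib(NODE_MEMORY_BYTES):.2f} GiB")
```

Under these assumptions the raw payload alone exceeds the memory of a single TiCDC node, so the workload cannot be held in memory and depends on TiCDC bounding what it buffers.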

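The `start_ts`, `resolved_ts`, and `checkpoint_tso` values in the changefeed query output are TiDB TSOs, which hold a millisecond physical timestamp in the high bits and an 18-bit logical counter in the low bits. The sketch below decodes them to cross-check `checkpoint_time` and to see how far the checkpoint trailed the resolved timestamp. The helper names are illustrative and not part of any TiCDC tool.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # low 18 bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Values copied from the changefeed query output in this report.
START_TS = 446114067381223436
RESOLVED_TS = 446115244722356227
CHECKPOINT_TSO = 446115243358946105


def split_tso(tso):
    """Return (physical milliseconds since the Unix epoch, logical counter)."""
    return tso >> LOGICAL_BITS, tso & ((1 << LOGICAL_BITS) - 1)


def tso_to_utc(tso):
    """Convert the physical part of a TSO to a UTC datetime."""
    physical_ms, _ = split_tso(tso)
    return EPOCH + timedelta(milliseconds=physical_ms)


checkpoint = tso_to_utc(CHECKPOINT_TSO)
resolved = tso_to_utc(RESOLVED_TS)

# Millisecond precision, same layout as "checkpoint_time".
print(checkpoint.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])  # 2023-12-05 16:43:18.995
print("resolved - checkpoint:", resolved - checkpoint)
print("running since start_ts:", checkpoint - tso_to_utc(START_TS))
```

The decoded checkpoint matches the reported `checkpoint_time` when read as UTC, which suggests (as an assumption) that the CDC host clock was set to UTC.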

### What did you expect to see?

CDC should not OOM

### What did you see instead?

CDC OOM.

![image](https://github.com/pingcap/tiflow/assets/7403864/8f158ce6-021b-4205-9bae-12ba98a583ac)

### Versions of the cluster

CDC version:
[release-version=v6.5.6] [git-hash=067bb8031fd2d15763b464052a1e550a81af2196]

fubinzh commented 10 months ago

/label affects-6.5

fubinzh commented 10 months ago

/severity major

zhangjinpeng87 commented 10 months ago

@fubinzh Are these 3 TiCDC nodes deployed on one node or on separate nodes?

nongfushanquan commented 9 months ago

/assign @sdojjy

flowbehappy commented 5 months ago

This should be an enhancement rather than a bug.