matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0
1.79k stars 276 forks source link

[Bug]: Insert into Table with PK very slow #18243

Closed arjunsk closed 3 days ago

arjunsk commented 2 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

91cec0940fb83d9de5d65d4cf33f1863b4340dfa

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

The insert performance with the PK column is very slow. Sometimes it takes more than 20 minutes to insert.

Expected Behavior

The insert should be complete in under 6 minutes.

Result: inserted=1000000 insert/second=1841.1896048008125 duration=543.1271159648895

Steps to Reproduce

  1. Run MO
  2. Run the python code
import time

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

table_name = "speedtest"
num_inserts = 1000_000
max_duration_seconds = 12 * 60

def run():
    engine = create_engine("mysql+mysqldb://root:111@127.0.0.1:6001/")
    Session = sessionmaker(bind=engine)
    session = Session()

    session.execute(text("DROP DATABASE IF EXISTS vecdb;"))
    session.commit()

    session.execute(text("CREATE DATABASE vecdb;"))
    session.execute(text("USE vecdb;"))
    session.execute(text("DROP TABLE IF EXISTS " + table_name + ";"))
    # session.execute(text("CREATE TABLE " + table_name + "(id INT);")) # this is fast
    session.execute(text("CREATE TABLE " + table_name + "(id INT primary key);"))  # this is slow
    session.commit()

    start_time = time.time()
    sql_insert = text("INSERT INTO " + table_name + " (id) VALUES (:id);")
    for i in range(num_inserts):
        current_time = time.time()
        if current_time - start_time > max_duration_seconds:
            raise RuntimeError("Execution time exceeded 3 minutes. Panic and abort!")

        session.execute(sql_insert, {"id": i})

        if i % 1000 == 0:
            print(f"inserted {i} rows")
    session.commit()

    duration = time.time() - start_time
    print(f"Result: "
          f"inserted={num_inserts} "
          f"insert/second={num_inserts / duration} "
          f"duration={duration}")

def main():
    try:
        run()
    except RuntimeError as e:
        print(e)

if __name__ == "__main__":
    main()

Additional information

No response

jensenojs commented 2 months ago
image

As time goes by, the time-consuming cost of gc gradually becomes the main contradiction.

image

alloc object :

image

heap_cpu.zip

jensenojs commented 2 months ago

和这个是类似的, 都是反映single insert的性能慢的问题

jensenojs commented 4 days ago

no process

ouyuanning commented 3 days ago

insert性能的有issue跟踪了。这个先关闭吧