jitingxu1 opened 6 hours ago
I did some checks: the Trino and Impala backends insert data into the database one row at a time (see the Trino implementation: https://github.com/ibis-project/ibis/blob/main/ibis/backends/trino/__init__.py#L592).
Insertion becomes slower and slower as it goes, so it takes almost forever to insert the diamonds data, which has 40k rows.
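To illustrate the difference between the two insert patterns, here is a minimal DB-API sketch. It uses `sqlite3` purely as a stand-in backend (Trino and Impala drivers follow the same DB-API shape); it is not the actual ibis code path:

```python
import sqlite3

# 1000 sample rows standing in for the diamonds data.
rows = [(i, f"name-{i}") for i in range(1000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, name TEXT)")

# One-by-one: one execute() call (one round trip) per row.
for row in rows:
    con.execute("INSERT INTO t VALUES (?, ?)", row)

# Batched: a single executemany() call covering all rows.
con.execute("DELETE FROM t")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 1000
```

Against a remote engine like Trino, the per-row round trips dominate, which is why the one-by-one pattern degrades on large tables.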
I tested inserting the data in chunks; ~it works better~
```python
data = list(op.data.to_frame().itertuples(index=False))
insert_stmt = self._build_insert_template(name, schema=schema)
with self.begin() as cur:
    cur.execute(create_stmt)
    chunk_size = 100  # define the chunk size
    for i in range(0, len(data), chunk_size):
        chunk = data[i : i + chunk_size]
        cur.executemany(insert_stmt, chunk)
```
I'm not sure whether we want to insert the data as a whole; it may run out of memory if the data size is too large.
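One way to keep memory bounded regardless of table size would be to chunk an iterator instead of a fully materialized list. A sketch (the `iter_chunks` helper is my own, and the commented usage assumes the `op`, `insert_stmt`, and `cur` names from the snippet above):

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Yield lists of at most chunk_size items, lazily, without
    materializing the whole input at once."""
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk


# Hypothetical usage inside the backend:
# for chunk in iter_chunks(op.data.to_frame().itertuples(index=False), 100):
#     cur.executemany(insert_stmt, chunk)

chunks = list(iter_chunks(range(10), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

This only helps if the source itself can be consumed lazily; a pandas DataFrame is already fully in memory, so the bigger win there is avoiding a second full copy via `list(...)`.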
-----update-----
```
Iteration 100 took 2.7026 seconds
Iteration 101 took 2.7528 seconds
Iteration 102 took 2.8886 seconds
Iteration 103 took 2.7334 seconds
Iteration 104 took 2.5021 seconds
Iteration 105 took 2.6534 seconds
Iteration 106 took 3.8154 seconds
Iteration 107 took 3.4111 seconds
Iteration 108 took 4.0579 seconds
Iteration 109 took 49.1745 seconds
Iteration 110 took 109.2089 seconds
Iteration 111 took 74.9319 seconds
```
What happened?
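For reference, per-iteration timings like the ones above can be collected with a harness along these lines (a hypothetical sketch; `timed_iterations` and the stand-in workload are my own, not code from the issue):

```python
import time


def timed_iterations(work, n, start_at=0):
    """Run work(i) n times and print how long each iteration takes,
    mirroring the 'Iteration N took X seconds' log format."""
    durations = []
    for i in range(start_at, start_at + n):
        start = time.perf_counter()
        work(i)
        elapsed = time.perf_counter() - start
        durations.append(elapsed)
        print(f"Iteration {i} took {elapsed:.4f} seconds")
    return durations


# Stand-in workload; in the real test each iteration would be
# one chunked executemany() call against Trino.
durations = timed_iterations(lambda i: sum(range(10_000)), 3, start_at=100)
```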
Code to reproduce the error:

~It throws an exception because of `MEMORY_LIMIT_EXCEEDED`.~
~It is related to `_in_memory_table_exists`; I saw we recently changed the implementation in #10067.~ Smaller data runs OK.

I guess this could be the reason for the CI failures in #9908 and #9744.
What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
Trino