treibholzhq opened 6 months ago
Hi
> due to the scope of the transaction, at some point Yugabyte needs to have a full copy of the data in memory
It should not. In the SQL layer the memory used is per batch of rows. In the storage layer, it is per RocksDB memtable.
> consumes an excessive amount of memory
What is the metric for the graph you show? Can you check `mem_tracker_server_Tablets_overhead_PerTablet` (for the DocDB layer) and `allocated_mem_bytes` from `pg_stat_activity` (for the SQL layer)?
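For reference, both checks can be scripted roughly as follows. This is a sketch: the DSN and metrics port are placeholders, and `allocated_mem_bytes` is YugabyteDB's addition to `pg_stat_activity` (it does not exist in vanilla PostgreSQL).

```python
# Sketch of the two per-layer memory checks suggested above.
# Assumptions: psycopg2 is installed and the DSN matches your cluster.

SQL_LAYER_CHECK = """
select pid, application_name, allocated_mem_bytes
from pg_stat_activity
order by allocated_mem_bytes desc nulls last
"""

# DocDB-layer metric, scraped from the tserver metrics endpoint
# (typically http://<tserver>:9000/prometheus-metrics):
DOCDB_METRIC = "mem_tracker_server_Tablets_overhead_PerTablet"

def main():
    import psycopg2  # same driver as the ingest script in this thread
    with psycopg2.connect("host=localhost port=5433 dbname=yugabyte user=yugabyte") as conn:
        with conn.cursor() as cur:
            cur.execute(SQL_LAYER_CHECK)
            for pid, app, mem in cur.fetchall():
                print(pid, app, mem)

# Call main() against a live cluster; it is not executed here.
```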
> (e.g. 300mb with around 310 million rows)
What is the size of one row? Can you share a small sample of the measurement files to test with?
Hi @FranckPachot, thank you very much for your answer and your clarification regarding memory usage.
The graph shows the memory consumption metrics on OpenShift for the yb-tservers stateful set: the yb-tserver-0 pod consumes 25 gigabytes of memory when ingesting a 300 megabyte file (310 million rows).
`allocated_mem_bytes` is just rising and rising:
I tried to get `mem_tracker_server_Tablets_overhead_PerTablet` from the Web UI (Metrics); unfortunately, I cannot see yb-tserver-0 reporting it (I can see the other 2 tservers reporting it, though).
The size of one row is quite small; it contains just a uuid, a signal name (mostly < 256 chars), a timestamp, and a 64-bit floating point number.
Example:
8884dc5b-1f08-47e1-8d2c-8e8b8f716f71 m_torque 2023-07-01 12:01:56.874 +0200 917.0
That's 65 bytes.
Oddly enough, I don't get any error in the YugabyteDB TServer logs; the connection to the pod just terminates:
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
It's very much like the issue described here: https://github.com/yugabyte/yugabyte-db/issues/5453
Thank you very much for your help.
I tried some variations to get something similar to what you get (high `pg_stat_activity.allocated_mem_bytes`). I got this when explicitly disabling intermediate commits with `set yb_default_copy_from_rows_per_transaction=0`:
and it finally failed with ERROR: temporary file size exceeds temp_file_limit
This doesn't happen when the foreign key is removed.
I'll open an issue. For the moment, the workarounds are intermediate commits (but from what you say, you want the whole load to be transactional) or dropping the foreign key (there's a DISABLE_FK_CHECK option in COPY, but I guess you have to use `copy_expert` rather than `copy_from`).
@FranckPachot thanks for the investigation.
I have rewritten my script to use psycopg3, which is much simpler to use, so I tried using `COPY signals FROM STDIN WITH (DISABLE_FK_CHECK)`, but this fails with the same behaviour.
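For completeness, the psycopg (v3) form being described looks roughly like this, with placeholder connection details and the sample row from earlier in the thread:

```python
# Sketch: psycopg3 exposes COPY as a context manager with write_row().
# DISABLE_FK_CHECK is YugabyteDB-specific; the DSN is a placeholder.

COPY_SQL = "COPY signals FROM STDIN WITH (DISABLE_FK_CHECK)"

def load_one_row(dsn: str = "host=localhost port=5433 dbname=yugabyte"):
    import psycopg  # psycopg 3
    with psycopg.connect(dsn) as conn:
        with conn.cursor() as cur:
            with cur.copy(COPY_SQL) as copy:
                copy.write_row((
                    "8884dc5b-1f08-47e1-8d2c-8e8b8f716f71",
                    "m_torque",
                    "2023-07-01 12:01:56.874+02",
                    917.0,
                ))

# Call load_one_row() against a live cluster; it is not executed here.
```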
I then removed the foreign key entirely and created the signals table without it. No luck. Still consuming way too much memory.
Btw: `select * from pg_settings where name = 'yb_default_copy_from_rows_per_transaction';` shows `20000`, which is the default, I guess.
OK, so it seems to be a different problem. Will check with @droberts-yb if we can reproduce it.
Side note: I have been using a connection set to `autocommit` this morning and can confirm that memory consumption stays flat in this case.
So I guess it is definitely an issue with transaction size?
OK, my first thought was that `yb_default_copy_from_rows_per_transaction` was disabled with a warning when not in autocommit, but I tested it quickly and didn't confirm it. My test was probably wrong. We need to reproduce it to understand what is allocated (the foreign key was a possibility, but maybe it's something else). Usually bulk loads are done with intermediate commits.
Yeah, the challenge here is that our bulk load belongs to an entity, and that entity can hold a lot of data (100 gigabytes in the worst case), so the desired behaviour is "all of the entity" or "nothing".
@FranckPachot @droberts-yb in the meantime I tried a different approach in that I am using COPY to ingest to a temporary table and then doing an INSERT INTO SELECT from this temp table to the real table, which gives me transactional behavior when the connection fails (and also helps with parallel ingests).
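The staging approach can be sketched like this (a hedged outline; the file path and DSN are placeholders, and the statements mirror the ones quoted in this thread):

```python
# Stage via a temp table, then promote in a single statement, so the real
# table sees all-or-nothing even though the staging COPY commits as it goes.

STAGE_DDL = "CREATE TEMP TABLE signals_temp (LIKE signals INCLUDING DEFAULTS)"
STAGE_COPY = "COPY signals_temp FROM STDIN"
PROMOTE = "INSERT INTO signals SELECT * FROM signals_temp"

def ingest(path: str, dsn: str = "host=localhost port=5433 dbname=yugabyte"):
    import psycopg2
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # the temp table is private to this session anyway
    with conn.cursor() as cur:
        cur.execute(STAGE_DDL)
        with open(path) as f:       # hypothetical extracted TSV file
            cur.copy_expert(STAGE_COPY, f)
        cur.execute(PROMOTE)        # one statement = one transaction

# Call ingest("measurement.tsv") against a live cluster; not executed here.
```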
First, I had to increase the maximum temp file limit (it was set to 1GB; I set it to 100GB to be safe).
Then, I tried to ingest 2 files (same as above) in parallel; they failed with a Disk Full error (I have 50GB disks now).
I created the temp table without foreign keys using `CREATE TEMP TABLE signals_temp (LIKE signals INCLUDING DEFAULTS)`, so I am not sure if that is the same issue you mentioned above, @FranckPachot.
Thanks for testing. I'm not surprised that it is the same from a temporary table, but it helps to understand what happens. What is the size of the file (you said 300mb, right?), and the size of the temporary table? (`select pg_table_size('signals_temp');`)
Yes, it's the same file (300mb, 313 million rows). Size of the signals_temp table is:
select pg_size_pretty(pg_table_size('signals_temp'));
31 GB
313 million rows of 65 bytes are 19GB, so 31GB seems large, but it's okay. What format is the file to hold that in 300MB?
So, yes, that seems too much to be loaded in one transaction. Maybe you can load it non-transactionally and have a view to make it visible at the end, like this: https://dev.to/yugabyte/yugabytedb-transactional-load-with-non-transactional-copy-10ac
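The pattern in the linked post can be sketched roughly as follows. This is illustrative only: the `loaded` flag, the `signals_visible` view, and the join column names are assumptions, not part of the schema in this issue.

```python
# Sketch: load non-transactionally, but gate visibility behind a flag that
# flips only after the whole measurement has finished loading, so readers
# never see a half-loaded measurement.

VIEW_DDL = """
CREATE OR REPLACE VIEW signals_visible AS
SELECT s.*
FROM signals s
JOIN measurements m ON m.measurement_id = s.measurement_id
WHERE m.loaded
"""

# Tiny, fully transactional statement run once the load completes:
MARK_LOADED = "UPDATE measurements SET loaded = true WHERE measurement_id = %s"
```

Readers query `signals_visible` instead of `signals`; if a load crashes midway, the flag never flips, and only invisible rows are left to clean up.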
The file is .mf4 (https://www.asam.net/standards/detail/mdf/). A compact binary format for measurement data.
I believe I am already loading non-transactionally; I am setting `autocommit=True` on the psycopg connection. So the COPY to the temp table already is non-transactional, as in each row is committed immediately. So yes, I realize that the temp table should be 19gb, but I do not understand where the overhead of 12gb comes from.
For reference, a 3.5mb file with 2 million rows has a temp table size of 150mb, which is okay given that the size should be around 130mb (65 bytes * 2 million = 130mb).
But this overhead grows much faster than linearly: as stated, the overhead for 300mb / 313 million rows is 12gb.
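For what it's worth, the ratios implied by the numbers in this thread can be checked quickly (raw size taken as rows × 65 bytes, decimal units):

```python
# Rough check of the table-size-to-raw-data ratios quoted in this thread.

def overhead_ratio(table_bytes: float, rows: float, row_bytes: int = 65) -> float:
    # How much larger the stored table is than the raw row data.
    return table_bytes / (rows * row_bytes)

small = overhead_ratio(150e6, 2e6)    # 3.5mb file -> 150mb temp table
large = overhead_ratio(31e9, 313e6)   # 300mb file -> 31GB temp table
print(round(small, 2), round(large, 2))  # per-row overhead grows with size
```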
Typically, our measurement files are 2.5gb in size, with over 1 billion rows alone, and can in the worst case be 100gb in size.
The overhead adds up pretty quickly, then.
We could reduce the row size a bit, though, by not storing a uuid in the signals table but a bigserial or even serial.
Jira Link: DB-10267
Description
Hello everyone.
I am facing memory issues when trying to ingest "large" files via a transactional copy.
My setup is as follows:
YugabyteDB OpenSource 2.21 deployed on Red Hat OpenShift 4.12 via the official Helm Chart (https://charts.yugabyte.com/).
Resource and storage configuration:
Here is the DDL of my DB schema:
My use-case is as follows:
I am extracting data out of a measurement file, which contains time series data for 0..n sensors (signals). Each file is identified by a measurement id and is registered in the measurements table. The corresponding signal data is ingested into the signals table, which is hashed on the combination of signal name and measurement id. The scope of the transaction is the measurement file as a whole, meaning I either want the entire file to be committed or nothing at all.
This will lead to very large transactions as you can imagine.
I am batching the ingest via COPY FROM for each sensor using a python script with psycopg2.copy_from() and StringIO, based on the example by @FranckPachot (https://dev.to/yugabyte/bulk-load-into-postgresql-yugabytedb-psycopg2-fep). The difference to his approach is that I cannot split the ingest up into multiple transactions.
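For context, the batching pattern described above looks roughly like this: a simplified sketch of the approach in the linked post, not the actual script; names and the sample row are illustrative.

```python
from io import StringIO

def rows_to_buffer(rows):
    # Serialize rows into the tab-separated text format COPY expects.
    buf = StringIO()
    for r in rows:
        buf.write("\t".join(map(str, r)) + "\n")
    buf.seek(0)
    return buf

def ingest(conn, batches):
    # All batches go through one connection and one final commit, so the
    # whole measurement file is a single transaction.
    with conn.cursor() as cur:
        for batch in batches:
            cur.copy_from(rows_to_buffer(batch), "signals")
    conn.commit()
```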
I am running this through an `oc port-forward yb-tserver-0 5433`. While this works fine for smaller files (e.g. 3.5mb with around 2 million rows), the ingest of larger files (e.g. 300mb with around 310 million rows) consumes an excessive amount of memory, so that I am effectively hitting the resource limits of my OpenShift resource configuration, which terminates the connection.
It is clear that, due to the scope of the transaction, at some point Yugabyte needs to have a full copy of the data in memory, but it seems dubious to me that 300mb of data leads to 24gb+ of memory consumption on the tserver.
Any suggestions?
Thank you very much for your help!
Jo
Issue Type
kind/bug