Just for illustration - this is how the difference looks during the import.
In the logs, I found repeated occurrences of:
W1217 14:56:11.327088 1605522 long_operation_tracker.cc:113] UpdateReplica running for 1.000s in thread 1621627:
@ 0x7f912d40711f (unknown)
@ 0x7f912dd84b39 __lll_lock_wait
@ 0x7f912dd7f6e2 __GI___pthread_mutex_lock
@ 0x7f9136f03e18 rocksdb::port::Mutex::Lock()
@ 0x7f9136f78d4b rocksdb::InstrumentedMutex::Lock()
@ 0x7f9136e60cef rocksdb::DBImpl::WriteImpl()
@ 0x7f9136e62963 rocksdb::DBImpl::Write()
@ 0x7f913c75c470 yb::tablet::Tablet::WriteToRocksDB()
@ 0x7f913c761eb4 yb::tablet::Tablet::ApplyKeyValueRowOperations()
@ 0x7f913c7623fe yb::tablet::Tablet::ApplyOperation()
@ 0x7f913c7627a3 yb::tablet::Tablet::ApplyRowOperations()
@ 0x7f913c7fd9b8 yb::tablet::WriteOperation::DoReplicated()
@ 0x7f913c7ee07b yb::tablet::Operation::Replicated()
@ 0x7f913c7f3960 yb::tablet::OperationDriver::ApplyTask()
@ 0x7f913c7f40ae yb::tablet::OperationDriver::ReplicationFinished()
@ 0x7f913c3fc377 yb::consensus::ReplicaState::NotifyReplicationFinishedUnlocked()
(yb-tserver.WARNING)
And
W1217 14:17:54.329200 1675414 async_rpc_tasks.cc:260] 178b4f6bda244cafbbae88fecf4a12df Flush Tablets RPC (task=0x2dfbd50, state=kRunning): TS 178b4f6bda244cafbbae88fecf4a12df: Flush Tablets RPC failed for tablet : Timed out (yb/rpc/outbound_call.cc:485): FlushTablets RPC (request call id 14962) to 127.0.1.1:9100 timed out after 30.000s
(yb-master.WARNING)
But I'm not quite sure what to look for. :thinking: I'm OK with sharing the logs if you'd like to see them. There is nothing too sensitive in the database but I don't feel comfortable posting them publicly.
(6x AWS t3a.large, 100 GB disk each, 3x yb-master, 6x yb-tserver)
In this case, it's better to have 3x t3a.xlarge instead of 6x t3a.large. You want to scale vertically (at least) up to the recommended hardware size: https://docs.yugabyte.com/latest/deploy/checklist/#hardware-requirements
t3a.large
I suggest not using burstable VPS instances for hosting a database.
The current script uses basically psql -c "\copy (SELECT ... ORDER BY id) TO STDOUT" | ssh $REMOTE ysqlsh -c "\copy $TABLE FROM STDIN"
Try to separate the data-generation from data-insertion. Meaning, generate static files like .csv, and then only benchmark the "loading/inserting" part.
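For example, a minimal sketch of that separation (the file name export.txt and the $TABLE variable are placeholders following the script quoted above; using COPY's default text format):
$ # Generate the data once into a static file (not part of the benchmark).
$ psql -c "\copy (SELECT * FROM $TABLE ORDER BY id) TO 'export.txt'"
$ # Then time only the load step, reading the static file from local disk.
$ time ysqlsh -c "\copy $TABLE FROM 'export.txt'"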
I don't know where to look for a possible bottleneck or how to debug further. I can't reliably switch between the "slow import" and "fast import" states either. :-)
Don't use burstable CPUs.
How big are the files? Use smaller files.
Is this a normal operation that you need to do in production? In OLTP scenarios you usually have many connections each writing a few rows (which we optimize for), rather than a few connections doing very large transactions (which we don't optimize for and which are a little heavy because of synchronous replication + the shared design: the insert ends up creating a distributed transaction across all nodes, including the indexes).
How to replicate? Can you use any of our benchmarking tools: https://docs.yugabyte.com/latest/benchmark/
Use something around our recommended hardware: https://docs.yugabyte.com/latest/deploy/checklist/#hardware-requirements
Thank you for your response. I'll try with different hardware. What surprised me were the performance differences which I couldn't explain, not slow performance per se. That makes me worry we could hit unexpected slowdowns down the road.
I see quite low TPS and disk throughput. CPU is not at max either. Memory is not maxed either.
You probably need to increase concurrency and lower the size of the files.
But note that you won't be able to reproduce with burstable CPUs.
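For instance, a rough sketch of that (chunk size, concurrency level, and file names are placeholders):
$ # Split the export into smaller chunks (COPY's default text format keeps one row per line)...
$ split -l 100000 export.txt chunk_
$ # ...and load them with several concurrent clients (4 here, purely as an example).
$ ls chunk_* | xargs -P 4 -I{} ysqlsh -c "\copy $TABLE FROM '{}'"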
What surprised me were the performance differences which I couldn't explain, not slow performance per se.
You need to remove as many undefined variables as possible. The easiest way to do that is by following any of the benchmark suites that we use (assuming any of them can be similar to your requirements).
Hello in the new year. :-)
I tried again with 3x c5a.2xlarge. Using a single client, I observe data load throughput of around 2 - 8 MiB/s (4 000 rows/s). That's somewhat better than the previous case, but still not great. The CPU is at 30 % utilization and the network usage between the instances is less than 100 Mbit/s (c5a.2xlarge should be capable of 2.5 Gbps sustained, 10 Gbps burst).
Interestingly, if I break the data into chunks and import them in parallel (I tried 4x and 8x), I see almost no improvement in speed. However, the workload saturates the CPUs of all instances with yb-tserver processes.
I see around 400 - 450 IOPS (with 3000 IOPS EBS disks). Average queue size is less than one.
Can you paste the output of \d+ public.xyz, just to check the sharding?
Also, how many tablets does that table have?
Can you paste a screenshot of http://yb-master-ip:7000/tablet-servers?
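On the tablet count, one hedged way to check from the command line (the master addresses and ports below are placeholders; database and table names follow this thread):
$ # List the tablets of the xyz table in the yugabyte database.
$ yb-admin -master_addresses yb-m0:7100,yb-m1:7100,yb-m2:7100 list_tablets ysql.yugabyte xyz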
Try to separate the data-generation from data-insertion. Meaning, generate static files like .csv, and then only benchmark the "loading/inserting" part.
Did you do this?
Interestingly, if I break the data into chunks and import them in parallel (I tried 4x and 8x), I see almost no improvement in speed. However, the workload saturates the CPUs of all instances with yb-tserver processes.
Are you sending the requests all into 1 server or spreading on the 3 servers?
yugabyte=# \d+ public.xyz
Table "public.xyz"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+-----------------------+-----------+----------+-------------------------------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('xyz_id_seq'::regclass) | plain | |
text | text | | not null | | extended | |
text_hash | character varying(40) | | not null | | extended | |
Indexes:
"xyz_pkey" PRIMARY KEY, lsm (id HASH)
Referenced by:
TABLE "some_other" CONSTRAINT "xpos_caption_fk_id_bf82c7ed_fk_sentione_" FOREIGN KEY (caption_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
TABLE "some_other" CONSTRAINT "xpos_context_fk_id_8cf9123f_fk_sentione_" FOREIGN KEY (context_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
TABLE "some_other" CONSTRAINT "xpos_context_title_fk_id_04941e12_fk_sentione_" FOREIGN KEY (context_title_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
TABLE "some_other" CONSTRAINT "xpos_description_fk_id_0f41143d_fk_sentione_" FOREIGN KEY (description_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
TABLE "some_other" CONSTRAINT "xpos_meta_title_fk_id_73b2dbd1_fk_sentione_" FOREIGN KEY (meta_title_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
TABLE "some_other" CONSTRAINT "xpos_parent_text_fk_id_26e85a17_fk_sentione_" FOREIGN KEY (parent_text_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
TABLE "some_other" CONSTRAINT "xpos_title_fk_id_97d55e0d_fk_sentione_" FOREIGN KEY (title_fk_id) REFERENCES xyz(id) DEFERRABLE INITIALLY DEFERRED
Try to separate the data-generation from data-insertion. Meaning, generate static files like .csv, and then only benchmark the "loading/inserting" part.
Did you do this?
No, but I'm fairly sure this is not the bottleneck. At the same time as I was exporting the data to Yugabyte, I tried the same export to /dev/null and it went at 70 MB/s. I'll try it anyway in a minute.
Are you sending the requests all into 1 server or spreading on the 3 servers?
Just one server, but the CPU is saturated on all of them. Only one has those postgres: ... COPY processes, but they consume only a smaller part of the CPU. I'll try to distribute.
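A rough sketch of what distributing the load could look like (the hostnames yb-ts0, yb-ts1, yb-ts2 and the chunk files are placeholders; each client connects to a different tserver):
$ # One client per node, each loading a different chunk of the export.
$ ysqlsh -h yb-ts0 yugabyte -c "\copy xyz FROM 'chunk_aa'" &
$ ysqlsh -h yb-ts1 yugabyte -c "\copy xyz FROM 'chunk_ab'" &
$ ysqlsh -h yb-ts2 yugabyte -c "\copy xyz FROM 'chunk_ac'" &
$ wait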
This is the screenshot you asked for. There are more tables but only two of them have some data for now.
By the way, thank you for looking into this @ddorian. I appreciate it very much. :smiley:
Interesting. We're back to the behaviour I observed initially - the varying speed. However, the reduced transaction size seems to help somewhat.
$ pv -c < export_xyz_0000000001.txt | ssh admin@3.71.31.219 '/srv/yugabyte-2.8.0.0/bin/ysqlsh yugabyte -c "\copy xyz FROM STDIN WITH (ROWS_PER_TRANSACTION 1000)"'
1.61GiB 0:00:20 [80.8MiB/s] [=============================================================================================>] 100%
ERROR: duplicate key value violates unique constraint "xyz_pkey"
# TRUNCATE from another session
$ pv -c < export_xyz_0000000001.txt | ssh admin@3.71.31.219 '/srv/yugabyte-2.8.0.0/bin/ysqlsh yugabyte -c "\copy xyz FROM STDIN WITH (ROWS_PER_TRANSACTION 1000)"'
654MiB 0:01:13 [10.2MiB/s] [====================================> ] 39% ETA 0:01:50
Now, it occurred to me that maybe Yugabyte, when it encounters an error, still consumes the rest of the input data and only then reports the error. :bulb: That would explain the wildly jumping speeds between tries, and especially the slower import after TRUNCATE into a clean table. If that's true, then every import I observed as fast would eventually have failed, and I didn't see the error only because I was running several experiments without letting them finish. :thinking:
Now, it occurred to me that maybe Yugabyte, when it encounters an error, still consumes the rest of the input data and only then reports the error.
Don't think so. You can easily verify this by importing a big file and making a duplicate of the 2nd line so it triggers a duplicate key on the unique index.
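For example (a hedged sketch: export_dup.txt is a made-up name, the export file and ROWS_PER_TRANSACTION follow the commands in this thread, and the table should be truncated first so only the injected duplicate conflicts):
$ # Build a copy of the export whose 3rd line duplicates the 2nd line,
$ # so the duplicate key is hit almost immediately.
$ { head -n 2 export_xyz_0000000001.txt; sed -n '2p' export_xyz_0000000001.txt; tail -n +3 export_xyz_0000000001.txt; } > export_dup.txt
$ # If the error only appears after roughly the time it takes to read the whole file,
$ # the remaining input is indeed being consumed before the error is reported.
$ time ysqlsh yugabyte -c "\copy xyz FROM 'export_dup.txt' WITH (ROWS_PER_TRANSACTION 1000)"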
Yes, I think this happens when I start the import without truncating the table first. The duplicate occurs early. In fact, this is the way I can reproduce the issue.
Run the import again, observe 80 MB/s, and observe ERROR: duplicate key value only when the whole file has been sent (mine has 1.6 GB).
Maybe ssh is transferring the whole file and then it starts getting imported? You can easily verify this by using a local file on the server.
I tried with the file on the Yugabyte server (yb-m0) and indeed, I can observe this behaviour. I also moved pv out in case it was buffering something.
admin@yb-m0:~$ time < export_xyz.txt ysqlsh yugabyte -c "\copy xyz FROM STDIN WITH (ROWS_PER_TRANSACTION 1000)"
ERROR: duplicate key value violates unique constraint "xyz_pkey"
real 0m2.883s
user 0m1.102s
sys 0m1.232s
admin@yb-m0:~$ cat export_xyz.txt export_xyz.txt > export-double.txt
admin@yb-m0:~$ time < export-double.txt ysqlsh yugabyte -c "\copy xyz FROM STDIN WITH (ROWS_PER_TRANSACTION 1000)"
ERROR: duplicate key value violates unique constraint "xyz_pkey"
real 0m6.621s
user 0m2.305s
sys 0m2.670s
I can see the file of double the size takes about double the time to report the error. The files are sorted by id and the smallest id already present in the database is 1.
I see what is causing it. It's ysqlsh (specifically its \copy command)! PostgreSQL's server-side COPY reports the error immediately.
admin@yb-m0:~$ time ysqlsh yugabyte -c "COPY xyz FROM '/home/admin/export_xyz.txt' WITH (ROWS_PER_TRANSACTION 1000)"
ERROR: duplicate key value violates unique constraint "xyz_pkey"
real 0m0.183s
user 0m0.000s
sys 0m0.007s
Testing psql, it behaves the same, so it's a behaviour you inherited. I think this issue can be closed now, thank you.
I'm evaluating Yugabyte and I have some issues with importing data from Postgres. I get a very different import speed every time. I'm not bothered by the speed itself at the moment, but rather that I don't understand why it changes so wildly.
Steps to reproduce
Import the data from Postgres with pg_dump --data-only | ssh $REMOTE ysqlsh. The current script uses basically psql -c "\copy (SELECT ... ORDER BY id) TO STDOUT" | ssh $REMOTE ysqlsh -c "\copy $TABLE FROM STDIN"
What I checked
I tried the same export to /dev/null, so it's not the bottleneck. I checked this also at the very time the slow transfer was in progress.
I ran yb-admin compact_table ysql.our_db our_table, then yb-admin flush_table ysql.our_db our_table on a single master server. yb-admin compact_table fails with Timed out waiting for FlushTables on a full table but finishes very quickly on a truncated table. yb-admin flush_table finishes quickly in both cases. No difference in throughput.
What I observed
The import stays slow after running TRUNCATE on the table and starting the import again. Dropping the table and recreating it doesn't resolve it.
Disk stats on all nodes (checked with parallel-ssh ... --inline iostat -y 5 1 nvme1n1 | sed 's/admin@ec2.*/admin@REDACTED/'): I see quite low TPS and disk throughput. CPU is not at max either. Memory is not maxed either.
I'm stuck
I don't know where to look for a possible bottleneck or how to debug further. I can't reliably switch between the "slow import" and "fast import" states either. :-)