Open ManApart opened 4 years ago
Persist shouldn't be writing via Presto at all anymore. It's a direct write to S3 as JSON, followed by an insert into orc_permanent that selects * from json_stage.
Isn't it Presto that's doing that table copy?
It is a Presto query that moves the staged data to the permanent table.
And per Brian, this happens for every batch. So while we no longer write an insert statement that inserts 1MB of rows, we write 1MB of rows to S3 and then run an insert-into-select from the staging table. In practice, as I understand it, that means Presto is hit almost as many times as before.
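To make the point above concrete, here is a rough sketch of the per-batch query count. The table names json_stage and orc_permanent and the 1MB batch size come from this thread; the function names and the 1GB total are hypothetical, and the exact SQL Persist issues is an assumption.

```python
import math


def staged_copy_sql(stage="json_stage", permanent="orc_permanent"):
    """Build the assumed copy statement that runs after each staged batch."""
    return f"INSERT INTO {permanent} SELECT * FROM {stage}"


def presto_queries(total_bytes, batch_bytes):
    """One Presto query per batch, whether it is a direct insert or a staged copy."""
    return math.ceil(total_bytes / batch_bytes)


BATCH_BYTES = 1_000_000        # 1MB batches, as mentioned above
TOTAL_BYTES = 1_000_000_000    # hypothetical 1GB of messages

direct_insert_queries = presto_queries(TOTAL_BYTES, BATCH_BYTES)
staged_copy_queries = presto_queries(TOTAL_BYTES, BATCH_BYTES)

print(staged_copy_sql())
print(direct_insert_queries, staged_copy_queries)
```

Under these assumptions both approaches issue the same number of Presto queries; the staged approach only changes what each query does.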
Given 10 million small messages, Persist processed them at around 2,000 messages a second, which is 1/5 the speed of receive. While Persist does not bottleneck the system, it will get backed up over time if it can't keep up with the other parts of the system. This could also be a reflection of the number of Presto workers, etc.
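A quick back-of-the-envelope check of those numbers, as a sketch only. It assumes both rates are sustained and takes the receive rate as 5x the Persist rate (about 10,000 messages a second), which is inferred from the "1/5" figure rather than measured. The function names are hypothetical.

```python
def drain_seconds(messages, rate_per_second):
    """Time needed to process a fixed number of messages at a steady rate."""
    return messages / rate_per_second


def backlog_after(seconds, receive_rate, persist_rate):
    """Messages waiting on Persist after receive has run for the given time."""
    return max(0, (receive_rate - persist_rate) * seconds)


TOTAL_MESSAGES = 10_000_000
PERSIST_RATE = 2_000               # messages per second, from the test run
RECEIVE_RATE = PERSIST_RATE * 5    # assumption derived from "1/5 the speed"

receive_time = drain_seconds(TOTAL_MESSAGES, RECEIVE_RATE)
persist_time = drain_seconds(TOTAL_MESSAGES, PERSIST_RATE)
backlog = backlog_after(receive_time, RECEIVE_RATE, PERSIST_RATE)

print(f"receive finishes after {receive_time:.0f}s")
print(f"persist finishes after {persist_time:.0f}s")
print(f"backlog when receive finishes: {backlog:,.0f} messages")
```

With these assumptions, receive is done in about 17 minutes while Persist needs about 83 minutes, so the backlog only drains if the input rate drops for long enough.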
AC
Tech Notes