See the commit b4e21374b597f8149d1239202d21a910a351f172 and the referenced issue for the reason to limit `prefetch` to 1000. Note that that has nothing to do with committing a transaction.
You are right that 1000 small rows are quite different from 1000 large rows, but I can think of no other reasonably simple approach than to limit the number of rows.
I am reluctant to increase the limit unless I have good evidence that there is a potential benefit. The reason is that with the old, flawed implementation of `prefetch`, Oracle never ran out of memory. It is unclear how that worked, since Oracle is not open source, but as a consequence many people set their foreign tables' `prefetch` option to the upper limit of 10240, thinking that "more is always better". I am worried that allowing a higher limit will cause more pain than benefit when people upgrade oracle_fdw and are suddenly faced with OOM errors or crashes (if they didn't disable memory overcommit).
Go ahead and play with bigger values of `MAXIMUM_PREFETCH` and tables of different width and see if you can get a noticeable performance improvement that way.
You already have a default value of 50 in place. My ask is to allow a higher maximum value and not to touch the default. That way you keep the same default level of guard as you have now, and if users need to increase the value, they have to do so explicitly and manually. You could just add a note to the README, "Please be aware that setting higher values might cause OOM on the Postgres side", with a link to #582. Users are informed and change this setting manually themselves, so no "suddenly faced" scenario is possible.
As for 1000 as a maximum value and evidence of a potential benefit: in my case described above, I got a sevenfold (700%) boost when switching from 50 to 1000, and I didn't get an OOM error. Doesn't that look like good evidence?
@philflorent It depends. In the general case, "huge" and "enough" are relative terms. In my case I had a 7-minute versus a 1-minute response time. Yes, a 700% boost, but still: one minute is not what you want for a comfortable interactive experience. Yes, I was fetching cross-continent data over VPN, but still.
As for the OOM error itself: this happens not only with oracle_fdw. I observed the same recently with duckdb_fdw. And that is a reason why the Docker image approach, as in #644, is quite popular. That way you can use the connector as an intermediate node, without affecting the main target database. And if your target is Postgres itself, you can connect to it right from your container.
In such a setup, you don't care too much about OOM on your intermediate container. If it is part of a Docker Swarm or Kubernetes app, it can be restarted automatically, for example. Even without any complex setup, it's an intermediate node: just restart or re-create it.
You'd have to try with values higher than 1000 to get evidence. Modify the source.
I guess you didn't understand my explanation. There are many foreign tables out there where `prefetch` is set to 10240, which used to work before 2.6.0, but is now likely to give you an OOM error. Consequently, I force the limit down.
What you write about Docker doesn't make any sense. If you go OOM and the OOM killer gets you, PostgreSQL will undergo crash recovery and be down for a while. That is not acceptable in anything but a play system.
Could you make some experiments with higher values? What was the result?
Hi. I don't have time to do detailed testing, but I did one test locally.
I created a Postgres image with `MAXIMUM_PREFETCH = 100000` and spun up two containers: Oracle, and Postgres with the tweaked oracle_fdw. Afterwards I created a `dict_all_objects` table in Oracle and inserted the content of `all_objects` into it multiple times, ending up with 3.6 million rows. Structurally, it is a rather narrow table, not a wide one. Finally, I created a corresponding foreign table in Postgres for it.
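For reference, the Postgres side of such a setup looks roughly like this (a sketch with placeholder connection details; the Oracle schema name and the trimmed column list are assumptions, not the exact script from this test):

```sql
-- Sketch only: placeholder host/credentials; column list trimmed
-- to a few ALL_OBJECTS-style columns for illustration.
CREATE EXTENSION oracle_fdw;

CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
  OPTIONS (dbserver '//oracle-container:1521/ORCLPDB1');

CREATE USER MAPPING FOR CURRENT_USER SERVER oradb
  OPTIONS (user 'test', password 'test');

CREATE FOREIGN TABLE dict_all_objects (
  owner       varchar(128),
  object_name varchar(128),
  object_type varchar(23),
  object_id   numeric
) SERVER oradb
  OPTIONS (schema 'TEST', "table" 'DICT_ALL_OBJECTS', prefetch '50');
```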
Afterwards, I kept changing the `prefetch` option of the foreign table and executing `select count(*) from dict_all_objects;`. Because oracle_fdw unfortunately doesn't currently push aggregates down to the remote side, this results in passing back `1` for every row. To exclude the effect of the Oracle buffer cache, I repeated the test with the smaller values one more time.
The following tests were run in order. All of them return the same `count(*) = 3613248` value.
| prefetch | time (ms) |
|---|---|
| 50 | 3907 |
| 1000 | 1300 |
| 10^4 | 705 |
| 10^5 | 654 |
| 1000 | 1312 |
| 50 | 3954 |
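A single iteration of the test was roughly the following (a sketch; the `SET` form assumes the option was already present at table creation, otherwise use `ADD`):

```sql
-- Change the per-table prefetch setting, then re-run the query.
ALTER FOREIGN TABLE dict_all_objects OPTIONS (SET prefetch '1000');
\timing on
SELECT count(*) FROM dict_all_objects;
```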
From within the Oracle container, the same query returns in about 170 ms.
My test has zero network latency, because it's two Docker containers running locally within a single Docker bridge network. But if we are talking about access to remote data, the `prefetch` impact will be even bigger, as in my initial test against the work Oracle instance, where switching from 50 to 1000 led to a sevenfold decrease in response time, compared to 3.9 s vs 1.3 s in my local test.
Did another `CTAS` test. Surprisingly to myself, there is almost no benefit for higher values of `prefetch`:
| prefetch | time (sec) |
|---|---|
| 50 | 33 |
| 1000 | 24 |
| 10^4 | 23 |
| 10^5 | 22 |
| 50 | 30 |
Thanks, that is interesting. What was the difference between these two tests?
No difference in the infrastructure setup: two Docker containers running locally. In the first case it was `select count(*) from dict_all_objects;`, in the second `create table tmp_all_objects as select * from dict_all_objects;`.
Can you compare the `EXPLAIN (ANALYZE)` results for both cases?
The second test took so much longer, but I cannot see why, if the data were the same.
About the first (fast) test, where you saw the difference: What is the size of the rows?
> The second test took so much longer, but I cannot see why, if the data were the same.
The first query does `SELECT COUNT(*)`, which returns just `1` from the remote side as the value for each row. The second query is a `CTAS` statement that returns all columns from the source table, i.e. we transfer the whole table content over the network.
> About the first (fast) test, where you saw the difference: What is the size of the rows?
```
-- oracle
SQL> select table_name, avg_row_len from user_tab_statistics where table_name = 'DICT_ALL_OBJECTS';

TABLE_NAME                     AVG_ROW_LEN
------------------------------ -----------
DICT_ALL_OBJECTS                       148
```
> Can you compare the EXPLAIN (ANALYZE) results for both cases?
```
postgres=# explain analyze verbose select count(*) from dict_all_objects;
postgres=# explain analyze verbose create table tmp_all_objects as select * from dict_all_objects;
```
Right, I see now.
I don't think we have to optimize for `SELECT 1 FROM foreign_table`.
Perhaps you could try the experiment with slightly narrower tables, for example by selecting only a few columns from the foreign table.
If that shows a clear benefit for `prefetch` values greater than 1000, I'd consider raising the limit.
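Such a narrower run could be as simple as projecting a couple of small columns, for example (assuming the foreign table kept the usual ALL_OBJECTS column names, which is an assumption here):

```sql
-- Only a couple of narrow columns instead of the full row.
SELECT owner, object_id FROM dict_all_objects;
```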
My worry is that raising the limit will do more bad than good. There are so many people who think that cranking the dial to maximum will give them the best performance, and I am worried about the ensuing flood of complaints about OOM crashes. I wouldn't raise the limit for a small benefit, and I don't think that `SELECT count(*) FROM foreign_table` is an interesting enough use case.
That's interesting, but the behavior of `prefetch` is the exact opposite of what I was expecting. It does not bring any meaningful benefit when selecting all columns, but starts to give a benefit when you select a narrower dataset with a few columns. The biggest performance gain from higher `prefetch` values comes when executing `SELECT COUNT(*)`.
I've created a table with 20 columns and populated all of them with random string and numeric data of different lengths. The average row length is 924 bytes, which is a quite wide row. The table has 6.4 million rows. Afterwards I selected all columns, a subset of columns, just a couple of columns, and finally did a plain `COUNT(*)` on it from Postgres.
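Not the exact script I used, but a sketch of how such a wide table with random data can be generated on the Oracle side (column count, lengths and row count here are illustrative):

```sql
-- Oracle: generate a wide table with random strings and numbers.
-- Repeat the INSERT (or enlarge the row generator) to reach millions of rows.
CREATE TABLE wide_random AS
SELECT dbms_random.string('a', 100)        AS col01,
       dbms_random.string('a', 50)         AS col02,
       dbms_random.string('a', 200)        AS col03,
       round(dbms_random.value(1, 1e9))    AS col04,
       round(dbms_random.value(1, 1e6), 2) AS col05
       -- ... more columns up to 20 in the real test
FROM dual
CONNECT BY level <= 100000;
```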
To improve the performance of analytical queries like `COUNT`, it would be great to add support for pushing them down to the remote side. This is done in jdbc_fdw; I used it with Oracle, and being able to get answers to such queries quickly is superb.
Running aggregate functions is what people usually do when starting to work with a "new" table; it lets you become familiar with the data. Without aggregate support, the usage of oracle_fdw in such scenarios is unfortunately very limited.
That higher values of `prefetch` are beneficial if the rows are very narrow is exactly what I would expect. Batching data in a single request gives you benefits only up to a certain size. The wider the rows, the earlier that size is reached. So for `SELECT 1` you can benefit from high values of `prefetch`, but for "real" queries you cannot. On the contrary: with wide rows, a high value of `prefetch` will strain your memory resources, which will eventually lead to bad performance and out-of-memory problems. That's exactly why I am reluctant to increase the limit.
It would be interesting to see what the behavior is for a table with, say, three `NUMBER` columns with small values.
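A minimal Oracle-side sketch of such a table could be (names and row count are placeholders):

```sql
-- Oracle: narrow table with three small NUMBER columns.
CREATE TABLE narrow_numbers AS
SELECT mod(level, 10)   AS a,
       mod(level, 100)  AS b,
       mod(level, 1000) AS c
FROM dual
CONNECT BY level <= 1000000;
```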
Yes, pushing down aggregates is desirable, but difficult. I am not ready to undertake that effort. There have been attempts to add that feature, but the patch was so huge that I could not review it.
I deny that `SELECT count(*)` is an important use case. Yes, a lot of people do that in ad-hoc queries, and misguided developers do it when they want to display a total result count with paginated result sets. But just because some people do it does not make it an important use case in my opinion.
Ok, thanks for the explanations. I think we clarified the `prefetch` behavior.
I fully understand the effort it would take to onboard such a feature, and that it's all volunteer work.
But I can't agree that aggregates (not only `COUNT`, but `MIN`, `MAX`, etc.) are only for ad-hoc queries. My use case is to provide a means to compare two datasets from Oracle and Postgres, and the first requirement is to get a statistical comparison: counts, min, max. The data is accessed cross-continent over VPN, and for an 80k-row dataset people just can't wait one minute to get the count. It's not usable for them, and they end up going to the two databases separately and executing the queries in each of them.
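The kind of comparison I mean is roughly this, run from Postgres with the Oracle side behind a foreign table (table and column names are made up for illustration):

```sql
-- Compare basic statistics of a local Postgres table and an Oracle foreign table.
SELECT 'postgres' AS src, count(*) AS cnt, min(id) AS min_id, max(id) AS max_id
FROM local_orders
UNION ALL
SELECT 'oracle', count(*), min(id), max(id)
FROM oracle_orders_ft;
```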
Luckily, the mentioned jdbc_fdw has aggregate push-down. I've created a corresponding Docker image for it, the same way as for your oracle_fdw. The drawbacks are that it doesn't support `IMPORT FOREIGN SCHEMA` and has a very "raw" API with other bugs.
Closing the ticket
Don't get me wrong: aggregate push-down would be very valuable in my opinion.
I just don't buy that `SELECT count(*) FROM foreign_table` is a relevant use case (which is why I am reluctant to raise the `prefetch` limit).
If you want to compare data on Oracle and PostgreSQL, surely an identical row count can only satisfy the clueless.
I certainly wouldn't examine Oracle's data via oracle_fdw. I'd log into the Oracle database and check there.
My expectation is that oracle_fdw will outperform jdbc_fdw for most real-world queries. True, if there are big aggregates involved, maybe not.
I have raised the limit to 10240 again due to popular demand (commit 13ac9fb4e5ca5a515062b2c441d53079b0ec814d).
Currently, a range of 1:1000 is allowed for the `prefetch` option.

Could you please clarify what the rationale behind this is? More specifically, why is 1000 the maximum value? It limits the number of rows, but rows can have different numbers of columns, and 100 rows of a "wide" table could outweigh, from a memory perspective, 1000 or 10k rows of a "narrow" table.

Is it possible to increase that limit, say up to 100 000 rows?

Today I was reading a 70k-row table with 30 columns per row. With the default value of 50 it took 7 minutes; after switching to 1000, about 1 minute. A sevenfold speed-up is a huge win, but I'm wondering what the result would be with, say, a 10 000 batch size.

For example, in `sqlldr` it was common to set the commit rate during import at the scale of thousands: 10k, 50k, etc.