laurenz / oracle_fdw

PostgreSQL Foreign Data Wrapper for Oracle
http://laurenz.github.io/oracle_fdw/
Other
496 stars 156 forks source link

Using use_remote_estimate #632

Closed vishalnatekar closed 10 months ago

vishalnatekar commented 11 months ago

Hi Laurenz,

I see you have answered a query on use_remote_estimate:

https://github.com/laurenz/oracle_fdw/issues/529#event-6561291510

I had a followup question but i'm unable to raise it on that thread, so creating a new one.

I have a huge table on Oracle, If i run ANALYZE it never finishes. So we dont collect stats for that table. But we collect stats on other tables. The question that i have is - if you are saying that postgres always uses local stats and not the foreign stats when executing the query on that table. How does it work in our case? When we check the plan on the query, i can see that it is using the correct index etc. If i make some changes in the index on Oracle, i can see that plan on postgres is updated for that table without recreating or analyzing the table.

Thanks Vishal

philflorent commented 11 months ago

Hi, At some point, postgres_fdw sends one or more queries to Oracle and those queries are executed at Oracle side, they are planned by Oracle cost-based optimizer and they make full use of statistics collected by dbms_stats. What is mentioned in the other thread is that oracle_fdw, at postgres side, does not use Oracle estimations but it is able to delegate much work to Oracle in simple cases (2 tables joins, filtering etc.) . This work is optimized by Oracle as any other work. If things are OK for you, you probably have queries that are simple enough for that. Best regards, Phil

laurenz commented 11 months ago

You should use the sample_percent option, so that only a part of the table is scanned during ANALYZE. Per the documentation:

  • sample_percent (optional, defaults to "100")

    This option only influences ANALYZE processing and can be useful to ANALYZE very large tables in a reasonable time.

    The value must be between 0.000001 and 100 and defines the percentage of Oracle table blocks that will be randomly selected to calculate PostgreSQL table statistics. This is accomplished using the SAMPLE BLOCK (x) clause in Oracle.

    ANALYZE will fail with ORA-00933 for tables defined with Oracle queries and may fail with ORA-01446 for tables defined with complex Oracle views.

vishalnatekar commented 11 months ago

Hey Laurenz, Thanks a lot for you response. Is there a way i can figure out which stat a particular query is using?
I know you said simple queries with one or two joins would be using stats collected by dbms_stats. and the more complex ones should be using stats collected by postgres. But is there a way i can check that somewhere ?

Thanks

laurenz commented 11 months ago

I didn't say that. PostgreSQL will only ever use the statistics it has itself, which you can inspect with

SELECT *
FROM pg_stats
WHERE tablename = 'myforeigntab'
  AND attname = 'columnname';

Once PostgreSQL has planned the query using these data, it will execute the statement and send a statement to Oracle. Oracle uses its own statistics to plan and execute that statement.

I think that what @philflorent is saying is that for a statement like SELECT col1, col2 FROM foreign_table the PostgreSQL statistics don't matter much, because there are no two ways to execute the statement. The more complicated your statement is, the more important it is to have good statistics on the PostgreSQL side, so that it can find a good execution plan.

philflorent commented 11 months ago

Hi, Yes it was what I was meaning. If you have very simple queries with one possible plan, stats don't mater much and you don't have to worry about that. Good stats help when complexity increases a bit but, even with good stats, if you have very complex queries on big tables and that most of the work has to be done at Oracle side, you will have to create an Oracle view at some point and use this view as a foreign table. It could also be an indication that something is wrong in the whole design of your databases. Even if oracle-fdw is a good tool Oracle<=>Postgres will never be fully optimized "out of the box" for complex queries. I am a bit conservative but I would not want any OLTP workload or an end-user GUI to rely on that anyway. It would be very fragile and an EAI is probably required. It's not oracle-fdw specific even Oracle<=>Oracle or Postgres<=>Postgres workloads will never be as efficient as the same workloads within a single Oracle or Postgres database. Very complex or critical workload => one database. Best regards, Phil

laurenz commented 10 months ago

Is everything clear, so that I can close the issue?

vishalnatekar commented 10 months ago

Yes please, you can close. Thanks for the explanation