vrana / notorm

NotORM - PHP library for simple reading data from the database
http://www.notorm.com/
861 stars 158 forks source link

notorm vs. oracle shared pool #65

Open pvanek opened 11 years ago

pvanek commented 11 years ago

hi Jakub,

at first - I'm not PHP user so I don't understand PDO internals etc. But now I'm asked to evaluate a design of an application which would potentially use NotORM with Oracle 11.2.0.3. The Oracle server is used for many applications while this PHP one would serve for read only quick overview display for ~50 users. No big deal.

While I studied NoORM source code briefly, playing with your example, I discovered one critical thing (from Oracle DB point of view). NotORM statemens are generated as "raw" statements - eg. in Result.php:

$condition .= " IN (" . implode(", ", $in) . ")";

resulting in statements like eg.:

SELECT ... FROM a_table WHERE (foo IN ('1', '2', '3'));
SELECT ... FROM a_table WHERE (foo IN ('6', '7', '8'));
SELECT ... FROM a_table WHERE (foo IN ('12', '222', '666'));
etc.

Which is very bad statement for Oracle DB. It's because of Oracle's architecture. It uses so called "Shared Pool" where are statements kept as in cache to avoid "Hard Parsing" of statements which is extremely costy.

So as this approach can work in small environment it would be killing in high volume OLTP processing.

Please check some classic articles like: http://www.akadia.com/services/ora_bind_variables.html http://psoug.org/reference/bindvars.html ...

I know it's NotORM approach to be used like this. So just add this note (or similar) to documentaion, please, to avoid PHP developers screwing my systems ;)

thanks

vrana commented 11 years ago

Hi,

Can you be more specific about how expensive the hard parsing is? Compared to retrieving the data, it's usually fairly simple and fast operation running in milliseconds or less. Did you run some benchmarks that will prove that using IN ('1', '2', '3') is significantly slower than something else?

The thing is that database systems don't support variable length parameters so it's not possible to write something like IN ? and pass an array of desired values. And using IN (?, ?, ...) would solve only half of the problem.

Jakub

hi Jakub,

at first - I'm not PHP user so I don't understand PDO internals etc. But now I'm asked to evaluate a design of an application which would potentially use NotORM with Oracle 11.2.0.3. The Oracle server is used for many applications while this PHP one would serve for read only quick overview display for ~50 users. No big deal.

While I studied NoORM source code briefly, playing with your example, I discovered one critical thing (from Oracle DB point of view). NotORM statemens are generated as "raw" statements - eg. in Result.php:

$condition .= " IN (" . implode(", ", $in) . ")";

resulting in statements like eg.:

SELECT ... FROM a_table WHERE (foo IN ('1', '2', '3')); SELECT ... FROM a_table WHERE (foo IN ('6', '7', '8')); SELECT ... FROM a_table WHERE (foo IN ('12', '222', '666')); etc.

Which is /very/ bad statement for Oracle DB. It's because of Oracle's architecture. It uses so called "Shared Pool" where are statements kept as in cache to avoid "Hard Parsing" of statements which is extremely costy.

So as this approach can work in small environment it would be killing in high volume OLTP processing.

Please check some classic articles like: http://www.akadia.com/services/ora_bind_variables.html http://psoug.org/reference/bindvars.html ...

I know it's NotORM approach to be used like this. So just add this note (or similar) to documentaion, please, to avoid PHP developers screwing my systems ;)

thanks

— Reply to this email directly or view it on GitHub https://github.com/vrana/notorm/issues/65.

pvanek commented 11 years ago

I'll show you the problem in some "real" situation to show you something like a benchmark. Assuming we are talking about OLTP systems. Also I hope it will be clearer at the end of this message...

-- start with clear system
alter system flush shared_pool;

-- "notorm style": +000000000 00:00:05.912614000
declare
  st timestamp := current_timestamp;
begin
  for i in 1..100 loop
    execute immediate 'select object_name from all_objects where object_id in ( '|| i || ')';
  end loop;
  dbms_output.put_line(current_timestamp - st);
end;
/

-- variable binding style: +000000000 00:00:00.088596000
declare
  st timestamp := current_timestamp;
begin
  for i in 1..100 loop
    execute immediate 'select object_name from all_objects where object_id in( :i )' using i;
  end loop;
  dbms_output.put_line(current_timestamp - st);
end;
/

for 1..1000

for 100 unique IDs "notorm style": +000000000 00:00:05.912614000 variable binding style: +000000000 00:00:00.088596000

for 1000 +000000000 00:01:01.631089000 vs +000000000 00:00:00.132568000

the problem is that parsed statements are stored in shared pool "forever" (the cleaning conditions are out of scope of this issue). So for example in systems where there are more data than few rows ther can be "infinite" count of statements in shared pool which is impossible due the memory size.

Also note that so called hard parsing cannot be serialized in oracle, it's blocking operation because of shared pool access (well, it's more complicated but let's simplify it to this status) - it means that permanent hardparsing blocks all schemas in the DB instance.

Now we have following situation in testing system. The 1st line is "safe" statement, it was executed 1000 times with only one hard parsing, using single amount of memory. Rest of lines are "notorm-like" statements with literals - every occurence of literal combination creates new prepared statement - with amount of memory and limited use.

SQL                                                            sharable     persistent  runtime versions    executions  parsecalls  buffer gets
select object_name from all_objects where object_id in( :i )    409141      157728      156120  1           1000        1           10

select object_name from all_objects where object_id in ( 808)   11646       157680      156104  0           1           1           10
select object_name from all_objects where object_id in ( 55)    405053      157680      156104  1           2           2           10
… 997 rows removed …                            
select object_name from all_objects where object_id in ( 87)    405053      157680      156104  1           2           2           10

                                                        SUM:    96573232    45096528    44645760                

and then let's assume that IDs sequence is growing. In this case there will be still one statement using variable binding or on the other side system on its knees handling growing shared pool with single-use statements.

The answer for your question: "how to resolve the IN operator with ? or ?,?,?". is quite simple. And I don't think you will like it ;)

-- takes +000000000 00:00:00.117361000
declare
  st timestamp := current_timestamp;
  o varchar2(36);
  str varchar2(100);
begin
  for i in 1..1000 loop
    str := to_char(i); -- str can be eg.: '1', or '1,2,3,4,5'... but it won't work with strings etc. out of the box.
    -- the hack
    execute immediate 'select object_name
      from all_objects where object_id in ( SELECT regexp_substr ( :str, ''[^,]+'', 1, lvl ) as val
                 FROM ( select  LEVEL lvl
                           from dual
                        CONNECT BY LEVEL <= ( ( length ( :str ) - length ( REPLACE ( :str
                                                                                  , '',''
                                                                                  , '''' ) ) ) / length ( '','' ) ) + 1 ) )' using str, str, str;
    -- end of the hack
  end loop;
  dbms_output.put_line(current_timestamp - st);
end;
/
vrana commented 11 years ago

Thanks for the benchmarks. Would it be possible to repeat them from PHP? Maybe something weird is going on in execute immediate. It seems that you know what you are talking about but I want to be sure. I just couldn't believe that Oracle takes around .06 seconds to parse a query. That's ridiculous.

NotORM couldn't use joins or subselects as it's not known what will be needed later in the code. Also simple queries perform very well in MySQL, where query cache is invalidated for all queries using a table that was modified. With two simple queries, only one query is invalidated.

Jakub

I'll show you the problem in some "real" situation to show you something like a benchmark. Assuming we are talking about OLTP systems. Also I hope it will be clearer at the end of this message...

-- start with clear system alter system flush shared_pool;

-- "notorm style": +000000000 00:00:05.912614000 declare st timestamp := current_timestamp; begin for i in 1..100 loop execute immediate 'select object_name from all_objects where object_id in ('|| i || ')'; end loop; dbms_output.put_line(current_timestamp - st); end; /

-- variable binding style: +000000000 00:00:00.088596000 declare st timestamp := current_timestamp; begin for i in 1..100 loop execute immediate 'select object_name from all_objects where object_id in( :i )' using i; end loop; dbms_output.put_line(current_timestamp - st); end; /

for 1..1000

for 100 unique IDs "notorm style": +000000000 00:00:05.912614000 variable binding style: +000000000 00:00:00.088596000

for 1000 +000000000 00:01:01.631089000 vs +000000000 00:00:00.132568000

the problem is that parsed statements are stored in shared pool "forever" (the cleaning conditions are out of scope of this issue). So for example in systems where there are more data than few rows ther can be "infinite" count of statements in shared pool which is impossible due the memory size.

Also note that so called hard parsing cannot be serialized in oracle, it's blocking operation because of shared pool access (well, it's more complicated but let's simplify it to this status) - it means that permanent hardparsing blocks all schemas in the DB instance.

Now we have following situation in testing system. The 1st line is "safe" statement, it was executed 1000 times with only one hard parsing, using single amount of memory. Rest of lines are "notorm-like" statements with literals - every occurence of literal combination creates new prepared statement - with amount of memory and limited use.

|SQL sharable persistent runtime versions executions parsecalls buffer gets select object_name from all_objects where object_id in( :i ) 409141 157728 156120 1 1000 1 10

select object_name from all_objects where object_id in ( 808) 11646 157680 156104 0 1 1 10 select object_name from all_objects where object_id in ( 55) 405053 157680 156104 1 2 2 10 … 997 rows removed … select object_name from all_objects where object_id in ( 87) 405053 157680 156104 1 2 2 10

                                                     SUM:    96573232    45096528    44645760

and then let's assume that IDs sequence is growing. In this case there will be still one statement using variable binding or on the other side system on its knees handling growing shared pool with single-use statements.

The answer for your question: "how to resolve the IN operator with ? or ?,?,?". is quite simple. And I don't think you will like it ;)

  • use joins (the best what you can do for your DB)
  • use sub-selects (not so good as the above one but it's just a personal taste)
  • use hacks (do it only when you know what to do). For example:

-- takes +000000000 00:00:00.117361000 declare st timestamp := current_timestamp; o varchar2(36); str varchar2(100); begin for i in 1..1000 loop str := to_char(i); -- str can be eg.: '1', or '1,2,3,4,5'... but it won't work with strings etc. out of the box. -- the hack execute immediate 'select object_name from all_objects where object_id in ( SELECT regexp_substr ( :str, ''[^,]+'', 1, lvl ) as val FROM ( select LEVEL lvl from dual CONNECT BY LEVEL <= ( ( length ( :str ) - length ( REPLACE ( :str , '','' , '''' ) ) ) / length ( '','' ) ) + 1 ) )' using str, str, str; -- end of the hack end loop; dbms_output.put_line(current_timestamp - st); end; /

— Reply to this email directly or view it on GitHub https://github.com/vrana/notorm/issues/65#issuecomment-20114570.

pvanek commented 11 years ago

it would be great surprise if there would be any glitch in 'execute immediate' because it's Oracle internal gate to sql engine.

I'd like to test it with PHP but I'm not able to compile PHP with Oracle's PDO. Is there any howto somewhere, please?

vrana commented 11 years ago

Sorry, I have no experience with it. I guess http://www.php.net/manual/en/ref.pdo-oci.php#ref.pdo-oci.installation isn't sufficient?