django adapter: query.DB2QueryClass.as_sql to use 'fetch first' when has high_mark (limit) and no low_mark (offset)

GoogleCodeExporter commented 9 years ago

While db2 doesn't support 'offset' in sql, and django orm needs this,
currently this is implemented with ROW_NUMBER() workaround.  

This is probably the best way to simulate offset, but disadvantage is that
the logic of code is complex, and in my opinion, can become fragile because
of this, e.g. in the case when django side changes. There is also a
question of performance. 

Whatever, to make things less 'fragile', and since db2 supports sql "limit"
part (select ... fetch first), my suggestion is to split the limit/offset
logic to two parts, i.e. to take advantage of db2 "fetch first" sql select
extension.

Idea:
    query.DB2QueryClass.as_sql

    if not (with_limits and (self.high_mark is not None or self.low_mark)):
        return super(DB2QueryClass, self).as_sql(False, with_col_aliases)

    # ----------- NEW_LOGIC: begin -------------

    elif with_limits and self.high_mark is not None and not self.low_mark:
        # only high - limit without offset
        assert self.high_mark > 0
        sql, params = super(Db2Query, self).as_sql(with_limits=False,
                                                   with_col_aliases=True)
        sql = "%s FETCH FIRST %d ROWS ONLY" % (sql, self.high_mark)

    # ----------- NEW_LOGIC: end -------------

    else:                
        self.pre_sql_setup()
        ...

Since I had similar problem when implementing db2 sqlalchemy adapter, I
have implemented only 'fetch first' part. But during the testing new
problem occured which is illustrated by the following example:

    select ...  from (select ... from ... fetch first) ...  

This example fails since 'fetch first' in subselect is not supported by
db2. I don't know if such sql construction is possible by the django
ORM (subselects with limits), but if it is and the whole idea is to be
implemented then this case should be considered too.

Original issue reported on code.google.com by trebor74hr@gmail.com on 26 Jun 2009 at 9:23

GoogleCodeExporter commented 9 years ago

1. In case of using fetch first when assert high_mark is greater than zero and 
low_mark is zero makes perfect sense. But in DB2 except using ROW_NUMBER we do 
not 
have a way when both low_mark and high_mark are non-zero. One option for us is 
to 
segregate the implementation and breaking it down to 2 cases:-
    a) When low_mark = 0 and high_mark> 0 then we can use fetch_first
    b) When both low_mark and high_mark are greater than zero then we are selecting a 
small portion of the rows to be fetched and thus need to use ROW_NUMBER where 
rownum>=low_mark and row_num<=high_mark

This case would make code little more complex but I guess most of the cases 
would 
have low_mark=0 and thus would optimize performance. I would take this up for 
the 
next release.

2) As you mentioned the case where django side changes. We certainly are bound 
by 
django implementations and limitations that DB2 has with limit/offsets. Like 
for 
example you cannot AS foo in a subquery formed by row_number. So, yes this is a 
limitation and we hold on to till either DB2 brings out new features for 
limit/offset 
or Django changes implementation(I cannot think of a usecase when that might 
happen 
though.)

Original comment by tarun.pa...@in.ibm.com on 1 Jul 2009 at 8:46

Changed state: Accepted
Added labels: Performance

GoogleCodeExporter commented 9 years ago

Original comment by rahul.pr...@in.ibm.com on 12 Nov 2009 at 12:03

wmjie / ibm-db

django adapter: query.DB2QueryClass.as_sql to use 'fetch first' when has high_mark (limit) and no low_mark (offset) #32