rails-sqlserver / activerecord-sqlserver-adapter

SQL Server Adapter For Rails
MIT License
968 stars 558 forks source link

ROW_NUMBER based limit for 2005 and later #16

Closed rickmark closed 13 years ago

rickmark commented 14 years ago

Altered add_limit_offset! to use new ROW_NUMBER() method of limiting in SQL 2005 and higher

penwellr@31b7a5cb39c5eb7a73695ac3079f63039b8efb1d

metaskills commented 14 years ago

I will be working on this at some point in time.

rickmark commented 14 years ago

I saw a comment on the commit, how do you think we should handle the case when no order is applied as ROW_NUMBER required some order (which my patch uses a little hack to find the primary key of a table)

metaskills commented 14 years ago

Good points. I'll have to thread that comment in when I get to this ROW_NUMBER() function stuff. BTW, I plan on doing that after I do all the rails 3 compatibility. So it may be awhile.

jondkinney commented 14 years ago

I needed to fix this last night to get some custom paginate_by_sql working. I have a solution that is somewhat messy... has no test coverage, but seems to work for me. If someone wants to run with this I'd be happy to help make it better.

https://gist.github.com/5461ba88c387ed275cf4

Since the order is applied in an outer select statement you can't order by any aliased column names (they don't exist yet in the context of the order by). I tried to figure out a work around for this (putting order by rand() in the row_number over order by section for example and then a final order by at the very end could be by an aliased column) but it was buggy and didn't return the results properly all the time so I stopped going down that path. Perhaps there is a solution but for now this seemed to work for most queries.

I verified it works for a variety of selects like this:

"select * from users u where blah"
"select * from users where blan"
"select * from users"
"select * from users u"

The regex takes into account all of those and will provide a default order by of "table_name.id" (or "table_name_alias.id") if you don't provide an order condition. Otherwise it will use the given order condition.

The only problem is that i couldn't get it to work well for nested selects... things like:

select * 
  from 
    (
      select some_aliased_thing,* 
      from users
    ) as results
  where
    some_aliased_thing = condition

Not sure how to account for that scenario.

It's a start anyway. -Jon

rickmark commented 14 years ago

This was a rails initializer I used for a long time with SQL Server 05 and will_paginate

http://gist.github.com/335683

jondkinney commented 14 years ago

Used? Are you just not using SQL05 anymore?

Did pagination work out of the box then with that? Or did you have to override WillPaginate as well? There were things that didn't seem to be right with the paginate_by_sql method, hence my modifications. The main one was specifying an order param in the paginate_by_sql method. Thoughts? Thanks!

jondkinney commented 14 years ago

penwellr: I tried your initializer and it didn't work for me. It wasn't limiting any of the result sets... not sure what was up there. Ken: what is the status of getting something like this into the core?

Thanks, -Jon

rickmark commented 14 years ago

That's strange, hooking this in as an initializer worked for us... What SQL were you getting from something like Table.find(:offset => 5, :limit => 5). With this in place we used a stock version of will paginate. We felt it was better to monkey patch the DB adapter rather then the actual will_paginate gem.

We moved to MySQL about 2 weeks ago, mainly because it was too hard to use MSSQL.

jondkinney commented 14 years ago

Hi Richard & Ken,

So I have spent the last 3-4 days working on this pagination issue with SQL05 and will_paginate. What I have found is that unless you're paging by specific SQL the current way the adapter writes the pagination SQL is old school (using order by and tmp tables) but it works. I tried using your initializer to monkey patch ActiveRecord but it was very brittle. For instance I was unable to order by field in a join table (actually any field other than the ID of the primary table) if there were conditions that were also specified from that join table. I'll give you an example

@users = User.paginate :page => params[:page], :per_page => 30, 
  :order => "users.updated_at desc", 
  :conditions => "users.enabled = 1 and locations.id in (1,2,3,4,5,6,7,8,9)", 
  :include => [:roles, :locations, :participant]

The SQL that is being created by AR for the eager loading of the user_ids in the associations is as follows:

SELECT DISTINCT [users].id
FROM [users]
  LEFT OUTER JOIN [locations_users]
    ON ([users].[id] = [locations_users].[user_id])
  LEFT OUTER JOIN [locations]
    ON ([locations].[id] = [locations_users].[location_id])
WHERE (users.enabled = 1 AND locations.id in (1,2,3,4,5,6,7,8,9))

But then this method gets a hold of it and does it's work

def add_order_by_for_association_limiting!(sql, options)
  # Disertation http://gist.github.com/24073
  # Information http://weblogs.sqlteam.com/jeffs/archive/2007/12/13/select-distinct-order-by-error.aspx
  return sql if options[:order].blank?
  columns = sql.match(/SELECT\s+DISTINCT(.*?)FROM/)[1].strip
  sql.sub!(/SELECT\s+DISTINCT/,'SELECT')
  sql << "GROUP BY #{columns} ORDER BY #{order_to_min_set(options[:order])}"
end

And it looks like this:

User Load IDs For Limited Eager Loading (38.5ms):  
SELECT TOP 30 [users].id
FROM [users] 
  LEFT OUTER JOIN [locations_users] 
    ON ([users].[id] = locations_users].[user_id]) 
  LEFT OUTER JOIN [locations] 
    ON ([locations].[id] = [locations_users].[location_id]) 
WHERE (users.enabled = 1 AND locations.id in (1,2,3,4,5,6,7,8,9)) 
GROUP BY [users].id 
ORDER BY MIN(users.updated_at) DESC

It works for the first page of users... but then when you go to page 2. You get this:

User Load IDs For Limited Eager Loading (0.0ms)   ODBC::Error: 37000 (8120)
[unixODBC][FreeTDS][SQL Server]Column 'users.updated_at' is invalid in the
select list because it is not contained in either an aggregate function or
the GROUP BY clause.: 
SELECT TOP 30 * 
FROM ( SELECT ROW_NUMBER() OVER( ORDER BY users.updated_at desc ) AS row_num, 
         [users].id 
       FROM [users] 
        LEFT OUTER JOIN [locations_users] 
          ON ([users].[id] = [locations_users].[user_id]) 
        LEFT OUTER JOIN [locations] 
          ON ([locations].[id] = [locations_users].[location_id]) 
        WHERE 
          (users.enabled = 1 AND 
          locations.id in (1,2,3,4,5,6,7,8,9)) 
        GROUP BY [users].id 
     ) AS t 
WHERE row_num > 30

This is how the current adapter does it by default:

User Load IDs For Limited Eager Loading (79.5ms)    
SET NOCOUNT ON DECLARE @row_number TABLE (row int identity(1,1), id int) 
INSERT INTO @row_number (id)     
SELECT [users].id 
FROM [users] 
  LEFT OUTER JOIN [locations_users] 
    ON ([users].[id] = [locations_users].[user_id]) 
  LEFT OUTER JOIN [locations] 
    ON ([locations].[id] = [locations_users].[location_id]) 
  WHERE (
    users.enabled = 1 AND 
    locations.id in (1,2,3,4,5,6,7,8,9)) 
  GROUP BY [users].id 
  ORDER BY MIN(users.updated_at) DESC     
SET NOCOUNT OFF 
SELECT id 
FROM ( 
  SELECT TOP 30 * 
  FROM ( SELECT TOP 30 * FROM @row_number ORDER BY row ) AS tmp1 
  ORDER BY row DESC 
) AS tmp2 
ORDER BY row 

And then page 2 of the users (note the inefficient "top 60")

User Load IDs For Limited Eager Loading (65.9ms)    
SET NOCOUNT ON DECLARE @row_number TABLE (row int identity(1,1), id int) 
INSERT INTO @row_number (id) 
SELECT [users].id 
FROM [users] 
  LEFT OUTER JOIN [locations_users] 
    ON ([users].[id] = [locations_users].[user_id]) 
  LEFT OUTER JOIN [locations] 
    ON ([locations].[id] = [locations_users].[location_id]) 
WHERE (
  users.enabled = 1 AND 
  locations.id in (1,2,3,4,5,6,7,8,9)) 
GROUP BY [users].id 
ORDER BY MIN(users.updated_at) 
DESC SET NOCOUNT OFF 
SELECT id 
FROM ( 
  SELECT TOP 30 * 
  FROM ( SELECT TOP 60 * FROM @row_number ORDER BY row ) AS tmp1 
  ORDER BY row DESC 
) AS tmp2 
ORDER BY row 

As you can see I'm trying to get away from this style of paging because each page further from 1 get more and more inefficient as we increase the record-set returned by 30 each for each page.

I took a crack at modifying ActiveRecord (with a monkey patch) so that I could inject the order by column in the select statement but it didn't seem to work for all the different kinds of queries that can end up passing through that method. Perhaps this work/idea will be a good jump start though so here is a gist of my initializer: http://gist.github.com/371535

But since monkey patching AR was somewhat unsuccessful for now my current solution is to use the following initializer that ONLY patches the add_limit_offset! method when called from within a paginate_by_sql call (I renamed the method to j2fly_add_limit_offset!). This will give you the ROW_NUMBER benefit when passing in custom SQL but still allow the standard Paginate method to function with the old school tmp table flipping when just doing regular pagination. http://gist.github.com/371614

Any help to get a universal ROW_NUMBER() version working would be much appreciated!

Thanks, -Jon

rickmark commented 14 years ago

Thank you for the detailed information, I'm sure we will be able to work out a more elegant solution in coming weeks as Rails3 is available as it relies on ARel (Relational Algebra) rather then string manipulation. I admit I was doing very little ordering with my versions of the initializer and it was best-effort to meet the needs of my last project.

jondkinney commented 14 years ago

Do you know if there will be any more effort with this adapter regarding it's compatibility with rails 2.3.x? I guess what I'm asking is will people need to upgrade their apps to Rails 3 to simply get better paging?

metaskills commented 14 years ago

@penwellr (Richard)

That initializer here (http://gist.github.com/335683) is pretty cool, but the #find_table_primary_key_columns is not needed. The adapter is already doing this and stores per ActiveRecord's #columns and #columns_hash. So giving an instance of a AR object you could do something like this.

instance.class.columns.select(&:primary).map(&:name) # => ["id"]

The add_limit_offset! is the exact cleanlyness I was hoping to see ROW_NUMBER yield. Good stuff. would love to see a patch for something like this.

@j2fly (Jon)

I totally so understand the craziness of the adapter's workings now. And yes, it get's bad performance the higher the offset of the page. It sucks and right now I can not put the brain effort into it. I've finally found some time today to get to old tickets and patches for a 2.3.6 release. Then as time permits, I'm gonna have to pick up two other big chuncks for the adapter. The first is taking a look at the IronRuby release and seeing if I can get the ADONET conneciton mode in. That should finish up a good solid and working 2.x.

After taht the craziness for starting to look at AREL and Rails3 implementation with @ebryn (Erik Bryn). I suspect I'll be forced to make the Rails3 adapter only 2005/2008 friendly and hence do the ROW_NUMBER then. Depending on how things look, the adapter 3.x may only work on 3.x rails, depending on how we can do core extensions, how much monkey patching is needed, etc. The 2.x version of the adapter which technically works for 2000, 2005, 2008 will more than likely be supported and have patches for quite some time too.

metaskills commented 13 years ago

Closing this issue, ROW_NUMBER is used in the latest version of the adapter for rails 3 which will be released soon. It currently passes all the ActiveRecord test, however, please do test it and let me know if there are any improvements that can be made in regards to the implementation.