rails-sqlserver / activerecord-sqlserver-adapter

SQL Server Adapter For Rails
MIT License
972 stars 558 forks source link

Ordering a query by an aliased expression generates invalid SQL #198

Closed dball closed 12 years ago

dball commented 12 years ago

This simple, albeit stupid test case exposes the problem:

diff --git a/test/cases/offset_and_limit_test_sqlserver.rb b/test/cases/offset_and_limit_test_sqlserver.rb
index 40755eb..8ce631b 100644
--- a/test/cases/offset_and_limit_test_sqlserver.rb
+++ b/test/cases/offset_and_limit_test_sqlserver.rb
@@ -36,6 +36,16 @@ class OffsetAndLimitTestSqlserver < ActiveRecord::TestCase
       books = Book.all :select => 'books.id, books.name', :limit => 3, :offset => 5
       assert_equal Book.all[5,3].map(&:id), books.map(&:id)
     end
+
+    should 'work with aliased expressions in order' do
+      books = Book.all(:select => 'books.id, (author_id * 3.5) AS foo', :order => 'foo')
+      assert_equal Book.all.map(&:id), books.map(&:id)
+    end
+
+    should "work with aliased expression in order with limit and offset" do
+      books = Book.all(:select => 'books.id, (author_id * 3.5) AS foo', :order => 'foo', :limit => 3, :offset => 5)
+      assert_equal Book.all[5,3].map(&:id), books.map(&:id)
+    end

     # ActiveRecord Regression 3.2.3?
     # https://github.com/rails/rails/commit/a2c2f406612a1855fbc6fe816cf3e15b4ef531d3#commitcomment-1208811

We're selecting an expression along with the table columns and we would like to be able to order by the alias. It works nicely as we expect when we do not limit or offset, but generates invalid SQL when we do:

test: When selecting with limit and offset should work with aliased expression in order with limit and offset. (OffsetAndLimitTestSqlserver):
ActiveRecord::StatementInvalid: TinyTds::Error: Invalid column name 'foo'.: EXEC sp_executesql N'SELECT TOP (3) [__rnt].* FROM ( SELECT ROW_NUMBER() OVER (ORDER BY foo ASC) AS [__rn], books.id, (author_id * 3.5) AS foo FROM [books] ) AS [__rnt] WHERE [__rnt].[__rn] > (5) ORDER BY [__rnt].[__rn] ASC'

Replacing the alias with the expression in the OVER (ORDER BY ... ASC) clause fixes the query and apparently yields the correct result.

This example is contrived, but the problem crops up in the real world when using geokit-rails3 with the adapter.

I've poked around in the Arel visitor within the adapter but haven't found the magic formula yet, if there even if one. I think you'd have to have the select clause make available a hash of aliases and try to lookup the orders therein.

But, first thing first. Do you agree this is a bug? One could argue that this is just a database idiosyncrasy that ActiveRecord users have to be aware of and work around. I myself disagree, but it's worth asking. If you do agree, I'll keep digging, but would appreciate any suggestions for the correct fix.

metaskills commented 12 years ago

Bug, idiosyncrasy, it is what it is. I am leaning more toward idiosyncrasy. I say that because the adapter/visitor has to do what it has too for limit and offset support. This means that you can't write code like the above example without being aware that some DBs (perhaps only ours) will just not let that work.

That said, I am open to elegant patches or hacks :) and I have none to offer.

dball commented 12 years ago

Here's a patch:

diff --git a/lib/arel/visitors/sqlserver.rb b/lib/arel/visitors/sqlserver.rb
index 88da886..1af6d83 100644
--- a/lib/arel/visitors/sqlserver.rb
+++ b/lib/arel/visitors/sqlserver.rb
@@ -168,12 +168,31 @@ module Arel
       def visit_Arel_Nodes_SelectStatementWithOffset(o)
         o.limit ||= Arel::Nodes::Limit.new(9223372036854775807)
         orders = rowtable_orders(o)
+        select_and_from_clauses = visit_Arel_Nodes_SelectStatementWithOutOffset(o, true)
+        select_clause = select_and_from_clauses.split(/\s+FROM/).first
+        select_columns = select_clause.split(/\s*,\s*/)
+        column_aliases = {}
+        select_columns.each do |select_column|
+          values = select_column.split(/\s+AS\s+/)
+          if values.length == 2
+            column_aliases[values[1]] = values[0]
+          end
+        end
+        order_clause = orders.map { |x|
+          order_column_and_direction = visit(x)
+          if md = order_column_and_direction.match(/(.+)\s+(ASC|DESC)$/)
+            order_column = column_aliases.fetch(md[1], md[1])
+            "#{order_column} #{md[2]}"
+          else
+            order_column_and_direction
+          end
+        }.join(', ')
         [ "SELECT",
           (visit(o.limit) if o.limit && !windowed_single_distinct_select_statement?(o)),
           (rowtable_projections(o).map{ |x| visit(x) }.join(', ')),
           "FROM (",
-            "SELECT ROW_NUMBER() OVER (ORDER BY #{orders.map{ |x| visit(x) }.join(', ')}) AS [__rn],",
-            visit_Arel_Nodes_SelectStatementWithOutOffset(o,true),
+            "SELECT ROW_NUMBER() OVER (ORDER BY #{order_clause}) AS [__rn],",
+            select_and_from_clauses,
           ") AS [__rnt]",
           (visit(o.offset) if o.offset),
           "ORDER BY [__rnt].[__rn] ASC"

It does a good bit of string work, but the cost seems to be basically a rounding error against the round-trip cost of a query. Could probably reduce the cost a smidge by pre-splitting up the output from the internal call to visit_Arel_Nodes_SelectStatementWithOutOffset if we really cared; doubt it's worth the extra complexity to do that though.

If you're hip to this basic solution (which builds a hash of column aliases by parsing the select clause and substitutes if necessary in the rowset order clause), I'll clean it up, extract a helper method, and submit a pull request.

metaskills commented 12 years ago

Looks, good if you can clean it up a bit. Is there a way that you can use the Arel nodes more to your advantage vs string splitting and matching? For instance, using the o.cores.first.projections. FYI, I've taken a stance that we only support single cores in this arel visitor. So that is why I only concern ourselves with o.cores.first. Likewise on cleaning up and using Arel more, is it necessary to do do the order string matching. The adapter has taken a stance that everything that comes down the pike will be true ordering nodes. So they have methods to use for finding if they are ascending or descending. Perhaps that can not be done from an outside in approach where you want to support 3rd parties passing strings down. Which kind of does beg the point if the should to begin with vs using the AST, sigh.

dball commented 12 years ago

Yeah, I rewrote the order stuff to operate on arel nodes directly, much cleaner and marginally faster. All the tests pass in both ruby 1.8.7 and 1.9.3, but I wanted to verify it fixes the production case before sending a pull request. I was stymied though, there's no rake task to build a gem and the :path Gemfile option doesn't work against the root of this repo. How do you build the gem?

I saw that you're ignoring cores other than the first. What are these cores, and in what circumstance might there be more than one?

dball commented 12 years ago

Tch. Turns out this approach won't be able to work except in the simplest of cases. Given a SELECT column expression that contains commas, which would describe any useful function call, it fails to inline the correct expression. Split /,/ isn't sufficient, it requires a parser that can at least keep track of parentheses depth, and probably string boundaries to work correctly on common non-trivial input. Arel doesn't seem to be of any help here, it considers the select columns, or projections in its terminology, to be a simple string literal. Teasingly, Arel offers some NamedFunction support, but it looks like the calling code would have to be an active participant in their use, which means hacking the geokit-rails3 library. If I have to hack into it, I might as well make things easy on myself and inline the order expression there and call it a day.

I'll go ahead and paste the final patch herein just for historical reference and close this out; if a workaround occurs to me, or I decide parsing the SQL is worth it, I'll reopen it.

commit b48936e6604f2d189306601567ce39669a9f0be2
Author: Donald Ball <donald.ball@gmail.com>
Date:   Tue May 8 22:00:34 2012 -0400

    Allows order clauses to contain aliases in queries with offsets

diff --git a/lib/arel/visitors/sqlserver.rb b/lib/arel/visitors/sqlserver.rb
index 88da886..ac0bfb7 100644
--- a/lib/arel/visitors/sqlserver.rb
+++ b/lib/arel/visitors/sqlserver.rb
@@ -167,13 +167,15 @@ module Arel

       def visit_Arel_Nodes_SelectStatementWithOffset(o)
         o.limit ||= Arel::Nodes::Limit.new(9223372036854775807)
-        orders = rowtable_orders(o)
+        select_and_from_clauses = visit_Arel_Nodes_SelectStatementWithOutOffset(o, true)
+        column_aliases = parse_column_aliases(select_and_from_clauses)
+        orders = visit_order_nodes_and_replace_aliases(rowtable_orders(o), column_aliases)
         [ "SELECT",
           (visit(o.limit) if o.limit && !windowed_single_distinct_select_statement?(o)),
           (rowtable_projections(o).map{ |x| visit(x) }.join(', ')),
           "FROM (",
-            "SELECT ROW_NUMBER() OVER (ORDER BY #{orders.map{ |x| visit(x) }.join(', ')}) AS [__rn],",
-            visit_Arel_Nodes_SelectStatementWithOutOffset(o,true),
+            "SELECT ROW_NUMBER() OVER (ORDER BY #{orders.join(', ')}) AS [__rn],",
+            select_and_from_clauses,
           ") AS [__rnt]",
           (visit(o.offset) if o.offset),
           "ORDER BY [__rnt].[__rn] ASC"
@@ -380,6 +382,32 @@ module Arel
         end.join(', '))
       end

+      def parse_column_aliases(select_and_from_clauses)
+        select_clause = select_and_from_clauses.split(/\s+FROM/).first
+        select_columns = select_clause.split(/\s*,\s*/)
+        column_aliases = {}
+        select_columns.each do |select_column|
+          values = select_column.split(/\s+AS\s+/)
+          if values.length == 2
+            column_aliases[values[1]] = values[0]
+          end
+        end
+        column_aliases
+      end
+
+      def visit_order_nodes_and_replace_aliases(orders, aliases)
+        orders.map { |node|
+          if node.is_a?(Arel::Nodes::Ascending) ||
+            node.is_a?(Arel::Nodes::Descending)
+            expression = aliases[node.expr]
+            if expression
+              node = node.class.new(Arel.sql(expression))
+            end
+          end
+          visit(node)
+        }
+      end
+
     end
   end
metaskills commented 12 years ago

there's no rake task to build a gem

I use rubygems to build and push the gem in all my projects. Like so.

$ gem build gemname.gemspec
$ gem push gemname-x.x.x.gem

No custom tools needed.

metaskills commented 12 years ago

Donald, I was about to ask if a patch to geokit-rails3 would be the better solution. Seems to me if your going to be ORM agnostic, you should be in the habit of using Arel vs raw strings as much as possible. That gives the adapter's visitor a chance of doing the right thing. This of course is a big work in progress for many project including ActiveRecord itself. When I first started work on the visitor, Rails was only doing the simplest of things to leverage Arel, since 3.2 it has gotten way better and I image i will be even more so in Rails 4.

Case in point. When ActiveRecord was passing concatenated strings down for orders, it meant we had to build real ordering nodes out of them and call uniq on that collection to get things to work with the adapter. If ActiveRecord was doing this all along, it would mean our AST visitor would just be able to do the work. The case of geokit-rails3 sounds like such an example. There have been a few times I have gone back to ActiveRecord with a patch and it ended up being the right direction. In some cases, other adapters like the Oracle enchanted benefited as well.

dball commented 12 years ago

I still think this is an adapter bug that should ideally be fixed therein. You can order on aliased columns properly when the query does not have an offset, but the rewritten query the adapter generates to work around the lack of native limit and offset within sql server is broken. In point of fact, it may not be worth the effort to patch the adapter, but that's where the misbehavior resides; geokit-rails3 doesn't seem to be doing anything wrong in its use of Arel, with the possibly exception of expressing the distance calculation as an SqlLiteral instead of a composition of functions.

metaskills commented 12 years ago

I think it is a bug in both. IE, just like patching ActiveRecord to use Arel vs strings (which was done) and use using those to build better SQL for us, which was also done. So I am not trying to punt this off, just share the load.

dball commented 12 years ago

What do you think is the bug in geokit-rails3?

metaskills commented 12 years ago

The same evolutionary bug and design enhancements that ActiveRecord is going thru. IE, using Arel vs strings. In this case, a string that could be valid SQL like this books.id, (author_id * 3.5) AS foo until you consider that other visitors like SQL Server can not do that using limit/offset. So like ActiveRecord they should be using Arel to alias the table via it's interface and dropping those down to the select. This way our visitor can use these real objects vs string parsing.

dball commented 12 years ago

I was looking through the Arel AST nodes to see if the select columns, er, projections had an internal representation other than SqlLiteral, but I confess, I could not find one. How would you do what you're suggesting for a simple query like this?

Book.all :select => 'books.id, books.name'