Result#fields empty if no results are returned, and not callable before #each

jeremyevans commented 13 years ago

Most other database drivers allow calling fields before each, and return the columns that would be used even if no rows are present, which is very helpful information. Sequel relies on such behavior for proper functioning.

Also, because fields is not callable before each, I have to do something like:

        result.each(each_opts.merge!(:as=>:array)) do |r|
          @columns = cols = result.fields.map{|c| output_identifier(c)} unless cols
          h = {}
          cols.zip(r).each{|k, v| h[k] = v}
          h.delete(row_number_column) if @opts[:offset]
          yield h
        end

This means I'm checking the value of cols for every row returned. If fields was callable before each, the check could be moved outside the loop, speeding things up.

In result.c, I see that this is intended behavior. However, it causes problems, such as making it impossible to determine the columns in an empty table without parsing the INFORMATION_SCHEMA.

Could you consider making fields callable before each, and to return columns even if there are no rows returned?

Also, for best performance with Sequel in the default case, there would need to be a way to pass in a list of fields to use for the keys. Sequel uses symbolized keys, but allows you to provide a String method to call on all outputted columns to transform them (the identifier_output_method). The default for Microsoft SQL Server is :downcase, so that you can deal with lowercase identifiers in ruby and still have upcased identifiers in the database. That's the reason we currently grab the columns as an array and use Array#zip. I have an optimization to just use symbolize_keys => :true if no identifier_output_method is used, but it doesn't take effect by default. I realize that such a request is specific to Sequel, so I can understand if you don't want to do that. I'm just informing you now as if may affect your API design should you desire to support it.

metaskills commented 13 years ago

Interesting. Have you considered that doing an Array#zip like that is a decent performance hit because you are retooling the data per row? I'm optimistic that there can be something done to help you. Here are a few notes about how are results are optimized. Like MySQL2 does, these optimizations will hopefully push back up to ActiveRecord to speed things up there too.

Even if you allow all rows to come back as hashes with string keys, those keys are the same object_id for each row. So just like the symbolized versions, there is no extra GC for those string objects because they are shared. Did you also know about the cache rows query option too?

Let me see if I can come up with a patch that first let's you call fields without having to iterate over the results first. This would need to be done anyway since the work I saw Aaron working on in AR 3.1 required it anyway. We can then go from there.

jeremyevans commented 13 years ago

I know Array#zip is a performance hit, which is why that code is only used when there is an identifier_output_method for the dataset. With Mysql2, we actually only used the :symbolize_keys method, and the identifier_output_method is unsupported there. That's because MySQL defaults to lowercasing unquoted identifiers, so it's generally not needed.

If fields is available before each, and includes columns even if the rows are empty, that's good enough for Sequel. I already use the optimized :symbolize_keys method where it will work correctly. If someone wants absolute best performance and compatibility with tiny_tds and Sequel, they can write their own C extension. For example, I created sequel_pg to speed up sequel when used with pg.

metaskills commented 13 years ago

Nice, keep an eye out. And please let me know if I can close issue #7 while this gets done.

BTW, very glad your even using TinyTDS too with your project. Very interesting stuff. Looking forward to seeing how this plays out.

metaskills commented 13 years ago

OK, it took me 2 days to rework everything, but we are better off for it. Here is a pull from the README on the way TinyTds::Result#fields works.

A result object has a #fields accessor. It can be called before the result rows are iterated over. Even if no rows are returned, #fields will still return the column names you expected. Any SQL that does not return columned data will always return an empty array for #fields. It is important to remember that if you access the #fields before iterating over the results, the columns will always follow the default query option’s :symbolize_keys setting at the client’s level and will ignore the query options passed to each.

result = client.execute("USE [tinytds_test]")
result.fields # => []
result.do

result = client.execute("SELECT [id] FROM [datatypes]")
result.fields # => ["id"]
result.cancel
result = client.execute("SELECT [id] FROM [datatypes]")
result.each(:symbolize_keys => true)
result.fields # => [:id]

If this works out, I will release a 0.4.0 gem with all these changes.

jeremyevans commented 13 years ago

With these changes, the Sequel tiny_tds adapter works fine. Thanks for the quick response!

rails-sqlserver / tiny_tds

Result#fields empty if no results are returned, and not callable before #each #8