Open flash-gordon opened 8 years ago
@jeremyevans maybe you will be interested in this
@flash-gordon I think this may be an issue with Sequel, not sqlite-jdbc. Unless you can reproduce your issue using sqlite-jdbc without using Sequel, I would probably close this.
The reason behind Sequel's behavior is that any sane SQL database will use the same type for all values of a given column, and Sequel uses this in an optimization. Unfortunately, SQLite is not sane in this case; see https://www.sqlite.org/datatype3.html. I suppose Sequel could use a completely different approach to typecasting on jdbc-sqlite than it uses for the other JDBC drivers, but that would also negatively affect performance for most Sequel sqlite-jdbc users. Note that you can fix your issues by making sure you convert types correctly before inserting data (Sequel::Model handles this for you).
I see your point, Jeremy, but before taking any action we must determine the ground truth here. Although I worked with sqlite-jdbc through Sequel, I dug straight into `process_result_set` and looked at the behavior of the `ResultSet` object directly. SQLite lets you get metadata for a result set, and you can always obtain the type affinity from it; that's why `getColumnTypeName` returns exactly what was expected, with no magic underneath. Getting inconsistent results from `getColumnType` and `getColumnTypeName` is really weird from my point of view. Of course Sequel can work around this; I just want to be sure that this is really necessary. Comparing the adapters for MRI (the sqlite gem) and for JRuby makes me think it would amount to a rewrite from scratch.
The reason for this issue is failing specs in the rom-sql gem under JRuby 9K. I don't work with SQLite personally, only Oracle and PG at the moment :)
The problem is even worse: when you write a double such as NaN or -Inf, it gets inserted correctly. However, when you determine the type of that column you get INTEGER (4), and when you then try to read the NaN back it fails because it cannot be converted into an integer. Any application that relies on correct column types returned by the metadata simply doesn't work at all with SQLite. Which is really unfortunate.
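The failure mode above can be reproduced in pure Ruby, without any driver involved (this is an illustration of the conversion step, not sqlite-jdbc code): a value that SQLite happily stores as a REAL cannot survive the integer conversion that an INTEGER-typed read path would apply.

```ruby
# NaN is stored fine as a REAL by SQLite, but a reader that trusts the
# INTEGER metadata will attempt an integer conversion, which raises.
value = Float::NAN
begin
  value.to_i   # what an INTEGER-typed read path would do
rescue FloatDomainError
  puts "NaN cannot be converted to an integer"
end
```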
PRs with proposed solutions & unit tests are always welcome.
This still occurs for me when the column type is not well defined. SQLite is very permissive about column types. Any invalid column type defaults to the NUMERIC type (see "Determination of Column Affinity"). If I create a SQLite table with a column of type `junk`, for example, SQLite does not recognize it and treats it as a NUMERIC type. Since SQLite also doesn't care what type of data is in each column (regardless of the declared type), this issue is harder to recognize.
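The affinity rules referenced above are simple enough to sketch in a few lines of Ruby. This is a paraphrase of the "Determination of Column Affinity" section of https://www.sqlite.org/datatype3.html, not library code; the rules are applied in order, and any unrecognized declared type falls through to NUMERIC:

```ruby
# Sketch of SQLite's column-affinity determination rules.
def column_affinity(declared_type)
  t = declared_type.to_s.upcase
  return "INTEGER" if t.include?("INT")
  return "TEXT"    if %w[CHAR CLOB TEXT].any? { |s| t.include?(s) }
  return "BLOB"    if t.include?("BLOB") || t.empty?
  return "REAL"    if %w[REAL FLOA DOUB].any? { |s| t.include?(s) }
  "NUMERIC"        # everything else, including invalid types like "junk"
end

column_affinity("varchar(255)")  #=> "TEXT"
column_affinity("junk")          #=> "NUMERIC"
```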
Here's an example using the Sequel library in Ruby:
```ruby
require "sequel"

Sequel.connect("jdbc:sqlite::memory:") do |database|
  database.create_table(:foo) do
    column(:bar, String)
    column(:baz, :junk)
  end

  dataset = database[:foo]
  dataset.insert({ bar: "123", baz: "456" })
  dataset.insert({ bar: "bar", baz: "baz" })
  p dataset.all #=> [{ bar: "123", baz: 456 }, { bar: "bar", baz: "baz" }]
end
```
Column `bar` is created as the Ruby type `String`, which translates to `varchar(255)` in Sequel. SQLite recognizes this as a TEXT type. However, the column `baz` is created as type `junk`, which is not a valid SQLite type and defaults to NUMERIC according to the column affinity rules. In this situation it seems that the value type is based on each individual value instead of the column type.
@viking I don't think your code example shows a bug in jdbc-sqlite, as you get the same behavior using the C-based sqlite adapter in Sequel.
The issue is that jdbc-sqlite uses the value type of the first row when deciding what type to return, instead of just looking at the type name (see the example in the original post). I have a feeling that changing jdbc-sqlite to not consider the value type of the first row would cause backwards-compatibility issues, though.
@jeremyevans If it only looked at the first row, wouldn't the first record in the example be `{ bar: 123, baz: 456 }` instead of `{ bar: "123", baz: 456 }`?
@viking jdbc-sqlite considers both the value type of the first row and the column type. See the code example linked from the original post. Also, how SQLite stores a value depends on the column affinity (see the link in your first comment). Since the column affinity of the `bar` column is TEXT, the value for `bar` will be stored as text, not as an integer.
Hi,
I use Sequel, a Ruby SQL library, which has the following code for processing the result set (common to all JDBC adapters):
What it does is query metadata from the ResultSet and use it to fetch and typecast values. Unfortunately, sqlite-jdbc relies on the type of the first row when providing the metadata values. Moreover, from my point of view it returns inconsistent information about the data type. Have a look:
^ this means that despite the data type being named "DECIMAL", sqlite-jdbc returns the constant value for the INTEGER data type.
All this leads to a really weird behavior:
Depending on which row comes first, sqlite-jdbc provides different metadata results. I can see this behavior is intentional, but why does sqlite-jdbc do so?
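The inconsistency described above can be modeled in a few lines of Ruby. This is a simplified, hypothetical model of the behavior reported in this thread, not sqlite-jdbc's actual code: `getColumnTypeName` reflects the declared column type, while `getColumnType` is derived from the runtime type of the first row's value.

```ruby
# Hypothetical model of the reported metadata behavior: the type name
# comes from the declared type, while the JDBC type constant is inferred
# from the first row's value.
def reported_metadata(declared_type, first_row_value)
  jdbc_type = case first_row_value
              when Integer then "INTEGER"
              when Float   then "REAL"
              else              "TEXT"
              end
  { type_name: declared_type, type: jdbc_type }
end

reported_metadata("DECIMAL", 1)    #=> { type_name: "DECIMAL", type: "INTEGER" }
reported_metadata("DECIMAL", 1.5)  #=> { type_name: "DECIMAL", type: "REAL" }
```

Under this model, the same DECIMAL column reports a different `getColumnType` depending solely on which row happens to come first, which is exactly the inconsistency the thread is about.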