Closed endersonmaia closed 8 years ago
The tinytds adapter doesn't support returning blob data as Sequel::SQL::Blob
, which means that they won't be literalized as a blob. You need to manually convert the blob values from String
to Sequel::SQL::Blob
instances before inserting them. I'm sorry there isn't an easier way, if tiny_tds adds support for custom type converters, I'll add support for it in Sequel.
I don't think the problem is with TinyTDS, since I can print the binary output, and since the data inside this field is a gziped data, I can even get it's content.
I think it's more like a PostgreSQL adapter problem, that's is complaining for UNICODE, when it should just put the bytes inside the bytea
field.
As you can see, I can get the data form Sequel with TinyTDS fine :
irb(main):001:0> require 'sequel'
=> true
irb(main):002:0> require 'zlib'
=> true
irb(main):003:0> DB_src = Sequel.connect('tinytds://HOST/redmine')
=> #<Sequel::TinyTDS::Database: "tinytds://HOST/redmine">
irb(main):004:0> src_wcv = DB_src[:wiki_content_versions]
=> #<Sequel::TinyTDS::Dataset: "SELECT * FROM [WIKI_CONTENT_VERSIONS]">
irb(main):008:0> src_wcv.first[:data]
=> "x\xDA\xCB0\xD4S\b\xCF\xCC\xCE\x04\x00\n)\x02|"
irb(main):009:0> Zlib::Inflate.inflate(src_wcv.first[:data])
=> "h1. Wiki"
What is src_wcv.first[:data].class
? If it is String
, there's your problem. I needs to be Sequel::SQL::Blob
in order for PostgreSQL to escape it as a bytea instead of a String. I didn't say this was a problem with tiny_tds per se, just that tiny_tds does not expose an interface that allows customization of the type converters, so Sequel doesn't have a way to automatically convert such columns to use Sequel::SQL::Blob
.
Ok, now I get it. Solved calling Sequel::SQL::Blob.new()
on each row for the :data
field, before calling the <<
.
Thanks!
Thanks gentleman for ping me. TinyTDS will likely never allow custom casting at the C level to another type. If a binary column is returned, we give you a binary encoded string. Our goal is to return a ruby primitive for each matching SQL Server type. We want other gems like the SQL Server adapter to take the work from there. Hope this helps.
That's understandable, but a shame, as it forces callers to post-process the output at a significant speed hit if they want custom types.
It depends. Most raw connections when I last looked returned strings. TinyTDS like MySQL casting to a matching Ruby primitive in C made it really fast. I mean really fast. So is it fair to say "significant" speed hit? Maybe for large binary data? But maybe I misunderstand the problem. If you have some time:
Sequel::SQL::Blob
desirable?This could be my ignorance on the project. Just hoping to get my head around what this request was and learn a bit.
Post processing all output from TinyTDS is a significant speed hit. It is true that doing the casting in C is faster than doing the casting in ruby, though. The best of both worlds would be to accept ruby procs for casting certain data types (e.g. https://github.com/jeremyevans/sequel_pg), and use those if given.
Note that Sequel::SQL::Blob
is a subclass of String
, it's a separate subclass because most databases require blobs to be escaped differently than regular strings. Doing this based on the encoding is a bad idea, as just because a string has a binary encoding does not mean it should be escaped as a blob.
Obviously, using Sequel::SQL::Blob
as a default in TinyTDS would not make sense. Hence the desire for an interface where users of TinyTDS could setup custom type casting per connection.
Thanks @jeremyevans that helps build my context up. Is Sequel heavily opinionated on the raw connection proc casting and/or does it provide type conversion when low level connections like TinyTDS do not provide that feature? Just wondering if there was a pull request opportunity to made the TinyTDS adapter better for Sequel. TIA!
In general, when a driver does low level casting but does not support casting of specific types (such as blobs), Sequel chooses the faster approach of not overriding the low level casting. This is because most users would prefer things be faster, and it is not fair to burden them if they are not using those specific types.
In cases where the driver provides strings, and Sequel does the typecasting itself, then Sequel handle's the specific types if it is possible, since there is in general not a performance hit for doing so.
An example API for this would be something like:
c = TinyTds::Client.new
c.type_convertors['blob'] = proc{|v| ...}
I'm not sure if that particular API would work for TinyTDS, it probably depends on FreeTDS's internal APIs. For example, PostgreSQL and MySQL use type numbers instead of type names.
@endersonmaia
Could you please provide the final script which actually worked for you ?
Hey @djay87
#!/usr/bin/env ruby
require 'sequel'
DB_src = Sequel.connect('tinytds://user:password@host/database')
src_wcv = DB_src[:wiki_content_versions]
DB_dst = Sequel.connect('postgres://user:password@host/database')
dst_wcv = DB_dst[:wiki_content_versions]
src_wcv.each do |src|
src[:data] = Sequel::SQL::Blob.new(src[:data])
dst_wcv << src
end
Hi, I'm migrating the data from SQL Server to PostgreSQL. I used the [dump]:(https://github.com/toy/dump), that worked almost 100%. I got an error with the
bytea
type in PostgresSQL.I created this simple script to make the transfer where
dump
failed.I get this error
I think since the filed is of type
bytea
, it shouldn't complain about UTF8 byte sequence.Fetching the data via
irb
, the output is this :I told the same here toy/dump#21.
What am I missing ?