Open dcu opened 9 years ago
any update on this one?
I had the same issue. I monkey-patched it to remove the invalid k,v pairs from the obj. I replaced the mosql binary with the following, which I call monkey-patched-mosql. I then run the ETL process from this script, which overrides the MoSQL::Schema.transform method. It could be cleaned up by using super.
The ETL errors in my data were caused by binary values and larger-than-expected BSON documents.
#!/usr/bin/env ruby
require 'mosql/cli'

module MoSQL
  class Schema
    def transform(ns, obj, schema=nil, depth = 0)
      schema ||= find_ns!(ns)

      original = obj

      # Do a deep clone, because we're potentially going to be
      # mutating embedded objects.
      obj = BSON.deserialize(BSON.serialize(obj))

      row = []
      schema[:columns].each do |col|
        source = col[:source]
        type = col[:type]

        if source.start_with?("$")
          v = fetch_special_source(obj, source, original)
        else
          v = fetch_and_delete_dotted(obj, source)
          case v
          when Hash
            v = JSON.dump(Hash[v.map { |k,v| [k, transform_primitive(v)] }])
          when Array
            v = v.map { |it| transform_primitive(it) }
            if col[:array_type]
              v = Sequel.pg_array(v, col[:array_type])
            else
              v = JSON.dump(v)
            end
          else
            v = transform_primitive(v, type)
          end
        end
        row << v
      end

      if schema[:meta][:extra_props]
        extra = sanitize(obj)
        row << JSON.dump(extra)
      end

      log.debug { "Transformed: #{row.inspect}" }

      row
    rescue BSON::InvalidStringEncoding, BSON::InvalidDocument
      obj = obj.select do |k,v|
        begin
          BSON.deserialize(BSON.serialize({"#{k}" => v}))
          true
        rescue BSON::InvalidStringEncoding, BSON::InvalidDocument
          puts "Pruning #{k} from the hash."
          false
        end
      end
      raise "tried and failed to prune with #{[ns, obj, schema]}" if depth > 2
      transform(ns, obj, schema, depth + 1)
    end
  end
end

MoSQL::CLI.run(ARGV)
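The comment above mentions that the patch could be cleaned up by using super. A minimal sketch of that idea follows, using Module#prepend so the original method body does not have to be copied. Everything here is a stand-in so the snippet runs without mosql or bson installed: DemoSchema, DemoSchema::InvalidDocument, valid_value? and PruneInvalidKeys are hypothetical names, and the validity check (a string with an invalid encoding) only imitates the BSON round trip used in the script above.

```ruby
# Hypothetical stand-in for MoSQL::Schema, so the sketch is self-contained.
class DemoSchema
  class InvalidDocument < StandardError; end

  # Assumed validity rule for the demo: strings must be valid in their encoding.
  def self.valid_value?(value)
    !value.is_a?(String) || value.valid_encoding?
  end

  # Imitates a transform that rejects the whole document on one bad value.
  def transform(ns, obj, schema = nil)
    bad = obj.keys.reject { |k| self.class.valid_value?(obj[k]) }
    raise InvalidDocument, "invalid keys: #{bad.join(', ')}" unless bad.empty?
    obj.values
  end
end

# Prepended wrapper: delegates to the original via super and, on failure,
# prunes the offending keys and retries a bounded number of times.
module PruneInvalidKeys
  def transform(ns, obj, schema = nil, depth = 0)
    super(ns, obj, schema)
  rescue DemoSchema::InvalidDocument
    raise "tried and failed to prune #{ns}" if depth > 2
    pruned = obj.select do |k, v|
      ok = DemoSchema.valid_value?(v)
      warn "Pruning #{k} from the hash." unless ok
      ok
    end
    transform(ns, pruned, schema, depth + 1)
  end
end

DemoSchema.prepend(PruneInvalidKeys)

# A document with one value that is not valid UTF-8.
doc = { "name" => "ok", "blob" => [0xFF].pack("C").force_encoding("UTF-8") }
row = DemoSchema.new.transform("db.coll", doc)
puts row.inspect
```

Applied to the real class, the same shape would presumably be a module prepended to MoSQL::Schema that rescues the two BSON errors named in the script above and keeps the per-key BSON round trip as the pruning test; that part is untested here.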
+1 - anyone know what would cause this? I checked the timestamp that it appears to be failing on and I don't see any issues
Looks like a PR was opened to resolve this (#83), but it broke tests.
I get the following exception when importing a collection. The data should be valid, since it is already present in the database.
Any ideas?
Please note this is failing even with the --unsafe flag.