ondra-m / ruby-spark

Ruby wrapper for Apache Spark
MIT License
227 stars, 29 forks

Encoding::UndefinedConversionError: "\x8B" from ASCII-8BIT to UTF-8 #27

Open alex-silentale opened 8 years ago

alex-silentale commented 8 years ago
2.1.3 :036 > rdd2 = sc.parallelize(["jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"])
Encoding::UndefinedConversionError: "\x8B" from ASCII-8BIT to UTF-8
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:37:in `write'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:37:in `write_int'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:48:in `write_string'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:53:in `block in dump_to_io'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `each_slice'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `each'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `dump_to_io'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/context.rb:215:in `parallelize'
    from (irb):36
    from bundler/gems/railties-3.2.13/lib/rails/commands/console.rb:47:in `start'
    from bundler/gems/railties-3.2.13/lib/rails/commands/console.rb:8:in `start'
    from bundler/gems/railties-3.2.13/lib/rails/commands.rb:41:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

I also tried Oj as the serializer and got the same error. It seems to be coming from IO or StringIO.
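For anyone trying to narrow this down, here is a minimal sketch in plain Ruby (no ruby-spark involved) that raises the same exception class. It assumes that write_int packs the integer into a binary (ASCII-8BIT) string and that the target IO has UTF-8 as its external encoding. The backtrace is consistent with that, but it does not confirm it. The file name and the value 139 (0x8B) are made up for illustration.

```ruby
require "tmpdir"

# Integer#pack-style output is a binary (ASCII-8BIT) string.
# 139 == 0x8B, the byte named in the error message above.
payload = [139].pack("l>")

text_mode_error = nil
bytes_written = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "payload.bin") # hypothetical file name

  # An IO with an explicit UTF-8 external encoding tries to transcode
  # the binary string on write and fails on the non-ASCII byte.
  begin
    File.open(path, "w:UTF-8") { |io| io.write(payload) }
  rescue Encoding::UndefinedConversionError => e
    text_mode_error = e
  end

  # Opening the same file in binary mode writes the bytes untouched.
  File.open(path, "wb") { |io| io.write(payload) }
  bytes_written = File.binread(path)
end

puts payload.encoding
puts text_mode_error.class
puts bytes_written == payload
```

If this is what happens inside the gem, putting the IO into binary mode (for example with binmode) before writing would be one thing to try. Whether ruby-spark exposes that IO is not something this thread establishes.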

ondra-m commented 8 years ago

What serializer are you using?

alex-silentale commented 8 years ago

@ondra-m I've tried marshal and oj

Coolnesss commented 7 years ago

Getting the same error but with floats instead of strings. Tried both Marshal and oj. My data is kinda big, so it's hard to debug. Any ideas?

To reproduce:

sc.parallelize [LabeledPoint.new(1, [1,2]), LabeledPoint.new(3, [1,6])]
# => Encoding::UndefinedConversionError: "\xDF" from ASCII-8BIT to UTF-8
from /home/chang/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/packable-1.3.8/lib/packable/extensions/io.rb:62:in `write'

ondra-m commented 7 years ago

It works for me.

1) What is your ruby-spark version (Spark::VERSION)? Mine is 1.2.1.

2) Please post the full backtrace.
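One possible way to collect that information in a console session is sketched below. The helper name encoding_report is hypothetical and not part of ruby-spark. The encode call is only a stand-in for the failing sc.parallelize call so that the snippet runs on its own. Printing Spark::VERSION alongside it would cover the first question.

```ruby
# Hypothetical helper (not part of ruby-spark): gathers the Ruby version,
# the process-wide encoding defaults and, if given, the full backtrace.
def encoding_report(error = nil)
  lines = [
    "ruby: #{RUBY_VERSION}",
    "default_external: #{Encoding.default_external.inspect}",
    "default_internal: #{Encoding.default_internal.inspect}"
  ]
  if error
    lines << "#{error.class}: #{error.message}"
    lines.concat(Array(error.backtrace)) # every frame, not just the first
  end
  lines.join("\n")
end

report = nil
begin
  # Stand-in for the failing call; replace with the real sc.parallelize line.
  "\xDF".b.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  report = encoding_report(e)
end

puts report
```

The encoding defaults are included because Rails sets them at boot, so the same code can behave differently in a Rails console than in plain irb.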