jubos / fake-s3

A lightweight server clone of Amazon S3 that simulates most of the commands supported by S3 with minimal dependencies

Spark DataFrame save to Parquet cannot put file into fake S3 #108

Open iinegve opened 9 years ago

iinegve commented 9 years ago

I'm working with Spark and trying to save a DataFrame as Parquet files. Creating buckets and directories and listing them seems to work, but when Spark tries to create a file, the server prints an exception to the console.

Here is how I installed fake-s3:

$ sudo gem install fakes3
Password:
Fetching: thor-0.19.1.gem (100%)
Successfully installed thor-0.19.1
Fetching: builder-3.2.2.gem (100%)
Successfully installed builder-3.2.2
Fetching: fakes3-0.2.1.gem (100%)
Successfully installed fakes3-0.2.1
Parsing documentation for thor-0.19.1
Installing ri documentation for thor-0.19.1
Parsing documentation for builder-3.2.2
Installing ri documentation for builder-3.2.2
Parsing documentation for fakes3-0.2.1
Installing ri documentation for fakes3-0.2.1
3 gems installed

I have updated /etc/hosts so that my_bucket.localhost maps to 127.0.0.1.
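
The hosts entry matters because S3 clients commonly address a bucket as a subdomain of the endpoint host. A minimal sketch for checking that such an entry is present, assuming the usual hosts-file layout (an address followed by one or more names); the helper name `maps_to_loopback` and the sample content are hypothetical, not part of fake-s3:

```python
def maps_to_loopback(hosts_text: str, hostname: str) -> bool:
    """Return True if hosts_text has a 127.0.0.1 entry that lists hostname."""
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "127.0.0.1" and hostname in fields[1:]:
            return True
    return False


# Hypothetical hosts content mirroring the setup described above.
sample = "127.0.0.1 localhost my_bucket.localhost\n"
print(maps_to_loopback(sample, "my_bucket.localhost"))
```

To check a real machine, pass the contents of /etc/hosts as `hosts_text`.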

Here are the first messages the server prints on startup:

$ fakes3 -r ~/tmp/fakes3_root -p 4569 -H localhost
Loading FakeS3 with /Users/emorozov/tmp/fakes3_root on port 4569 with hostname localhost
[2015-06-25 12:48:24] INFO  WEBrick 1.3.1
[2015-06-25 12:48:24] INFO  ruby 2.0.0 (2014-02-24) [universal.x86_64-darwin13]
[2015-06-25 12:48:24] INFO  WEBrick::HTTPServer#start: pid=90416 port=4569

Finally, the exception:

[2015-06-25 13:12:39] ERROR Errno::ENOENT: No such file or directory - /Users/emorozov/tmp/fakes3_root/reltio%2Ftmp6%2F_temporary%2F0%2F_temporary%2Fattempt_201506251311_0024_r_000000_0%2Fpart-r-00001.parquet/.fakes3_metadataFFF/metadata
    /Library/Ruby/Gems/2.0.0/gems/fakes3-0.2.1/lib/fakes3/file_store.rb:109:in `initialize'
    /Library/Ruby/Gems/2.0.0/gems/fakes3-0.2.1/lib/fakes3/file_store.rb:109:in `open'
    /Library/Ruby/Gems/2.0.0/gems/fakes3-0.2.1/lib/fakes3/file_store.rb:109:in `copy_object'
    /Library/Ruby/Gems/2.0.0/gems/fakes3-0.2.1/lib/fakes3/server.rb:176:in `do_PUT'
    /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/webrick/httpservlet/abstract.rb:106:in `service'
    /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/webrick/httpserver.rb:138:in `service'
    /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/webrick/httpserver.rb:94:in `run'
    /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/webrick/server.rb:295:in `block in start_thread'

localhost - - [25/Jun/2015:13:12:39 FET] "PUT /tmp6%2F_temporary%2F0%2Ftask_201506251311_0024_r_000000%2Fpart-r-00001.parquet HTTP/1.1" 500 496
- -> /tmp6%2F_temporary%2F0%2Ftask_201506251311_0024_r_000000%2Fpart-r-00001.parquet
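
One detail in the log above: the path in the ENOENT message still contains `%2F`, so on disk the bucket name and the whole object key form a single path component. A minimal sketch that decodes that string with Python's standard `urllib.parse.unquote` shows the bucket and key it stands for. It only illustrates what the log contains; that an undecoded copy source is the actual cause in fakes3 0.2.1 is an assumption, not something confirmed in this thread.

```python
from urllib.parse import unquote

# Path copied from the ENOENT message above, relative to the fakes3 root
# and without the trailing metadata directory.
failing = (
    "reltio%2Ftmp6%2F_temporary%2F0%2F_temporary%2F"
    "attempt_201506251311_0024_r_000000_0%2Fpart-r-00001.parquet"
)

# As written, the string has no literal slash: it is one path component.
print(len(failing.split("/")))

# After percent-decoding, it splits into a bucket and a nested object key.
decoded = unquote(failing)
bucket, _, key = decoded.partition("/")
print(bucket)
print(key)
```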

darkjh commented 8 years ago

@fathersson Hi, just wondering how you set up the S3 endpoint in Spark/Hadoop?

iinegve commented 8 years ago

@darkjh I can't tell you anything useful, it's been a while.