mesos / storm

Storm on Mesos!
Apache License 2.0

support using WebHDFS to serve storm-mesos tarball #97

Open erikdw opened 8 years ago

erikdw commented 8 years ago

PR #57 was an attempt to support fetching the storm-mesos tarball via WebHDFS. However, the implementation in #57 was not ideal for non-WebHDFS use cases: it bypasses the Mesos Fetcher, and thus forgoes any benefits (caching, etc.) that are added to the Fetcher.

As to why we need any special handling for using WebHDFS, it's a bit complicated as you can read here and here. Basically there are some deficiencies in Mesos and WebHDFS which prevent using the Mesos Fetcher to download a tarball from WebHDFS. The Mesos deficiencies have some tickets for them already, but I don't think a ticket exists yet for WebHDFS.

erikdw commented 8 years ago

Notably, Mesos v1.0+ (originally slated as v0.29.0) supports setting the URI's filename, since someone fixed MESOS-4735 (a bug I filed for solving #97).

So that should give us the ability to get this working! If the URI is webhdfs, we can do some parsing of it to set the CommandInfo.URI.filename to the bare foo.tar.gz name (or foo.tgz), and the Mesos fetcher should take care of unpacking it for us. Alternatively it can just be an explicit parameter in storm.yaml:

I think we can even put the code in now and have it simply ignore this setting on Mesos pre-0.29.0. We would need some way to check whether CommandInfo.URI can have a filename set. I imagine the protobuf-generated code offers some way to check whether a setter is available.
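To make the URI-parsing idea above concrete, here is a minimal sketch of deriving the bare tarball name from a WebHDFS-style URI. It is written in Python purely for illustration (storm-mesos itself is Java), and the function name `output_filename_for`, the example hostname, and the example tarball name are all hypothetical. It assumes the URI either uses a `webhdfs` scheme or follows the WebHDFS REST layout `/webhdfs/v1/<path>?op=OPEN`.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse

# Archive suffixes mentioned in this thread (foo.tar.gz or foo.tgz).
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")


def output_filename_for(uri):
    """Return the bare archive name for a WebHDFS-style URI, else None.

    Hypothetical helper: the result is what one might put into the
    fetcher's per-URI filename field. None means "leave the field unset".
    """
    parsed = urlparse(uri)
    # Assumption: a URI counts as WebHDFS if it uses the webhdfs scheme
    # or its path starts with the REST API prefix /webhdfs/.
    is_webhdfs = parsed.scheme == "webhdfs" or parsed.path.startswith("/webhdfs/")
    if not is_webhdfs:
        return None
    # parsed.path excludes the query string, so "?op=OPEN" is dropped here.
    name = basename(unquote(parsed.path))
    return name if name.endswith(ARCHIVE_SUFFIXES) else None


if __name__ == "__main__":
    example = "http://namenode.example:50070/webhdfs/v1/apps/storm-mesos.tgz?op=OPEN"
    print(output_filename_for(example))  # storm-mesos.tgz
```

Returning `None` for non-WebHDFS URIs keeps the default Fetcher behavior untouched for everyone else, which is the concern raised about #57.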

erikdw commented 8 years ago

@echinthaka : FYI ^ we can now work on supporting WebHDFS URIs!

erikdw commented 8 years ago

Notably, in MESOS-5119 the field was changed to CommandInfo.URI.output_file and the semantics were adjusted a bit:

Add subdirectory support to URI.output_file field.

URI.output_file allows the user to specify the path of the file that'll
be saved in the sandbox when the URI is fetched, but previously it would
fail at fetch time if "filename" had a directory component. This change
allows users to specify a relative path for custom output targets within
the sandbox.

So we should also adjust the config parameter if we decide to go that route:

Notably, that is the name in mesos v1.0.0: