sequenceiq / hadoop-docker

Hadoop docker image
https://registry.hub.docker.com/u/sequenceiq/hadoop-docker/
Apache License 2.0

Problems with webhdfs #50

Open iaroslav-ai opened 8 years ago

iaroslav-ai commented 8 years ago

So far I have not been able to use WebHDFS with the Docker version of Hadoop (on Ubuntu). Here is what I tried:

1) Add a text file at /user/root/f.txt:

curl -i -X PUT -T f.txt "http://172.17.0.2:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"

2) Try reading the contents of the file from HDFS:

curl -i -L "http://172.17.0.2:50070/webhdfs/v1/user/root/f.txt?op=OPEN&user.name=root"

For which I get

{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File /user/root/f.txt not found."}}

I tried using three different Python libraries for WebHDFS, but none of them work either. All of them stop with a message similar to

Max retries exceeded with url: /webhdfs/v1/example_dir/example.txt?op=CREATE&user.name=root&namenoderpcaddress=d85d3582cf58:9000&overwrite=false
Failed to establish a new connection: [Errno -2] Name or service not known

when trying to create a file or folder. I also tried rebuilding the Docker image to account for port 9000 not being exposed, but that did not seem to help. Am I doing something utterly wrong? I expect this is likely, given that I am a total had00p n00b :)
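
A likely explanation (a sketch, not verified against this image): WebHDFS CREATE is a two-step protocol. The namenode answers the first PUT with a 307 redirect to a datanode, and curl does not follow redirects unless told to, so the command in step 1 never actually wrote the file; that is why step 2 reports FileNotFoundException. The Python libraries do follow the redirect, but the Location header names the container's internal hostname (d85d3582cf58 in the error above), which the host cannot resolve. Roughly:

# Step 1a: ask the namenode where to write; -i shows the 307 Location header
curl -i -X PUT "http://172.17.0.2:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"

# Step 1b: PUT the file to the URL from the Location header
# (<datanode-host> and <namenode> are placeholders; copy the real Location value)
curl -i -X PUT -T f.txt "http://<datanode-host>:50075/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&namenoderpcaddress=<namenode>:9000&overwrite=true"

# If <datanode-host> is the container hostname, one blunt workaround is to map it
# to the container IP in /etc/hosts on the host (assumes port 50075 is reachable):
echo "172.17.0.2  d85d3582cf58" | sudo tee -a /etc/hosts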

yeiniel commented 7 years ago

I was trying to use WebHDFS too and ran into a problem. In my case, WebHDFS redirects to a data node every time I try to write to a file, and the redirect URL seems to use the internal hostname of the Docker container (something like a65ec753065c). Any ideas about this?

The following is an example request:

curl -i -X PUT -T ~/Downloads/JEA_BLOWER_DEFINITION.csv "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"
HTTP/1.1 100 Continue

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 19 Oct 2016 03:45:02 GMT
Date: Wed, 19 Oct 2016 03:45:02 GMT
Pragma: no-cache
Expires: Wed, 19 Oct 2016 03:45:02 GMT
Date: Wed, 19 Oct 2016 03:45:02 GMT
Pragma: no-cache
Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1476884702571&s=n+WgHqacT3Q5OthGXHXPBtD2YlQ="; Path=/; Expires=Wed, 19-Oct-2016 13:45:02 GMT; HttpOnly
Location: http://a65ec753065c:50075/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&namenoderpcaddress=a65ec753065c:9000&overwrite=true
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)
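
The Location header above points at a65ec753065c:50075, which resolves only inside Docker's network. One workaround sketch (assuming the datanode port 50075 is published to the host) is to capture the Location and rewrite the internal hostname to localhost before the second PUT:

# Step 1: capture the redirect target from the namenode
LOCATION=$(curl -si -X PUT "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true" \
  | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')

# Step 2: rewrite the container hostname to localhost, then upload the file
LOCATION=$(echo "$LOCATION" | sed 's#//[^:/]*:#//localhost:#')
curl -i -X PUT -T ~/Downloads/JEA_BLOWER_DEFINITION.csv "$LOCATION"
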
ericjang96 commented 7 years ago

I am having the same issue as above; will it be addressed soon?

pierorex commented 6 years ago

I also have the same problem and have already tried all the available Python libraries. Did anyone solve this with a magical workaround?

PhilipMourdjis commented 5 years ago

Not sure how this would translate if using docker-compose, but I can get this to work using: docker run -h localhost -p 50070:50070 -p 50075:50075 <<Container_Name>>
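
This works because -h localhost makes the container (and therefore the 307 Location header) report localhost instead of a random container ID, and the published 50075 port routes that back to the datanode. Once the container is started this way, a single curl that follows redirects should be enough (curl preserves the PUT body across a 307; a sketch, untested against this image):

curl -i -L -X PUT -T f.txt "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"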

deryrahman commented 5 years ago

@PhilipMourdjis if you're using docker-compose, you can set the hostname to localhost like this:

hadoop:
  image: <image_name>
  hostname: localhost
  ports:
    - 50070:50070
    - 50075:50075
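
As a usage sketch (the image name above is a placeholder): after bringing this up, the redirect targets localhost:50075, so the earlier curl commands should work from the host unchanged, e.g.:

docker-compose up -d
# read the file back through WebHDFS; -L follows the 307 to localhost:50075
curl -i -L "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=OPEN&user.name=root"
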
g10guang commented 5 years ago

Just follow the Location header from the redirect response.

zakicheung commented 5 years ago

Notice step 2 of the WebHDFS CREATE flow: "Submit another HTTP PUT request using the URL in the Location header (or the returned response in case you specified noredirect) with the file data to be written." (FYI: this quotes the WebHDFS documentation.)
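
On newer Hadoop releases (the build in this image may be too old for it), the noredirect parameter mentioned above makes step 1 return the datanode URL in the JSON body instead of a 307, which is easier to script. A sketch, with the step 2 URL left as a placeholder:

# Step 1: ask for the write URL without a 307 redirect
curl -s -X PUT "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true&noredirect=true"
# response body: {"Location":"http://<datanode-host>:50075/webhdfs/v1/..."}

# Step 2: upload the file to that URL
curl -i -X PUT -T f.txt "<Location-from-step-1>"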