ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

DOCS: Indicate which ports should be used for both HDFS and Impala connections #766

Closed: wesm closed this issue 7 years ago

wesm commented 8 years ago

And maybe where to find them in the various web UIs

bendichter commented 8 years ago

I'm struggling with this right now. I am trying to connect but all I get is

TTransportException: Could not connect to ec2-##-###-###-##.us-west-2.compute.amazonaws.com:21050: unknown error (_ssl.c:2826)

I can get to Hue through ssh, but I don't know how to translate that command into these connections. A little more documentation here might really help.

wesm commented 8 years ago

Are you able to make a HiveServer2 (or Beeswax via the impala-shell) connection to this server from anywhere? Are you able to reach the Impala web UI? I agree that instructions for finding the right ports through the web UI are necessary, as it can be tricky if the ports are not the standard ones.

bendichter commented 8 years ago

Not sure about HiveServer2 or Beeswax, but I am able to reach the Impala web UI with this:

ssh -i ~/file.pem -L 8889:10.0.0.12:8888 ec2-user@ec2-##-###-###-##.us-west-2.compute.amazonaws.com

wesm commented 8 years ago

I believe you need to set up SSH tunnels to the Impala and HDFS ports to be able to use Ibis on your local machine. So in Python you would connect to localhost and use the tunneled ports. I'm not an expert in this, but I found some helpful instructions here:

http://blog.trackets.com/2014/05/17/ssh-tunnel-local-and-remote-port-forwarding-explained-with-examples.html

I would personally create a Python function that opens the SSH tunnels with subprocess.Popen (and makes it easy to close them all from Python). If you create a complete solution, I'm happy to add it to the documentation, since using Impala on AWS is pretty common (even though this is not an Impala/Ibis-specific issue).
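
A minimal sketch of what such a helper might look like, using only subprocess.Popen and the standard ssh client. The names build_tunnel_command and SSHTunnels are hypothetical and are not part of Ibis. The ports are assumptions: 21050 is the Impala port from the error message above, and 50070 is assumed to be the WebHDFS (NameNode HTTP) port, which may differ on your cluster. WebHDFS may also redirect to DataNode ports, which this sketch does not handle.

```python
import os
import subprocess


def build_tunnel_command(gateway, forwards, key_file=None):
    """Build the ssh argument list for a set of local port forwards.

    gateway  -- "user@host" of the machine reachable over SSH
    forwards -- list of (local_port, remote_host, remote_port) tuples
    key_file -- optional path to a private key (passed to ssh -i)
    """
    cmd = ["ssh", "-N"]  # -N: only forward ports, run no remote command
    if key_file is not None:
        # Popen does not go through a shell, so expand "~" ourselves.
        cmd += ["-i", os.path.expanduser(key_file)]
    for local_port, remote_host, remote_port in forwards:
        cmd += ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    cmd.append(gateway)
    return cmd


class SSHTunnels:
    """Open all tunnels with one ssh process and close them together."""

    def __init__(self, gateway, forwards, key_file=None):
        self.cmd = build_tunnel_command(gateway, forwards, key_file)
        self.proc = None

    def open(self):
        # Starts ssh in the background; the tunnels stay up while it runs.
        self.proc = subprocess.Popen(self.cmd)
        return self

    def close(self):
        if self.proc is not None:
            self.proc.terminate()
            self.proc.wait()
            self.proc = None

    def __enter__(self):
        return self.open()

    def __exit__(self, *exc_info):
        self.close()


# Hypothetical usage (placeholder host name, assumed ports):
#
# forwards = [
#     (21050, "10.0.0.12", 21050),  # Impala
#     (50070, "10.0.0.12", 50070),  # WebHDFS (assumed port)
# ]
# with SSHTunnels("ec2-user@your-gateway-host", forwards, key_file="~/file.pem"):
#     ...  # connect to localhost:21050 and localhost:50070 here
```

While the tunnels are open, the connection code would point at localhost and the local ports rather than at the EC2 host name. Note that Popen returns as soon as ssh starts, so the ports may need a moment before they accept connections.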

bendichter commented 8 years ago

Oh, I see! I thought Ibis did the ssh as well as the Impala interfacing. This is really helpful, thank you. I'll let you know if I come up with a good solution.

cpcloud commented 7 years ago

We should at least link to this page in the docs: https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_ports_cdh5.html#topic_9_1

I found this very useful when setting up the docker image that we use for testing.