spring-attic / spring-hadoop-samples

Spring Hadoop Samples
Apache License 2.0

Issue while writing data to hdfs from remote client #14

Open ajjadrapc opened 10 years ago

ajjadrapc commented 10 years ago

Hi, I'm trying out the Spring for Hadoop sample to write data to HDFS running on an Amazon EC2 cluster from my local machine (Windows, from Eclipse). Following the documentation at http://docs.spring.io/spring-hadoop/docs/1.0.x/reference/html/appendix-amazon-emr.html, I created a SOCKS proxy with the command:

ssh -i kp1.pem -ND 6666 ubuntu@ec2-54-191-18-136.us-west-2.compute.amazonaws.com

I then tried to connect to the remote cluster, but it gives me the exception shown in the attached screenshot (ec2).

Also, following the information in this blog post (http://blog.cloudera.com/blog/2008/12/securing-a-hadoop-cluster-through-a-gateway/), I set the properties below in core-site.xml on the client side; on the server side, I marked the property "hadoop.rpc.socket.factory.class.default" as final.

hadoop.socks.server = localhost:6666
hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.SocksSocketFactory

I'm using hadoop-2.4.0, and in all Hadoop-related configuration files I have used the Amazon public DNS name as the hostname on both the client and the server side. For example:

fs.default.name = hdfs://ec2-54-191-18-136.us-west-2.compute.amazonaws.com:8020
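Put together, the client-side settings described above would look roughly like this in core-site.xml (a sketch assuming standard Hadoop 2.x property syntax; the hostnames and ports are the ones given in this thread):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Route Hadoop RPC traffic through the local SOCKS tunnel on port 6666 -->
  <property>
    <name>hadoop.socks.server</name>
    <value>localhost:6666</value>
  </property>
  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <!-- NameNode address, using the EC2 public DNS name -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ec2-54-191-18-136.us-west-2.compute.amazonaws.com:8020</value>
  </property>
</configuration>
```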

Can you please let me know the reason why I get the attached error?

trisberg commented 10 years ago

Looks like a connectivity issue. What do you have in src/main/resources/application.properties? I could try accessing this from my system - let me know if it is ok for me to access your AWS instance.

ajjadrapc commented 10 years ago

Yes, you can access the AWS instance. I have added the resources core-site.xml, hdfs-site.xml, and mapred-site.xml to the Hadoop configuration object. Please find the attached images of the configuration files (core-site, mapred-site, hadoop-context), and let me know if you need any further information.
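As a sketch of what is described here (assuming the Spring for Apache Hadoop XML namespace these samples use, with hypothetical classpath locations for the site files), registering the resources on the Hadoop configuration object in hadoop-context.xml would look along these lines:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

  <!-- Registers the cluster's site files as resources
       on the Hadoop Configuration bean -->
  <hdp:configuration
      resources="classpath:core-site.xml, classpath:hdfs-site.xml, classpath:mapred-site.xml"/>

</beans>
```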

ajjadrapc commented 10 years ago

Hi, just wanted to tell you that the AWS instance is now charging me, as I have used up my free tier, so I would like to keep it running only until 3:30 ET. It would be great if you could help me before then.

Thanks in advance, and sorry about that.

trisberg commented 10 years ago

I took a quick peek yesterday but didn't have enough time to test it. From what I could tell, the datanodes seemed to use an AWS internal address rather than a public address, which could be the cause. What does hostname return on your VM?

ajjadrapc commented 10 years ago

Hi Thomas, thanks for the response. Yes, I see that the namenode is returning EC2 internal IP addresses for the datanodes, which is why I changed my approach to use the SOCKS proxy configuration. hostname returns "triconnode173" on my local machine.
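A workaround often suggested for exactly this symptom (an assumption on my part, not something confirmed in this thread) is to make the HDFS client connect to datanodes by hostname rather than by the internal IP the namenode reports, so that the SOCKS tunnel can resolve the names. In the client-side hdfs-site.xml that would be:

```xml
<configuration>
  <!-- Connect to datanodes via their hostnames (resolvable through the
       SOCKS proxy) instead of the internal IPs the namenode returns -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
```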