martinprobson / vagrant-hadoop-hive-spark

Vagrant project to spin up a single node VM running current versions of Hadoop, Hive and Spark
Apache License 2.0

Thank you for providing already configured Spark & Hadoop VMs #26

Open homomorfism opened 2 years ago

homomorfism commented 2 years ago

Hello, thank you for helping me launch my example jobs using spark-submit. It saved a lot of time while writing and launching Scala jobs!

Btw, I did not succeed in starting HDFS, so I took the commands from this repository (https://github.com/s3u/vagrant-hadoop-spark). It would be cool if you specified in README.md how to start HDFS and run an example job.

Also, it would be great if you pre-installed nano, tree, zip and maybe SDKMAN (https://sdkman.io/) in the VM for running Java/Scala jobs.

homomorfism commented 2 years ago

By the way, I can add instructions on how to launch HDFS and a small tutorial on how to write a small Scala job and run it with spark-submit.

martinprobson commented 2 years ago

Hello, thank you for your comments. The Hadoop daemons, that is HDFS (the namenode and datanode), YARN (resourcemanager, nodemanager and proxyserver) and the MapReduce history server, are all started by this script in the repo. You are right, I should probably add some instructions on how to do this manually in the README. If you want to add nano (I'm a vi user :-) ), tree, zip and SDKMAN, you could amend this function, which does some apt-get installs already. It would be great if you could submit a pull request to add instructions on HDFS and an example job.

Many thanks!
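
For anyone who wants to start the daemons listed above by hand, here is a minimal sketch. It is not the repo's own start script. It assumes Hadoop 3.x with `hdfs`, `yarn` and `mapred` on the PATH; on Hadoop 2.x the rough equivalents are `hadoop-daemon.sh`, `yarn-daemon.sh` and `mr-jobhistory-daemon.sh`. The `run` and `start_hadoop_daemons` helpers and the `DRY_RUN` variable are hypothetical names introduced only for this illustration.

```shell
# Sketch only: start the single-node Hadoop daemons manually.
# Assumption: Hadoop 3.x command syntax (hdfs/yarn/mapred --daemon start ...).
# DRY_RUN, run and start_hadoop_daemons are illustrative names, not part of the repo.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start the daemons in the order they are listed in the comment above.
start_hadoop_daemons() {
  run hdfs --daemon start namenode
  run hdfs --daemon start datanode
  run yarn --daemon start resourcemanager
  run yarn --daemon start nodemanager
  run yarn --daemon start proxyserver
  run mapred --daemon start historyserver
}
```

After sourcing this in the VM, calling `start_hadoop_daemons` with `DRY_RUN=1` set prints the six commands without running them; calling it without `DRY_RUN` runs them. The JDK's `jps` tool can then be used to check which Java processes came up.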

martinprobson commented 2 years ago

BTW, I am a bit tied up with work at present, but will action this (and update the repo to use a more recent version of Hadoop/Spark) when I get the chance.