stripe-archive / brushfire

Distributed decision tree ensemble learning in Scala

Quick Start example in README. #95

Open eightysteele opened 8 years ago

eightysteele commented 8 years ago

This PR adds some love to the Quick Start example in README. :)

avibryant commented 8 years ago

This is great, thanks! I think the reason that hadoopClient was listed as provided is that when you submit the assembly jar to a Hadoop cluster, the Hadoop jars are already provided by that execution environment, and duplicating them can cause problems. But obviously that's not the case when running locally. I'm not sure what the best way to resolve this is.

eightysteele commented 8 years ago

Good catch!

The sbt-assembly docs offer a clue on how to resolve this.

If we add this to brushfire-scalding/build.sbt:

run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

runMain in Compile <<= Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run))

Then we can run it locally (with all the dependencies) like this:

$ sbt "brushfireScalding/runMain com.twitter.scalding.Tool com.stripe.brushfire.scalding.IrisJob --local --input example/iris.data --output example/iris.output"

Boom!

I wrapped the above command in a new script called quick-start, moved hadoopClient back to provided, and updated the README with examples of running locally and on the cluster.
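
The override works because sbt's default run task uses the runtime classpath, which leaves out provided dependencies, while the compile classpath includes them. Below is a minimal sketch of how the pieces might sit together in brushfire-scalding/build.sbt, written in the `:=` / `.evaluated` form that later sbt releases use in place of `<<=`. The Hadoop coordinates and version are assumptions for illustration and are not taken from the project's build:

```scala
// Assumed coordinates and version; the project's actual hadoopClient definition may differ.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.5.2" % "provided"

// Run with the compile classpath so that provided dependencies are available locally.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated

// Same idea for runMain, which takes the main class as a command-line argument.
runMain in Compile := Defaults.runMainTask(
  fullClasspath in Compile,
  runner in (Compile, run)
).evaluated
```

With this in place the assembly jar should still leave out the Hadoop jars, since the dependency stays provided; only the local run and runMain tasks see them.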

There might be an even BETTER way to resolve this. Happy to pivot. Tell me what you think. :)

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


eightysteele seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.