xldrx / cloudapp-mp2

Machine Programming Assignment for Cloud Application Course
Apache License 2.0

Improvements #3

Open macfreek opened 9 years ago

macfreek commented 9 years ago

Hello Hadi,

I just took some time to isolate the public improvements from the assignments, and pushed them to a fork. Feel free to merge or cherry-pick whatever you think is useful.

Commit Change
47f967a Properly delete output file from HDFS. If it is not deleted, a second run of the program raises a Java exception.
a9b10d2 Ignore internal_use/tmp directory
4b860bf Remove Python -u option. This did not work on Hortonworks VM for me.
365d5ae Use modern toolchain: hdfs and yarn instead of hadoop
bbeddcc Don’t hardcode paths in Java code. This failed on a local Hadoop cluster I have access to.
d9611bc Workarounds for Mac OS X: (1) Use md5 (from OpenSSL) when md5sum (from GNU coreutils) is unavailable. (2) Enforce Java 1.7 output (to be more correct, I should also set the bootstrap class path).
0aae948 Improved run.sh script: (1) Check whether settings.sh has been sourced. (2) A -v (verbose) option shows commands as they are executed. (3) Allow selective execution of assignments, e.g. run.sh A D runs assignments A and D. The default is to run them all.
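
For readers who have not looked at the fork, here is a minimal sketch of what several of these changes could look like in a shell script. It is not the code from the commits: the function names, the MP2_SETTINGS_LOADED variable, and the ALL_ASSIGNMENTS list are all hypothetical, and putting the HDFS cleanup in the script (rather than in the Java code) is an assumption.

```shell
# Hypothetical sketch only; names below are not taken from the actual commits.

# Assumed full list of assignments; the issue only mentions A and D by name.
ALL_ASSIGNMENTS="A B C D"

# Print the MD5 hex digest of a file using whichever tool is installed:
# md5sum (GNU coreutils), md5 (Mac OS X / BSD), or the openssl command.
md5_of_file() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1
    elif command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    elif command -v openssl >/dev/null 2>&1; then
        # The digest is the last field of the openssl output line.
        openssl md5 < "$1" | awk '{print $NF}'
    else
        echo "no MD5 tool found" >&2
        return 1
    fi
}

# Succeed only if settings.sh was sourced. This assumes settings.sh exports
# a marker variable, here called MP2_SETTINGS_LOADED (hypothetical).
settings_loaded() {
    [ -n "${MP2_SETTINGS_LOADED:-}" ]
}

# Remove a previous output directory from HDFS so that a second run does not
# fail. With -f, a missing path is not reported as an error.
clean_output() {
    hdfs dfs -rm -r -f "$1"
}

# Parse "-v" and an optional list of assignment letters.
# Sets VERBOSE (0 or 1) and ASSIGNMENTS (space-separated, default: all).
parse_args() {
    VERBOSE=0
    ASSIGNMENTS=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -v) VERBOSE=1 ;;
            *)  ASSIGNMENTS="$ASSIGNMENTS $1" ;;
        esac
        shift
    done
    if [ -z "$ASSIGNMENTS" ]; then
        ASSIGNMENTS="$ALL_ASSIGNMENTS"
    fi
    ASSIGNMENTS="${ASSIGNMENTS# }"   # drop the leading space, if any
}

# Turn on command tracing when -v was given.
enable_verbose() {
    if [ "$VERBOSE" -eq 1 ]; then
        set -x
    fi
}
```

A real script would call parse_args "$@" first, abort with a hint to source settings.sh if settings_loaded fails, call enable_verbose, and then loop over $ASSIGNMENTS.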

Regards, Freek

PS: it is shocking to see how many people push their assignment solutions to a public repository, thus violating the third clause of the Coursera Honour Code!

PS2: I just realized that none of these commits have a sign-off line. In case you appreciate it: Signed-off-by: freek@macfreek.nl. A.k.a.: yeah, I really am to blame for any bug introduced by these commits ;).

xldrx commented 9 years ago

Thanks Freek, nice work. I will merge these into master as soon as the course has ended.