Below is a short review.

Tasks

- [x] Download the text of a book from Project Gutenberg and copy it into HDFS
- [x] Write a MapReduce program that does a WordCount on an arbitrary text file
- [x] Package your WordCount application into a Maven project and build it
- [x] Use `teragen` and `terasort`
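For reference, here is a Hadoop-free sketch of what the WordCount map and reduce steps do. It is plain Java so it runs without a cluster. The class and method names are hypothetical, the tokenization (lowercasing, splitting on non-word characters) is an assumption, and a real job would implement these steps in Hadoop `Mapper` and `Reducer` subclasses rather than static methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the WordCount job; no Hadoop API involved.
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every token in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty token
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce combined: group the pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("The quick brown fox", "jumps over the lazy dog");
        // prints {brown=1, dog=1, fox=1, jumps=1, lazy=1, over=1, quick=1, the=2}
        System.out.println(countWords(lines));
    }
}
```

On a cluster, the grouping done here by `TreeMap.merge` is handled by the shuffle phase, and each reducer call receives all counts for a single word.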
Below are a few comments/suggestions:
- You should fix your local Git user (as used in this commit).
- Always post the full output of the commands you used. You missed that in your `hdfs-tests.md` file.
- Your error handling (`try-catch`) does not add any value. You're just printing the stack trace, which would also be printed if you didn't have the `try-catch` at all. Use it when you're doing actual error handling; otherwise, remove it.
- There is no need to use class fields. They are only used within one method and can be made local variables.
- Add the `.gitignore` as one of the first files when setting up a Git repository. This prevents adding unnecessary or unwanted files (such as `*.iml`) to your repository.
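A minimal sketch of the error-handling and class-field points, assuming a small plain-Java program. The `WordTotal` class and its methods are hypothetical, not taken from your code: the counter uses only local variables, `main` declares `IOException` instead of catching it just to print it, and the one remaining `catch` does actual error handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical example class, not the reviewed code.
public class WordTotal {

    // Local variables instead of class fields: everything this method needs
    // lives inside it, so there is no shared mutable state between calls.
    static long countWords(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                total += trimmed.split("\\s+").length;
            }
        }
        return total;
    }

    // No try-catch that only calls printStackTrace(): other IOExceptions
    // propagate out of main and the JVM prints their stack trace anyway.
    public static void main(String[] args) throws IOException {
        Path input = Path.of(args[0]);
        try {
            System.out.println(countWords(Files.readAllLines(input)));
        } catch (NoSuchFileException e) {
            // Actual error handling: a clear message and a non-zero exit code.
            System.err.println("Input file not found: " + input);
            System.exit(1);
        }
    }
}
```

With a missing input file, this prints a single readable message and exits with status 1. Any other I/O failure still ends the program with the full stack trace, exactly as an unhandled exception would.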
Open points:
- Fix the label colors as described at the top of the exercise sheet.
Summary:
You're done with this one. Good work.