ogrisel / pignlproc

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Hadoop 2 and AWS EMR 5 compatibility #15

Open jzonthemtn opened 7 years ago

jzonthemtn commented 7 years ago

This pull request adds compatibility with Hadoop 2 and AWS EMR 5, and makes the project build with Java 8. I have tested these changes successfully on Java 8 and AWS EMR 5.2.1. I updated some dependency versions, and a profile for AWS EMR sets the Pig jar's scope to provided. (I'm not a Pig expert, but I removed the fields.isNull() lines because of changes in the newer Pig version. Everything worked OK, but if changes are needed there, let me know.)

Edit: I also changed the version number to differentiate between the Hadoop 1 and 2 jars when I was testing. I can revert that change if necessary.
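
For readers unfamiliar with the profile approach mentioned above, here is a rough sketch of how a provided scope can be switched on through a profile. It assumes the build uses Maven (as the terms "scope" and "profile" suggest) and is not the actual diff from this pull request: the profile id `emr` and the property names `pig.scope` and `pig.version` are hypothetical, chosen only for illustration.

```xml
<!-- Hypothetical sketch, not the PR's actual pom.xml changes. -->
<!-- The profile id "emr" and the properties "pig.scope" and "pig.version" are assumed names. -->

<properties>
  <!-- Default: bundle Pig so the jar can run outside a cluster. -->
  <pig.scope>compile</pig.scope>
</properties>

<profiles>
  <profile>
    <id>emr</id>
    <properties>
      <!-- On EMR the cluster already supplies Pig, so do not package it. -->
      <pig.scope>provided</pig.scope>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <!-- Placeholder: set to whichever Pig version the build targets. -->
    <version>${pig.version}</version>
    <scope>${pig.scope}</scope>
  </dependency>
</dependencies>
```

With a layout like this, `mvn package -Pemr` would build a jar that leaves Pig out, while a plain `mvn package` would keep the default scope.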