nickrobison closed this issue 8 years ago.
Try https://github.com/mraad/ShpJob. I've updated the project to CDH5.
The project you linked to does work, but only with the hadoop.mapred.* API. When I modify the project to use the newer hadoop.mapreduce.* API, I get the ConcurrentModificationException.
I've opened a pull request on the ShpJob project with my port to the new mapreduce API.
I think I've fixed the issue: the AbstractInputFormat class in the mapreduce package was iterating through a list of input files while modifying that same list. I've changed it to match the algorithm in the mapred package and opened pull request #7.
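For anyone who lands here from the same stack trace: "concurrent" in ConcurrentModificationException does not require multiple threads. ArrayList iterators are fail-fast, so structurally modifying the list (add/remove) inside a for-each loop over it makes the next call to `ArrayList$Itr.next` throw, which matches the trace below. The actual AbstractInputFormat code isn't shown in this thread, so the sketch below is hypothetical: it assumes the input-file list is being filtered to keep only `.shp` paths, and the class and method names are made up for illustration. It shows the failing pattern and two common fixes, not necessarily the exact change in PR #7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of the bug pattern; not the real AbstractInputFormat code.
class ListFilterDemo {

    // Buggy: removes elements from the list while a for-each loop iterates over it.
    // The implicit iterator detects the change on its next call to next() and throws.
    static List<String> filterBuggy(List<String> paths) {
        for (String p : paths) {
            if (!p.endsWith(".shp")) {
                paths.remove(p); // structural modification during iteration
            }
        }
        return paths;
    }

    // Fix 1: collect the matching entries into a new list; the input is only read.
    static List<String> filterFixed(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String p : paths) {
            if (p.endsWith(".shp")) {
                result.add(p);
            }
        }
        return result;
    }

    // Fix 2: remove through the iterator itself, which keeps it consistent.
    static List<String> filterWithIterator(List<String> paths) {
        Iterator<String> it = paths.iterator();
        while (it.hasNext()) {
            if (!it.next().endsWith(".shp")) {
                it.remove();
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a.shp", "a.dbf", "a.shx", "b.shp");
        try {
            filterBuggy(new ArrayList<>(input));
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
        System.out.println(filterFixed(new ArrayList<>(input)));        // [a.shp, b.shp]
        System.out.println(filterWithIterator(new ArrayList<>(input))); // [a.shp, b.shp]
    }
}
```

Building a new list (fix 1) is usually the simpler choice when the input list comes from a superclass call such as `listStatus`, since it avoids mutating a collection the caller may still hold.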
Updated the code, thanks! Tested in ShpJob too.
Awesome! Thanks for the quick response on this.
I'm trying to run a fairly simple spatial program with Cloudera Hadoop 5.5 (Hadoop 2.6.0) on my MacBook Air, and I'm getting a ConcurrentModificationException during the job setup phase (i.e. before the Mapper code executes).
My stacktrace looks like this:
Exception in thread "main" java.util.ConcurrentModificationException
	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
	at java.util.ArrayList$Itr.next(ArrayList.java:851)
	at com.esri.mapreduce.AbstractInputFormat.listStatus(AbstractInputFormat.java:25)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:304)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:199)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
	at SpatialDifference.run(SpatialDifference.java:62)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at SpatialDifference.main(SpatialDifference.java:24)
I'm not sure where any of the members would be modified concurrently. Is the library sensitive to a particular way of initializing the Hadoop program? Currently, I'm using the Hadoop ToolRunner methods.