mraad / Shapefile

Java library to read point and polygon shape files

ConcurrentModificationException #6

Closed nickrobison closed 8 years ago

nickrobison commented 8 years ago

I'm trying to run a fairly simple spatial program with Cloudera Hadoop 5.5 (Hadoop 2.6.0) on my MacBook Air, and I'm getting a ConcurrentModificationException during the job setup phase (i.e., before the Mapper code executes).

My stack trace looks like this:

    Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
        at java.util.ArrayList$Itr.next(ArrayList.java:851)
        at com.esri.mapreduce.AbstractInputFormat.listStatus(AbstractInputFormat.java:25)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:304)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:199)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
        at SpatialDifference.run(SpatialDifference.java:62)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at SpatialDifference.main(SpatialDifference.java:24)

I'm not sure where any of the members would be concurrently modified. Is the library sensitive to a particular way of initializing the Hadoop program? Currently, I'm using the Hadoop ToolRunner methods.

mraad commented 8 years ago

Try https://github.com/mraad/ShpJob. I've also updated the project to CDH5.

nickrobison commented 8 years ago

The project you linked to does work, but only with the hadoop.mapred.* API. When I modify the project to use the newer hadoop.mapreduce.* API, I get the ConcurrentModificationException.

nickrobison commented 8 years ago

I've opened a pull request on the ShpJob project with my port to the new mapreduce API.

https://github.com/mraad/ShpJob/pull/1

nickrobison commented 8 years ago

I think I've fixed the issue. The AbstractInputFormat class in the mapreduce package was iterating through a list of input files and modifying the list at the same time. I've changed it to match the algorithm in the mapred package and opened pull request #7.
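For anyone who hits the same trace: despite its name, a ConcurrentModificationException does not require multiple threads. It is also thrown when a single thread structurally modifies an ArrayList while iterating over it, which matches the `ArrayList$Itr.next` frame in the stack trace above. The following is only a minimal sketch of that failure mode and one common way around it. It is not the library's actual code; the class name, the method names, and the `.shp` filter are hypothetical and use plain strings instead of Hadoop's file status objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ListFilterSketch {

    // Hypothetical buggy pattern: removes entries from the same list
    // that the for-each loop is iterating over.
    static List<String> buggyFilter(List<String> paths) {
        for (String path : paths) {
            if (!path.endsWith(".shp")) {
                // Structural modification during iteration; the next call to
                // the iterator's next() detects it and throws.
                paths.remove(path);
            }
        }
        return paths;
    }

    // Safer pattern: leave the input untouched and collect matches
    // into a separate result list.
    static List<String> safeFilter(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (path.endsWith(".shp")) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>(
                Arrays.asList("a.shp", "a.dbf", "a.shx", "b.shp"));

        try {
            // Pass a copy so the original input stays intact for the safe run.
            buggyFilter(new ArrayList<>(input));
        } catch (ConcurrentModificationException e) {
            System.out.println("buggy version threw " + e.getClass().getSimpleName());
        }

        System.out.println("safe version kept " + safeFilter(input));
    }
}
```

Note that the check is best-effort: if the removal happens at the second-to-last element, the loop can end without throwing at all, so the bug may only show up with certain input directories.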

mraad commented 8 years ago

Updated the code, thanks. Tested in ShpJob too!

nickrobison commented 8 years ago

Awesome! Thanks for the quick response on this.