Closed: stenlarsson closed this 5 years ago.
In theory, this fix doesn't fully address the problem: jobs b and d in the example below could still be launched at the same time with this change. It seems the only safe solution would be to submit the jobs while holding some form of lock (see the sketch after the example).
parallel do
  sequence { a = job { ... }; b = job { ... } }
  sequence { c = job { ... }; d = job { ... } }
end
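A minimal plain-Ruby sketch of the lock idea, for illustration only: the FakeJob stand-in, and the submit and wait_for_completion method names, are assumptions (Rubydoop wraps Hadoop's Java API, where the corresponding calls are Job#submit and Job#waitForCompletion). Only the submission step is serialized, so the submitted jobs can still run concurrently.

require 'thread'

SUBMIT_LOCK = Mutex.new  # hypothetical global lock guarding submission

# Stand-in for a real job; Rubydoop's job objects ultimately wrap
# org.apache.hadoop.mapreduce.Job.
FakeJob = Struct.new(:name) do
  def submit
    puts "submitting #{name}"  # the shared, non-thread-safe step
  end

  def wait_for_completion
    sleep 0.1  # pretend the job runs for a while
  end
end

a, b, c, d = %w[a b c d].map { |n| FakeJob.new(n) }

# Each inner array stands in for one `sequence` block above.
[[a, b], [c, d]].map do |seq|
  Thread.new do
    seq.each do |job|
      SUBMIT_LOCK.synchronize { job.submit }  # only submission is serialized
      job.wait_for_completion                 # order within a sequence is kept
    end
  end
end.each(&:join)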
You're right, it doesn't work if you have sequences inside your parallel block. To summarise: your suggestion with locks sounds better.
Closing since I no longer use Rubydoop.
Submitting jobs in parallel is not thread safe. There is a "unique number generator" that is used when downloading files to the file cache. Unfortunately, it only yields unique numbers within a single job, since each job has its own LocalDistributedCacheManager.
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-common/2.2.0/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#95
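To illustrate why per-job counters are not enough, here is a small plain-Ruby simulation (FakeCacheManager and its naming scheme are invented for this example; the real logic is in the Java class linked above): two jobs start their counters from the same value and can therefore pick identical "unique" names in a shared cache directory.

# Numbers are unique only within one manager instance, mirroring each
# job having its own LocalDistributedCacheManager.
class FakeCacheManager
  def initialize
    @counter = 0  # every job's manager starts from the same value
  end

  def next_cache_path
    @counter += 1
    "/tmp/cache/work-#{@counter}"  # illustrative, not Hadoop's real scheme
  end
end

job_a = FakeCacheManager.new
job_b = FakeCacheManager.new
puts job_a.next_cache_path  # => /tmp/cache/work-1
puts job_b.next_cache_path  # => /tmp/cache/work-1, so two parallel jobs
                            #    would clobber each other's cache files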
With this change, all jobs are submitted sequentially. In a local environment I think this means the jobs do not run in parallel, but in a distributed environment they should still run in parallel.
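Roughly, the change amounts to the following pattern (a sketch reusing the FakeJob placeholder and assumed method names from the lock example above): submission happens on a single thread, so the unsafe path is never entered concurrently, and parallelism is limited to waiting for the submitted jobs.

a, b, c, d = %w[a b c d].map { |n| FakeJob.new(n) }  # FakeJob as above

# Sequential submission from one thread; no lock needed. Note that a
# sequence's later jobs can't really be submitted before their
# predecessors finish, which is the b/d race discussed above.
jobs = [a, b, c, d]
jobs.each(&:submit)

# On a distributed cluster the submitted jobs run in parallel; in
# local mode submission may effectively serialize execution too.
jobs.map { |job| Thread.new { job.wait_for_completion } }.each(&:join)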
I was unfortunately not able to run the integration tests, so I don't know if this really works.