The block generation job has custom output logic to allow each reducer to output to multiple block files.
When speculative execution is enabled, this can result in two copies of the same block file being generated (one of which may be incomplete). This can be worked around by setting mapreduce.reduce.speculative = false.
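For reference, a minimal sketch of applying that workaround when the job is configured programmatically. The driver class and job name are hypothetical; only the mapreduce.reduce.speculative property itself comes from the workaround above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BlockGenDriver { // hypothetical driver class
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Disable speculative execution for reduce tasks only, so each
    // block file is written by at most one reduce attempt.
    conf.setBoolean("mapreduce.reduce.speculative", false);
    Job job = Job.getInstance(conf, "block-generation"); // name is a placeholder
    // ... remaining job setup, then job.waitForCompletion(true) ...
  }
}
```

If the driver implements Tool, the same property can instead be passed on the command line as -Dmapreduce.reduce.speculative=false.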
When a reducer attempt fails, the partial output files will not be cleaned up. I'm not aware of an easy workaround for this beyond manually cleaning up the files after the job completes.
We should have each reducer write its block files to a per-attempt staging directory and move them into the final output location only once the attempt completes successfully; that would address both the speculative duplicates and the leftover partial output.
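A rough sketch of that fix, assuming a reducer on the new (org.apache.hadoop.mapreduce) API. The class name, key/value types, and the blockgen.output.dir property are placeholders for whatever the real job uses, and the existing custom block-writing logic would replace the reduce() stub:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class BlockGenReducer extends Reducer<Text, BytesWritable, Text, BytesWritable> {

  private FileSystem fs;
  private Path finalDir;   // where completed block files belong
  private Path stagingDir; // per-attempt scratch space

  @Override
  protected void setup(Context context) throws IOException {
    fs = FileSystem.get(context.getConfiguration());
    // "blockgen.output.dir" is a hypothetical property standing in for
    // however the real job derives its output directory.
    finalDir = new Path(context.getConfiguration().get("blockgen.output.dir"));
    // Keying the staging dir by task attempt ID keeps concurrent
    // (speculative) attempts from ever touching the same files.
    stagingDir = new Path(finalDir, "_staging_" + context.getTaskAttemptID());
    fs.mkdirs(stagingDir);
  }

  @Override
  protected void reduce(Text key, Iterable<BytesWritable> values, Context context)
      throws IOException, InterruptedException {
    // The job's existing custom output logic would go here, writing its
    // block files under stagingDir instead of finalDir.
  }

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
      }
      // Reached only if every reduce() call succeeded: promote this
      // attempt's block files. HDFS rename is atomic, so if a speculative
      // twin already promoted a given file, this rename fails harmlessly.
      for (FileStatus f : fs.listStatus(stagingDir)) {
        fs.rename(f.getPath(), new Path(finalDir, f.getPath().getName()));
      }
    } finally {
      // On success the staging dir is empty by now; on failure this
      // removes the attempt's partial output in one shot.
      fs.delete(stagingDir, true);
    }
  }
}
```

Because the staging directory is keyed by task attempt ID, a failed attempt's partial files stay isolated under its own directory, and an attempt whose JVM dies before the finally block leaves behind only a _staging_* directory that a later cleanup pass can delete wholesale.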