Some jobs need to produce multiple output files whose names are derived
from the data. Something like this:
... -> transform [$.filepart, $.data]
-> write( dataDrivenFd(baseFd('/foo/*.dat')) )
where baseFd produces a FileOutputFormat, and dataDrivenFd replaces the '*'
with $[0] and writes the data from $[1] of each element. There are some tricky
issues: the number of open files, the same file being written on different
nodes (map/reduce output would require ./part-##### files), the OutputCommitter
would need to be special, etc. We should treat the case of partitioned and
grouped fileparts specially, so that we open only one file at a time and don't
require the part-##### files.
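A minimal sketch of the two pieces described above, assuming hypothetical helper names (expandPath standing in for what dataDrivenFd would do to the base pattern, and groupByFilepart for the partitioned/grouped special case); neither is an existing API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataDrivenPaths {

    // Substitute the '*' in the base pattern with the data-derived filepart,
    // as dataDrivenFd would for the $[0] component of each element.
    // (Hypothetical name; not part of any existing API.)
    static String expandPath(String basePattern, String filepart) {
        return basePattern.replace("*", filepart);
    }

    // Group elements ([filepart, data] pairs) by filepart. If the input is
    // already partitioned and grouped on filepart, each group can be written
    // sequentially, so only one output file need be open at a time and no
    // part-##### files are required. (Illustrative sketch only.)
    static Map<String, List<String>> groupByFilepart(List<String[]> elements) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] e : elements) {
            grouped.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
        }
        return grouped;
    }
}
```

For example, expandPath("/foo/*.dat", "2010-07") yields "/foo/2010-07.dat", and a writer could iterate groupByFilepart's entries in order, opening and closing one file per filepart.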
Original issue reported on code.google.com by Kevin.Be...@gmail.com on 7 Jul 2010 at 12:51