omkardash / jaql

Automatically exported from code.google.com/p/jaql

Data-driven output file names #93

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Some jobs need to create multiple output files whose names are derived from
the data.  Something like this:

... -> transform [$.filepart, $.data]
    -> write( dataDrivenFd(baseFd('/foo/*.dat')) )

where baseFd produces a FileOutputFormat, and dataDrivenFd replaces the '*'
with $[0] and writes the data from $[1] of each element.  There are some tricky
issues: the number of open files; the same file being written on different
nodes (map/reduce output would require ./part-##### files); and the
OutputCommitter would need to be special.  We should special-case fileparts
that are already partitioned and grouped, so that we open only one file at a
time and don't require the part-##### files.
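
To make the grouped case concrete, here is a rough single-process sketch in Python (not Jaql, and not using Hadoop). It only illustrates the idea: replace the '*' in a base pattern with the filepart, and rely on the input already being grouped by filepart so that one file is open at a time. The name write_data_driven and its signature are hypothetical, invented for this illustration; nothing here is an existing Jaql or Hadoop API, and it ignores the multi-node and OutputCommitter problems entirely.

```python
import itertools
import os
import tempfile


def write_data_driven(pairs, pattern):
    """Write (filepart, data) pairs to files named after the filepart.

    Hypothetical helper for illustration only. `pattern` must contain exactly
    one '*', which is replaced by each filepart. The input is assumed to be
    grouped by filepart already, so only one file is open at any moment.
    Returns the list of paths written, in order.
    """
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")

    written = []
    seen = set()
    # groupby only merges *consecutive* equal keys, which is exactly the
    # "already grouped" assumption; a key that reappears means it was violated.
    for filepart, group in itertools.groupby(pairs, key=lambda p: p[0]):
        if filepart in seen:
            raise ValueError(f"input is not grouped: {filepart!r} reappeared")
        seen.add(filepart)

        path = pattern.replace("*", str(filepart))
        # The with-block closes this file before the next one is opened.
        with open(path, "w", encoding="utf-8") as out:
            for _, data in group:
                out.write(f"{data}\n")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        records = [("a", "1"), ("a", "2"), ("b", "3")]
        for path in write_data_driven(records, os.path.join(tmp, "*.dat")):
            print(os.path.basename(path))
```

If the input is not grouped, this sketch fails rather than reopening a file. A real implementation would instead have to fall back to the general case, with a bounded pool of open files and per-task part files.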

Original issue reported on code.google.com by Kevin.Be...@gmail.com on 7 Jul 2010 at 12:51