klbostee / dumbo

Python module that allows one to easily write and run Hadoop programs.
http://projects.dumbotics.com/dumbo
1.04k stars 146 forks

added gzip and bzip2 support for local (non-hadoop) dumbo job #77

Closed lucidfrontier45 closed 7 years ago

lucidfrontier45 commented 11 years ago

I added gzip and bzip2 file support for mapper input.

The current dumbo seems to support only this kind of MapReduce pipeline in local mode:

cat input | mapper | sort | reducer > output

What I added works like this:

zcat input | mapper | sort | reducer > output
bzcat input | mapper | sort | reducer > output
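
For readers who want to see the idea in Python rather than as a shell pipeline, here is a minimal sketch of choosing a decompressor for the mapper input. It is not the code from this pull request: the helper names `open_input` and `read_lines` are hypothetical, the sketch assumes the format is inferred from the file extension (the comment above does not say how it is detected), and it uses Python 3's `gzip.open` and `bz2.open` in text mode.

```python
import bz2
import gzip


def open_input(path):
    """Open a mapper input file as text (hypothetical helper, not dumbo's API).

    Assumption: the compression format is inferred from the file extension.
    """
    if path.endswith(".gz"):
        # Plays the role of `zcat input` in the shell pipeline.
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        # Plays the role of `bzcat input` in the shell pipeline.
        return bz2.open(path, "rt")
    # Plays the role of plain `cat input`.
    return open(path, "r")


def read_lines(path):
    """Yield input lines without the trailing newline, ready to feed a mapper."""
    with open_input(path) as handle:
        for line in handle:
            yield line.rstrip("\n")
```

A local run could then iterate over `read_lines("input.gz")` and pass each line to the mapper, so plain, gzip and bzip2 inputs all go through the same code path.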