klbostee / dumbo

Python module that allows one to easily write and run Hadoop programs.
http://projects.dumbotics.com/dumbo

unittest framework for dumbo mapreduce #21

Closed by klbostee 13 years ago

klbostee commented 14 years ago

Adam wrote a small module, inspired by Cloudera's MRUnit, that lets one easily create unit tests for dumbo mapreduce tasks.

The code is currently at:

http://github.com/adamhadani/dumbo/blob/master/dumbo/mapredtest.py

He also added a unit test for it, which both tests the testing module itself and serves as an example of how to work with it:

http://github.com/adamhadani/dumbo/blob/master/tests/testmapredtest.py

The nice thing about it is that it takes care of some things behind the scenes, e.g. deriving the mapper/reducer classes from mapredbase when needed and making sure input/output is iterable. The latter allows for arbitrarily large input/output test cases: they need not fit in memory, which seems to be a requirement with MRUnit.
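To illustrate the iterable input/output idea, here is a minimal standalone sketch in plain Python. It does not use or reproduce the actual API of `mapredtest.py`; the names `run_mapper`, `run_reducer`, `assert_stream_equal` and the word-count functions are hypothetical and exist only for this example. It assumes mappers and reducers are written as generator functions taking a key and a value (or an iterator of values).

```python
from itertools import groupby, zip_longest
from operator import itemgetter

_MISSING = object()  # sentinel marking a stream that ended early


def run_mapper(mapper, records):
    """Lazily apply mapper(key, value) to an iterable of (key, value) pairs."""
    for key, value in records:
        yield from mapper(key, value)


def run_reducer(reducer, pairs):
    """Group pairs by key and apply reducer(key, values) to each group.

    Simplification: sorted() holds all pairs in memory, unlike a real shuffle.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield from reducer(key, (value for _, value in group))


def assert_stream_equal(actual, expected):
    """Compare two iterables record by record without materializing them."""
    pairs = zip_longest(actual, expected, fillvalue=_MISSING)
    for index, (got, want) in enumerate(pairs):
        assert got is not _MISSING, f"record {index}: missing, expected {want!r}"
        assert want is not _MISSING, f"record {index}: unexpected {got!r}"
        assert got == want, f"record {index}: got {got!r}, expected {want!r}"


# Hypothetical job under test: a word count.
def wordcount_mapper(key, value):
    for word in value.split():
        yield word, 1


def wordcount_reducer(key, values):
    yield key, sum(values)


if __name__ == "__main__":
    lines = ((offset, "a b a") for offset in range(2))  # generator input
    mapped = run_mapper(wordcount_mapper, lines)
    reduced = run_reducer(wordcount_reducer, mapped)
    assert_stream_equal(reduced, iter([("a", 4), ("b", 2)]))
```

Because the mapper stage and the comparison both consume plain iterators, a mapper-only test in this sketch can stream inputs and expected outputs from generators or files rather than from in-memory lists.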

klbostee commented 13 years ago

Just merged this into my master branch.