klbostee / dumbo

Python module that allows one to easily write and run Hadoop programs.
http://projects.dumbotics.com/dumbo

Fix #62 Optional path argument in JoinMapper #65

Open a4tunado opened 11 years ago

a4tunado commented 11 years ago

To get the source path inside a mapper routine, just add **kwargs to its argument list; the path is then available as kwargs['path']. Here are some examples.

@dumbo.decor.primary
def map_primary(key, value, **kwargs):
  key, value = value.strip().split('\t')
  print >> sys.stderr, key, value, kwargs['path']
  yield key, value

Or you can name the desired argument directly:

@dumbo.decor.primary
def map_primary(key, value, path, **kwargs):
  key, value = value.strip().split('\t')
  print >> sys.stderr, key, value, path
  yield key, value

Callable instances are also supported:

@dumbo.decor.secondary
class MapSecondary(object):
  def __call__(self, key, value, path, **kwargs):
    key, value = value.strip().split(' ')
    print >> sys.stderr, value, path
    yield key, value

The previous mapper interface still works as well:

@dumbo.decor.primary
def map_primary(key, value):
  key, value = value.strip().split('\t')
  yield key, value

This approach makes it easy to extend the interface to pass other arguments in the future.
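For illustration, one way a caller could support all of the signatures above is to inspect the mapper and pass only the extra arguments it declares. This is a minimal Python 3 sketch under that assumption, not the code in this pull request: `call_mapper` and the sample mappers are hypothetical names, and the real patch may dispatch differently.

```python
import inspect


def call_mapper(mapper, key, value, **extras):
    """Call mapper(key, value), passing only the extras it declares.

    Hypothetical helper for illustration; not part of dumbo.
    """
    params = inspect.signature(mapper).parameters
    # A **kwargs parameter means the mapper accepts every extra argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return mapper(key, value, **extras)
    # Otherwise pass only the extras the mapper names explicitly.
    accepted = {name: val for name, val in extras.items() if name in params}
    return mapper(key, value, **accepted)


def old_style(key, value):
    # Old interface: no extra arguments at all.
    yield key, value


def with_path(key, value, path):
    # Names the extra argument directly.
    yield key, (value, path)


def with_kwargs(key, value, **kwargs):
    # Receives every extra argument through **kwargs.
    yield key, (value, kwargs["path"])


class CallableMapper(object):
    # Callable instance; inspect.signature skips the bound `self`.
    def __call__(self, key, value, path, **kwargs):
        yield key, (value, path)
```

With this shape, adding a new extra argument later only means passing one more keyword to `call_mapper`; mappers that do not declare it keep working unchanged.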

klbostee commented 11 years ago

Sounds good! Will try to find some time to review and merge this soonish.