Sounds like using `-input code` for the local run might fix your problem. (And, IIRC, using that option should work on Hadoop as well.)

Closing this issue now, but feel free to reopen if I misunderstood your problem.
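A minimal sketch of what that local invocation might look like, assuming the option referred to is dumbo's `-inputformat code` and that the merge job lives in a single `merge.py` script (only one input directory shown; names and paths are placeholders):

```sh
# Hypothetical local run of the merge job: read the earlier job's output
# in "code" format (repr'd keys/values) instead of plain text.
dumbo start merge.py -input job1_out -inputformat code -output merged_out
```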
Hello! Let me explain that one: I have two jobs creating their own output files. Then I want to merge those two files using a third job.

In my first attempt, the first two jobs were yielding Python structures (`dict`) as values and unicode strings as keys, which turned out to be dumb: I would have had to `eval` the keys and values in my third job, and I'm not sure anyone would want to do that.
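To make that concrete, the first two jobs were doing roughly this (simplified, with made-up field names):

```python
import dumbo

def mapper(key, value):
    # First attempt: unicode keys and plain Python dicts as values.
    yield value.decode('utf-8'), {"count": 1}

def reducer(key, values):
    # The reduced value is still a dict, so it ends up in the output
    # file as its repr, which the merge job would have had to eval.
    yield key, {"count": sum(v["count"] for v in values)}

if __name__ == "__main__":
    dumbo.run(mapper, reducer)
```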
Now I try to output pure strings through `encode('utf-8')` and some `json.dumps`. I now have strings everywhere; `dumbo cat` confirmed it.
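The new version looks roughly like this (again simplified, with made-up field names):

```python
import json
import dumbo

def mapper(key, value):
    # Keys are unicode internally...
    yield value.decode('utf-8'), 1

def reducer(key, values):
    # ...but only plain byte strings get written out, so nothing in the
    # output should need to be eval'd by the downstream merge job.
    yield key.encode('utf-8'), json.dumps({"count": sum(values)})

if __name__ == "__main__":
    dumbo.run(mapper, reducer)
```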
But if I try to use those two files as input for the third merge job, keys and values are single-quoted, which is quite a pain when testing my code locally. Of course, I will use `dumbo cat out > out.txt` to be able to test the merge job locally, but the code driving those three jobs won't be testable unless run on a real Hadoop cluster. Did I miss something?
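To illustrate what the merge job's mapper actually sees when it reads those files back as plain text in a local run (the `ast.literal_eval` bit is only a stopgap for local testing, not a proper fix):

```python
import ast

def merge_mapper(key, value):
    # Locally the key arrives as "'foo'" and the value as the quoted JSON
    # string, i.e. the repr of what the earlier jobs wrote, not the raw text.
    if key.startswith("'") and key.endswith("'"):
        key = ast.literal_eval(key)
        value = ast.literal_eval(value)
    yield key, value
```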
Thanks a lot for your help!