Closed jmesnil closed 13 years ago
It's a limitation. Dumbo's local mode relies only on UNIX pipes and doesn't use Hadoop in any way, so specifying a Java class as the input format for a local run simply cannot work. If you want to test Hadoop helper classes locally, you have to install a Hadoop build on your machine that is configured to run in local mode (which is the default configuration) and point Dumbo at it with '-hadoop'.
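To make the reason concrete, here is a minimal sketch of a pipe-only local run. It is not Dumbo's actual code: `run_local` and its parameters are hypothetical names chosen for illustration. The point is that no JVM is ever started, so a Java class name passed as an input format can be neither loaded nor validated, and the input is simply read as raw bytes split into lines.

```python
import subprocess
import sys


def run_local(data, mapper_cmd, reducer_cmd, inputformat=None):
    """Push raw bytes through mapper -> sort -> reducer using OS pipes.

    `inputformat` is accepted but never used (assumption made for this
    sketch): nothing here can load a Java class, so an unknown class
    name goes unnoticed instead of raising an error.
    """
    # Map stage: the mapper process reads stdin and writes stdout.
    mapped = subprocess.run(
        mapper_cmd, input=data, stdout=subprocess.PIPE, check=True
    ).stdout
    # Shuffle stage: a plain line sort stands in for Hadoop's shuffle.
    shuffled = b"".join(sorted(mapped.splitlines(keepends=True)))
    # Reduce stage: again just a process connected by pipes.
    return subprocess.run(
        reducer_cmd, input=shuffled, stdout=subprocess.PIPE, check=True
    ).stdout


# Identity mapper and reducer: copy stdin to stdout unchanged.
identity = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

# A bogus class name causes no failure because it is never looked at.
out = run_local(b"b\na\n", identity, identity,
                inputformat="foo.bar.UnknownClass")
```

In this sketch the bogus class name is silently ignored, which mirrors the behaviour reported in the question: the run succeeds and the input is treated as plain lines of text.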
Hi,
I want to run Dumbo with a specific input format (to read from Avro files). It seems Dumbo does not use the input format specified by '-inputformat' when it is run locally (without specifying '-hadoop'). Instead, it uses its default input format.
To check this, I specified an unknown class with '-inputformat foo.bar.UnknownClass'. The job fails on Hadoop but succeeds in local mode.
Hadoop mode:
$ dumbo start cat.py \
    -input word-count.avro \
    -output tmp \
    -libjar avro-1.4.1.jar \
    -libjar avro-utils-1.5.3-SNAPSHOT.jar \
    -inputformat foo.bar.UnknownClass \
    -python /home/sites/sci-env/0.0.5/bin/python \
    -hadoop /usr/lib/hadoop
...
-inputformat : class not found : foo.bar.UnknownClass
Streaming Command Failed!
Local mode:
$ dumbo start cat.py \
    -input word-count.avro \
    -output tmp \
    -libjar avro-1.4.1.jar \
    -libjar avro-utils-1.5.3-SNAPSHOT.jar \
    -inputformat foo.bar.UnknownClass \
    -python /home/sites/sci-env/0.0.5/bin/python
INFO: buffersize = 168960
=> No error: tmp was created, but it contains the content of the binary Avro file, because the file was read as if it were text.
Is it a limitation of Dumbo that the '-inputformat' option works only in Hadoop mode, or is it a bug?
thanks, jeff