spring-attic / spring-hadoop-samples

Spring Hadoop Samples
Apache License 2.0

Hadoop Spring mapreduce multiple inputs and mappers in a job #23

Open honeyc0der opened 9 years ago

honeyc0der commented 9 years ago

How can I specify multiple input files, each with its own input format and mapper, in a <job> tag?

<?xml version="1.0" encoding="UTF-8"?>

<beans:beans xmlns="http://www.springframework.org/schema/hadoop" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:beans="http://www.springframework.org/schema/beans" xmlns:context="http://www.springframework.org/schema/context" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

<context:property-placeholder location="hadoop.properties"/>

<configuration>
    fs.default.name=${hd.fs}
    yarn.resourcemanager.address=${hd.rm}
    mapreduce.framework.name=${mr.fw}
</configuration>

<job id="wordcountJob" input-path="${wordcount.input.path}" output-path="${wordcount.output.path}" mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper" reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>

</beans:beans>

In a plain Java program, we can specify this with MultipleInputs, like this:

MultipleInputs.addInputPath(job, firstPath, FirstInputFormat.class, FirstMap.class);
MultipleInputs.addInputPath(job, secondPath, SecondInputFormat.class, SecondMap.class);

I googled a lot and even checked the XSD file, but I did not find any attribute for this. How can we specify multiple inputs in a job?