studhadoop / MapSide-Join-In-order


Order of rows #1

Open Shailuc opened 8 years ago

Shailuc commented 8 years ago

Hey Hi,

Thanks so much for the great response. The code actually works fine with a small file, but as the number of rows increases, the order of rows is shuffled, e.g.:

```
1 Anne,Admin,50000,A
10 Gokul,Admin,50000,B
11 Janet,Sales,60000,A
12 Hari,Admin,50000,C
13 Anne,Admin,50000,A
14 Gokul,Admin,50000,B
15 Janet,Sales,60000,A
16 Hari,Admin,50000,C
17 Anne,Admin,50000,A
18 Gokul,Admin,50000,B
19 Janet,Sales,60000,A
2 Gokul,Admin,50000,B
20 Hari,Admin,50000,C
21 Anne,Admin,50000,A
22 Gokul,Admin,50000,B
23 Janet,Sales,60000,A
24 Hari,Admin,50000,C
3 Janet,Sales,60000,A
4 Hari,Admin,50000,C
5 Anne,Admin,50000,A
6 Gokul,Admin,50000,B
7 Janet,Sales,60000,A
8 Hari,Admin,50000,C
9 Anne,Admin,50000,A
```

It doesn't show them sequentially. Please provide some inputs. Also, I want to customise the key, i.e. the key column will be dynamic for different files.

Thanks,
Shailu

studhadoop commented 8 years ago

Normally keys from the Reducer will be in ascending order. The issue with this code is that we are using Text() for the key, so the ids sort lexicographically (1, 10, 11, ..., 19, 2, 20, ...) rather than numerically. Could you please try changing TextPair to use an IntWritable instead of a Text? I will upload that solution (check TwoValueWritable). Use TwoValueWritable instead of TextPair.
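The TwoValueWritable class isn't shown in this thread, so here is a minimal sketch of what such a composite key might look like; the field and method names here are assumptions, not the repository's actual code:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key: the first field is numeric, so rows sort
// as 1, 2, 3, ... instead of lexicographically as 1, 10, 11, ..., 2, ...
public class TwoValueWritable implements WritableComparable<TwoValueWritable> {

    private final IntWritable first = new IntWritable(); // numeric row id
    private final Text second = new Text();              // tag / secondary field

    public void set(int first, String second) {
        this.first.set(first);
        this.second.set(second);
    }

    public int getFirst() {
        return first.get();
    }

    public String getSecond() {
        return second.toString();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int compareTo(TwoValueWritable other) {
        // Numeric comparison on the first field fixes the ordering;
        // the second field only breaks ties.
        int cmp = first.compareTo(other.first);
        return cmp != 0 ? cmp : second.compareTo(other.second);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TwoValueWritable)) {
            return false;
        }
        TwoValueWritable other = (TwoValueWritable) o;
        return first.equals(other.first) && second.equals(other.second);
    }
}
```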

Shailuc commented 8 years ago

Hi,

I am trying to pass a dynamic column ID for the key. I am trying to do it using a Context, but if I pass Context as a parameter to the mapper, it gives an error for the method signature, i.e.

```java
public void map(LongWritable key, Text value,
        OutputCollector<TextPair, Text> output, Reporter reporter, Context context)
        throws IOException
```

I want to send the column index for the key. Could you please provide some inputs? It's a bit urgent.

studhadoop commented 8 years ago

Did you try using the TwoValueWritable class? If you use TwoValueWritable as the key to emit, you need to change your signature to

```java
public void map(LongWritable key, Text value,
        OutputCollector<TwoValueWritable, Text> output, Reporter reporter)
        throws IOException
```

Note that the old mapred API's map() does not accept a Context parameter, which is why you get the signature error. Per-job settings such as a key column index are usually read from the JobConf in configure() instead, as in the sketch below.
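Here is a sketch of how a mapper could pick up a dynamic key column from the job configuration; the property name "key.column.index", the class name, and the tag value are all made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: reads the key column index from the JobConf
// instead of trying to add a Context parameter to map().
public class DynamicKeyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, TwoValueWritable, Text> {

    private int keyColumn;
    private final TwoValueWritable outKey = new TwoValueWritable();

    @Override
    public void configure(JobConf job) {
        // "key.column.index" is a made-up property name; set it per job
        // in the driver, e.g. conf.setInt("key.column.index", 3);
        keyColumn = job.getInt("key.column.index", 0);
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<TwoValueWritable, Text> output,
                    Reporter reporter) throws IOException {
        String[] fields = value.toString().split(",");
        // "A" is an arbitrary tag for the second field of the composite key.
        outKey.set(Integer.parseInt(fields[keyColumn].trim()), "A");
        output.collect(outKey, value);
    }
}
```

The driver can then set a different key.column.index for each file's job, which keeps the key column dynamic without changing the mapper.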

Shailuc commented 8 years ago

I am very new to MapReduce and not aware of many things :) I tried it with Text and Context only. I will try it with TwoValueWritable and check.

Shailuc commented 8 years ago

Hi, I want to split the rows based on "," instead of "\t". I have changed the values in both Mappers and the TextPair class, but my output collector still collects it as tab-separated, e.g.:

```
ID Nm1,Nm2,G,Z
1 Anne,Admin,GR,Z1
2 Gokul,Admin,GJ,Z2
3 Janet,Sales,GH,Z3
4 Hari,Admin,G3,Z4
```

I want the entire file to be comma-separated because I need to store it as a CSV. What changes should I make to this code?

studhadoop commented 8 years ago

OK, fine, understood. The line

```java
output.collect(new IntWritable(key.getFirst()), outValue);
```

in the reducer gives only tab-separated values: when we emit a key and a value from the reducer, they are always separated by a tab by default. To get comma-separated output:

Case 1: emit the whole record as a comma-joined value under a null key. Note that NullWritable.get() is a static factory, so there is no new in front of it, and the concatenated String has to be wrapped in a Text:

```java
output.collect(NullWritable.get(), new Text(key.getFirst() + "," + outValue));
```

Case 2: change the output delimiter: http://unmeshasreeveni.blogspot.in/2014/04/can-we-change-default-key-value-output.html
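For case 2, assuming the old mapred API used elsewhere in this thread, the separator that TextOutputFormat writes between key and value is controlled by the mapred.textoutputformat.separator property (in the new API the name is mapreduce.output.textoutputformat.separator). A minimal driver sketch:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver: identity map/reduce, but the final key/value
// lines are joined with a comma instead of the default tab.
public class CsvOutputDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CsvOutputDriver.class);
        conf.setJobName("csv-output");

        // Comma instead of tab between the reducer's key and value.
        conf.set("mapred.textoutputformat.separator", ",");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```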

Try it out and see.

Shailuc commented 8 years ago

Thanks so much, Unmesha, for such great support :) I will try this.

studhadoop commented 8 years ago

You are always welcome. Let me know whether it is successful.