Status: Open. kottmann opened this issue 11 years ago.
Subclassing the Mapper.Context class does not seem to work for both versions, since the constructor is different in each version, but the subclass has to call the super constructor. It's probably possible to solve this via reflection, but that would not be an elegant solution.
The MapSequence seems to run multiple Mappers during a single Mapper invocation from the framework. It looks like the purpose of this is to run multiple transformations on the input data as a pipeline without running multiple MapReduce jobs.
Would it be possible to make these transformations without implementing the Mapper interface, thereby eliminating the need to subclass Mapper.Context?
The purpose of the MapSequence is to do in-memory mapping when you have a chain of mappers in a row. I would have to think about how to remove the need to subclass Mapper.Context. If you have an idea and can provide a pull request, that would be most appreciated.
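The in-memory chaining that MapSequence performs can be sketched, independent of Hadoop's Mapper interface, as a pipeline of plain transformation functions. The class and method names below are illustrative stand-ins, not Faunus or Hadoop APIs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch: run several "map" stages in memory during one framework
// invocation, instead of launching one MapReduce job per transformation.
public class MapPipeline<T> {
    private final List<Function<T, T>> stages = new ArrayList<>();

    public MapPipeline<T> then(Function<T, T> stage) {
        stages.add(stage);
        return this;
    }

    // Apply every stage to each input record, in order.
    public List<T> run(List<T> input) {
        List<T> out = new ArrayList<>(input);
        for (Function<T, T> stage : stages) {
            List<T> next = new ArrayList<>(out.size());
            for (T record : out) {
                next.add(stage.apply(record));
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = new MapPipeline<String>()
                .then(String::trim)
                .then(String::toUpperCase)
                .run(Arrays.asList(" a ", " b "));
        System.out.println(result); // prints [A, B]
    }
}
```

Since the stages are plain functions rather than Mapper subclasses, no Mapper.Context subclassing is needed for the intermediate steps.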
On Feb 19, 2013, at 7:32 AM, Joern Kottmann notifications@github.com wrote:
> Subclassing the Mapper.Context class does not seem to work for both versions, since the constructor is different in each version, but the subclass has to call the super constructor. It's probably possible to solve this via reflection, but that would not be an elegant solution.
> The MapSequence seems to run multiple Mappers during a single Mapper invocation from the framework. It looks like the purpose of this is to run multiple transformations on the input data as a pipeline without running multiple MapReduce jobs.
> Would it be possible to make these transformations without implementing the Mapper interface, thereby eliminating the need to subclass Mapper.Context?
Ok, so we can have something like this: Mapper1 | Mapper2 | Mapper3 | Reducer.
What do you think about ChainMapper to set up the Mappers? As far as I can see it is available in both versions.
ChainMapper JavaDoc: http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/mapred/lib/ChainMapper.html
ChainMapper doesn't work with the mapreduce library -- only mapred. Hence the reason I created MapSequence :(.
On Feb 19, 2013, at 10:14 AM, Joern Kottmann notifications@github.com wrote:
> Ok, so we can have something like this: Mapper1 | Mapper2 | Mapper3 | Reducer.
> What do you think about ChainMapper to set up the Mappers? As far as I can see it is available in both versions.
> ChainMapper JavaDoc: http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/mapred/lib/ChainMapper.html
Hi all, I managed to get Faunus working with CDH 4.2. There were roughly two sets of changes I had to make:
In Hadoop 2.x, the class org.apache.hadoop.mapreduce.MapContext changed to an interface of the same name. At the same time, the code that resided in org.apache.hadoop.mapreduce.MapContext (in Hadoop 1.1) was moved to org.apache.hadoop.mapreduce.task.MapContextImpl (in 2.x). The simplest way to get around these changes is to reimplement MemoryMapper.MemoryMapContext to encapsulate org.apache.hadoop.mapreduce.task.MapContextImpl and simply pass all method calls through to the member variable.
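The encapsulate-and-delegate approach can be sketched as follows. The Context interface and ContextImpl class here are deliberately tiny stand-ins for Hadoop's MapContext and task.MapContextImpl (which have many more methods); the point is only to illustrate forwarding every call to a wrapped member instead of subclassing a concrete class whose constructor differs between versions:

```java
// Simplified stand-in for Hadoop's MapContext (an interface in 2.x).
interface Context {
    void write(String key, String value);
    String getStatus();
}

// Simplified stand-in for task.MapContextImpl.
class ContextImpl implements Context {
    private final StringBuilder log = new StringBuilder();
    public void write(String key, String value) {
        log.append(key).append('=').append(value).append(';');
    }
    public String getStatus() { return "RUNNING"; }
    String dump() { return log.toString(); }
}

// The MemoryMapContext analogue: rather than calling a version-specific
// super constructor, it wraps whichever implementation the framework
// hands it and forwards every call.
class DelegatingContext implements Context {
    private final Context delegate;
    DelegatingContext(Context delegate) { this.delegate = delegate; }
    public void write(String key, String value) { delegate.write(key, value); }
    public String getStatus() { return delegate.getStatus(); }
}

public class DelegationSketch {
    public static void main(String[] args) {
        ContextImpl impl = new ContextImpl();
        Context ctx = new DelegatingContext(impl);
        ctx.write("k", "v");
        System.out.println(impl.dump());     // prints k=v;
        System.out.println(ctx.getStatus()); // prints RUNNING
    }
}
```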
The right answer for Hadoop 2 / CDH compatibility is probably to create another build profile. MRUnit (https://github.com/apache/mrunit/blob/trunk/pom.xml) does this particularly effectively. If there is interest in going this route, or other suggestions on how to create a build that works with both versions, I would be happy to volunteer my time to implement it.
For now, I've forked Faunus and implemented the fixes. The fork can be found at https://github.com/karkumar/faunus and the fix is in the cdh4-port branch.
Thanks again!
Hey guys, so I just did the update to the 0.4.0 snapshot for Apache Hadoop 2 compatibility. The only change I had to make was to change instances of TaskAttemptContext to TaskAttemptContextImpl. Again, the fork can be found at https://github.com/karkumar/faunus and the fix is in the cdh4-port branch.
The problem with that (correct me if I'm wrong) is that TaskAttemptContextImpl does NOT work with Hadoop 1.y.z. Hadoop 2 has not seen a stable release yet; until Apache Hadoop goes 2.0-stable, we are going to stick with the 1.y.z API.
If you can figure out how to make it 2.0 AND 1.y.z compatible, I would definitely make that change immediately.
On May 2, 2013, at 3:35 PM, Karthik Ramachandran notifications@github.com wrote:
> Hey guys, so I just did the update to the 0.4.0 snapshot for Apache Hadoop 2 compatibility. The only change I had to make was to change instances of TaskAttemptContext to TaskAttemptContextImpl. Again, the fork can be found at https://github.com/karkumar/faunus and the fix is in the cdh4-port branch.
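One way to target both APIs from a single jar is to look the implementation class up by name at runtime and fall back when it is absent. The sketch below demonstrates the lookup with stand-in class names; against Hadoop you might probe for org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl first and fall back to the Hadoop 1 class (this is an assumption about how such a shim could work, not code from Faunus):

```java
public class CompatLoader {
    // Return the first class from the candidate names that is present
    // on the classpath, or null if none of them is.
    static Class<?> firstAvailable(String... names) {
        for (String name : names) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // not on this classpath; try the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "org.example.MissingHadoop2Class" is a hypothetical absent class;
        // java.util.ArrayList stands in for the fallback that is present.
        Class<?> cls = firstAvailable(
                "org.example.MissingHadoop2Class",
                "java.util.ArrayList");
        System.out.println(cls.getName()); // prints java.util.ArrayList
    }
}
```

The class chosen at runtime would then be instantiated reflectively with whichever constructor signature that version exposes.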
Yup, that's correct. That said, there are only two classes that are really preventing Faunus from being Hadoop 2 compatible: MemoryMapper.MemoryMapContext and TaskAttemptContext. Really, the only change between Hadoop 1 and Hadoop 2 is that these classes became abstract and their implementations were moved to impl classes in alternate packages.
So you could just package your own versions of MapContext and TaskAttemptContext in Faunus -- literally cut and paste them out of the Hadoop 1.y.z code base into Faunus -- and then you should be able to run against either 1.y.z or 2.0. The solution isn't elegant, but it will probably work.
You would probably also want to add a Maven build profile that changes the Hadoop and MRUnit artifacts to the 2.0 artifacts.
If you'd like, I can experiment with this change in my fork. If not, the overhead of keeping my fork up to date is fairly minor, so I can keep doing that and updating this ticket.
Thanks for taking the time to think about my request, it's much appreciated.
> Yup, that's correct. That said, there are only two classes that are really preventing Faunus from being Hadoop 2 compatible: MemoryMapper.MemoryMapContext and TaskAttemptContext. Really, the only change between Hadoop 1 and Hadoop 2 is that these classes became abstract and their implementations were moved to impl classes in alternate packages.

Gotcha.

> So you could just package your own versions of MapContext and TaskAttemptContext in Faunus -- literally cut and paste them out of the Hadoop 1.y.z code base into Faunus -- and then you should be able to run against either 1.y.z or 2.0. The solution isn't elegant, but it will probably work.

That is an interesting idea… hmmm. I will think on that for faunus04.

> You would probably also want to add a Maven build profile that changes the Hadoop and MRUnit artifacts to the 2.0 artifacts.

Ah. Yeah, that's a problem there.

> If you'd like, I can experiment with this change in my fork. If not, the overhead of keeping my fork up to date is fairly minor, so I can keep doing that and updating this ticket.

Please.

> Thanks for taking the time to think about my request, it's much appreciated.

Thank you for your interest.
Hi,
I think I have found a reasonable solution to this incompatibility, which allows us to generate Hadoop 1 compatible and Hadoop 2 compatible binaries from the same code base.
There are three different parts to the fix:
If we stop after steps 1 and 2, we create a Faunus jar that should be compatible with Hadoop 1 and Hadoop 2. However, the distribution that is created is only compatible with Hadoop 1, because it includes the wrong Hadoop jars in the lib directory. So I created a build profile:
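A profile along these lines might look like the following sketch (the artifact ID and version are illustrative, not the exact coordinates used in the branch):

```xml
<!-- Illustrative hadoop2 build profile; adjust artifacts/versions as needed. -->
<profiles>
  <profile>
    <id>hadoop2</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-cdh4.2.0</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

The profile would then be selected at build time with `mvn package -Phadoop2`, leaving the default build on the Hadoop 1 artifacts.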
These changes can be found in the proxying-port branch of my fork : https://github.com/karkumar/faunus/tree/proxying-port
This solution isn't ideal because it imposes a small cost on every context.write. However, if we clean up the proxy object a bit, that cost should be relatively minor.
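The per-call cost comes from reflective dispatch: if the context is wrapped in a java.lang.reflect.Proxy, every context.write is routed through an InvocationHandler. A minimal sketch of that pattern, with a toy Writer interface standing in for the Hadoop context (not the actual Faunus proxy code):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxySketch {
    // Toy stand-in for the Hadoop context interface being proxied.
    public interface Writer {
        void write(String key, String value);
    }

    // Wrap a Writer in a dynamic proxy; every call passes through the
    // InvocationHandler -- the small reflective cost paid per write.
    public static Writer countingProxy(Writer target, AtomicInteger calls) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.incrementAndGet();         // bookkeeping on every call
            return method.invoke(target, args); // reflective forward
        };
        return (Writer) Proxy.newProxyInstance(
                Writer.class.getClassLoader(),
                new Class<?>[] { Writer.class },
                handler);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Writer proxied = countingProxy((k, v) -> { /* real write here */ }, calls);
        proxied.write("k1", "v1");
        proxied.write("k2", "v2");
        System.out.println(calls.get()); // prints 2
    }
}
```

Replacing the dynamic proxy with a hand-written delegating class would remove the reflective dispatch, which is one way the cost could be trimmed.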
Let me know if this is a viable option; I would be happy to spend some more time cleaning this up and testing.
Right now, you should be able to build and run all the tests against either build profile. All tests should pass. I've also tested the code against my Hadoop 2 cluster and it seems to work. I haven't tested against a Hadoop 1 cluster.
Again, thanks for taking the time to think about my request.
I just wanted to revisit this issue in light of the recent release of 2.1.0. In 2.1.0 they claim that there is now source compatibility between jobs that use the Hadoop 1.x MapReduce APIs and Hadoop 2.0 (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html).
Has anyone had a chance to try this out with Faunus? Is there any interest in creating multiple build profiles for Faunus: one that builds against Hadoop 1.0 jars and one that builds against Hadoop 2.1 jars?
Thanks
Thanks for your work. I've used your MemoryMapper code and have created a Hadoop2 branch of Faunus: https://github.com/thinkaurelius/faunus/tree/hadoop2
We have a Cloudera CDH4 cluster and ran into a compatibility issue with Faunus; CDH4 is based on Hadoop 2.x instead of Hadoop 1.x.
The Mapper.Context constructor signature changed and causes a NoSuchMethodError when called from Faunus. Here is the stack trace:

```
java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.Mapper$Context.<init>(Lorg/apache/hadoop/mapreduce/Mapper;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/mapreduce/TaskAttemptID;Lorg/apache/hadoop/mapreduce/RecordReader;Lorg/apache/hadoop/mapreduce/RecordWriter;Lorg/apache/hadoop/mapreduce/OutputCommitter;Lorg/apache/hadoop/mapreduce/StatusReporter;Lorg/apache/hadoop/mapreduce/InputSplit;)V
	at com.thinkaurelius.faunus.mapreduce.MemoryMapper$MemoryMapContext.<init>(MemoryMapper.java:32)
	at com.thinkaurelius.faunus.mapreduce.MapSequence$Map.setup(MapSequence.java:30)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
```
We are getting this error even though we are running MRv1. The error above is the first one we got; there might be more compatibility issues.
The issue was first reported in the aureliusgraphs google group: https://groups.google.com/forum/#!topic/aureliusgraphs/B3gvUWOQ2cA