iconara / rubydoop

Write Hadoop jobs in JRuby

RFC: Improving tests #25

Closed: grddev closed this 9 years ago

grddev commented 10 years ago

The current integration test is more of a system test, and while unit tests aren’t really possible for the Hadoop proxies, there need to be some cheaper integration tests. As it stands, the integration tests are difficult to extend, and since they take ages to run, it is not at all rewarding to do so. While adding these basic tests, I even realised that I had a copy-paste error in #23 (now fixed) that should really have been covered by a test.

I'm not exactly sure that this is the right approach, so any advice on making this whole thing nicer would be greatly appreciated.

The big problem with testing this whole thing is that there is no (good) way to get feedback from within the "proxied" world. The object is constructed in a different JRuby runtime from nothing but a class name (with no arguments to the constructor), and there is no general way to talk back from the proxied runtime.
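To make the constraint concrete, here is a minimal plain-Ruby sketch. It is not Rubydoop's code: `NameOnlyProxy`, `UpcaseMapper` and `RecordingContext` are hypothetical names, and everything runs in a single runtime, so it only illustrates the "class name in, no constructor arguments" part of the problem, not the runtime separation.

```ruby
# Hypothetical stand-in for a proxy: all it is ever given is a class name.
class NameOnlyProxy
  def initialize(class_name)
    # Resolve the constant and call the zero-argument constructor.
    # There is no way to pass a test double or a callback in here.
    @delegate = Object.const_get(class_name).new
  end

  def map(key, value, context)
    @delegate.map(key, value, context)
  end
end

# Hypothetical example mapper used only for illustration.
class UpcaseMapper
  def map(key, value, context)
    context.write(key, value.upcase)
  end
end

# Hypothetical context that records what was written to it.
# In this sketch it is the only feedback channel the test has.
class RecordingContext
  attr_reader :written

  def initialize
    @written = []
  end

  def write(key, value)
    @written << [key, value]
  end
end

proxy = NameOnlyProxy.new('UpcaseMapper')
context = RecordingContext.new
proxy.map('k', 'hello', context)
```

Because the test never holds a reference to the mapper instance itself, the only thing it can assert on is what ends up in `context.written`.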

The approach in this pull request is to have a bunch of example implementations, and assert that they behave as intended (with a lot of glue to get around various imperfections in Hadoop). I did the development on Hadoop 2.2, so there might be some tweaks necessary to get the glue working on other versions, but I'd like to hold off on that until I know the overall approach is sane.
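As a rough illustration of the "example implementations plus assertions" idea, the sketch below runs an example mapper and reducer through a tiny in-memory harness. It is an assumption-laden stand-in, not the glue from this pull request: `WordCountMapper`, `SumReducer`, `CollectingContext` and `run_example` are hypothetical names, and the grouping step is only a simplified substitute for Hadoop's shuffle.

```ruby
# Hypothetical example implementations, shaped like the jobs under test.
class WordCountMapper
  def map(_key, line, context)
    line.split.each { |word| context.write(word, 1) }
  end
end

class SumReducer
  def reduce(key, values, context)
    context.write(key, values.inject(0) { |sum, v| sum + v })
  end
end

# Minimal in-memory context that records every emitted pair.
class CollectingContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def write(key, value)
    @pairs << [key, value]
  end
end

# Hypothetical glue: map every line, group by key (a simplified
# stand-in for the shuffle), then reduce each group in key order.
def run_example(mapper_class, reducer_class, lines)
  map_context = CollectingContext.new
  mapper = mapper_class.new
  lines.each_with_index { |line, offset| mapper.map(offset, line, map_context) }

  grouped = map_context.pairs.group_by(&:first)
  reduce_context = CollectingContext.new
  reducer = reducer_class.new
  grouped.keys.sort.each do |key|
    reducer.reduce(key, grouped[key].map(&:last), reduce_context)
  end
  reduce_context.pairs
end

result = run_example(WordCountMapper, SumReducer, ['a b a', 'b c'])
```

A spec would then compare `result` against the expected pairs for the given input.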

It also seems my spec-load-path-hack broke the integration tests everywhere (obvious in hindsight), so ignore those for the time being.

Right now, the pull request is into custom-input-format-2 as a hack to make GitHub display only the relevant changes. I don't really intend for this pull request to be merged anyhow, but rather for it to serve as a basis for discussion for a subsequent pull request.

iconara commented 10 years ago

I think it's a great idea to get to a point where there's a cheaper test suite to run than the current one. I didn't quite understand what made this one run. From what I can tell it creates enough support structures to run a mapper, reducer, or whatever in something that looks more like Hadoop than just calling #map, #reduce or whatever directly, but I can't really tell what actually kicks it off.

iconara commented 9 years ago

What's the status of this? Should we abandon it or go through it in person? I can't remember anything about it.

grddev commented 9 years ago

I would argue for merging this. I cleaned up the last work-in-progress commit a bit to make the separation of concerns between the integration helpers and the proxy helpers clearer.

Even if one could certainly do a lot more in terms of improved testing, I definitely think this is a step in the right direction.

iconara commented 9 years ago

Let's merge it.

grddev commented 9 years ago

Excellent. I will open a new pull request against master.