lnxchk / HadoopCookbook

temporary parking for my update to the Chef Hadoop cookbook

Hadoop Cookbook

This cookbook is a work in progress. It is essentially my second version, written after I learned which bad assumptions I had been making about our clusters and how they had been set up. I still have some things on my radar, like monitoring and quotas on the datanodes.

The changes I've made here are exclusively for RPM-based systems. I plan to offer them to the maintainer of the HadoopCluster cookbook so the two can be rolled together, since that cookbook currently supports only Debian and Ubuntu. I haven't based my work on that version because it differs significantly from the environment I've been developing this in.

The templates are not exhaustive config files, but I have included links to the Hadoop documentation for all options. Functionally, the main components are here, and they should hopefully need only minor changes to run in any given environment.
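Chef templates are ERB files, so a minimal config template can be sketched with Ruby's standard `erb` library. This is only an illustration and not one of the cookbook's actual templates: the `render_core_site` helper, the hostname, and the port are hypothetical, and `fs.default.name` is the Hadoop 1.x property that points clients at the namenode.

```ruby
require 'erb'

# A deliberately small core-site.xml template (not exhaustive).
# The values are filled in from local variables at render time.
CORE_SITE_TEMPLATE = <<~XML
  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://<%= namenode_host %>:<%= namenode_port %></value>
    </property>
  </configuration>
XML

# Hypothetical helper: in a real cookbook the host and port would come
# from node attributes rather than method arguments.
def render_core_site(namenode_host, namenode_port)
  ERB.new(CORE_SITE_TEMPLATE).result(binding)
end

puts render_core_site('namenode.example.com', 8020)
```

In an actual recipe the same template body would live under the cookbook's templates directory and be rendered by a `template` resource rather than by calling ERB directly.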

I've tried to keep the recipes clean enough that multiple clusters can run from this same cookbook, with each new cluster needing only its own set of attributes. I do have a to-do item to clean that part up, either by using environments in a smarter way or by moving things into a data bag.
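The idea of per-cluster attributes layered over shared defaults can be sketched in plain Ruby. Everything here is hypothetical (the cluster names, hostnames, ports, and the `cluster_settings` helper); the cookbook's real attribute layout may look quite different.

```ruby
# Hypothetical shared defaults that every cluster starts from.
DEFAULTS = {
  'namenode_port'   => 8020,
  'jobtracker_port' => 8021
}.freeze

# Hypothetical per-cluster overrides, keyed by cluster name.
CLUSTERS = {
  'analytics' => {
    'namenode'        => 'nn-analytics.example.com',
    'jobtracker'      => 'jt-analytics.example.com',
    'jobtracker_port' => 9001
  },
  'staging' => {
    'namenode'   => 'nn-staging.example.com',
    'jobtracker' => 'jt-staging.example.com'
  }
}.freeze

# Merge a cluster's overrides on top of the defaults.
# Raises ArgumentError for a cluster that has not been defined.
def cluster_settings(cluster_name)
  overrides = CLUSTERS.fetch(cluster_name) do
    raise ArgumentError, "unknown cluster: #{cluster_name}"
  end
  DEFAULTS.merge(overrides)
end

puts cluster_settings('analytics').inspect
```

In Chef terms, the defaults would roughly correspond to the cookbook's attribute files, and the per-cluster overrides to an environment or a data bag item per cluster.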

I keep my own list of open issues on GitHub for this project. Feel free to comment on them, add new ones, or close them.

Some additional points:

Default recipe

Apache_hadoop recipe

Namenode recipe

Jobtracker recipe

Worker recipe

Hadoop_user recipe

Work to do