Crux is an analytics application built using HBase.
Crux development happens in two branches, master and aggregation. The aggregation branch is the active branch, where enhanced querying and reporting capabilities are being added.
Crux has been tested against
- Cloudera's distribution CDH4
- Cloudera's distribution CDH3 - Hadoop 0.20.2-CDH3u5
- Apache HBase 0.92.1
- Apache HBase 0.90.3 on Apache Hadoop 0.20.2 with Hadoop append.
Crux features
- Aggregation of HBase data - min, max etc
- Functions like ceil, round, uppercase etc
- Advanced querying and filtering
- Drag and drop report designer
- Web based front end
- Support for various HBase versions
- Support for various datatypes
- and lots more. Give it a try.
Crux license
- Crux license is Apache License v2
Why HBase?
- HBase provides low-latency columnar storage for big data. It fits well with the Hadoop stack, using HDFS for storage and providing out-of-the-box support for MapReduce. Data can be ingested into HBase from a traditional MapReduce application, Pig or Hive, Cascading, Flume, Scribe, Hiho or Sqoop. Data can also be imported using the HBase bulk loader.
Why Crux?
- Once you have collected your data in HBase, it needs to be exposed to the business and technical users of your organization. The size of the data, as well as its unstructured format, makes it difficult to use a traditional reporting application with it. Crux uses native HBase integration to help you query your data.
- Crux has a web based report designer and viewer, making report creation and sharing easier. It comes with built-in comparators for long, short, int, double, float, string and boolean datatypes, and can create tables, graphs, scatter plots and other visuals for your data.
- Simple as well as composite row keys are supported by mapping Row Key Aliases. You can define filters on the row keys and perform get operations and range scans.
- Crux works with your schema and your data; there is no predefined schema to fit your data into.
Crux Design
- Crux uses the HBase Java client API, which is a fully featured way to access HBase. Other clients are available for HBase, for example REST, Thrift and Avro. At the time Crux was written, these clients did not expose the complete conditional querying capability Crux needs. There are also batch clients such as MapReduce, the Hive handler, Pig and Cascading. These are great for batch analysis of HBase data, but a reporting application needs faster response times than batch processing allows. Crux thus uses the HBase Java client API.
- Crux also uses MySQL to store the mapping of HBase schemas, connections and reports. The front end is built using Ajax, Dojo and Struts, with Hibernate. Crux uses open source software and comes with the Apache License.
Crux Mailing List, Issue Reporting and Support
Crux Documentation and User Guide
- Crux features, guides and news are available at http://nubetech.co/category/crux-2. Besides this, Crux has an extensive built-in guide on each page to help you create your reports. The mailing list is also a good source of information about Crux.
Using Crux
Once you have the prerequisites in place:
a. Create a database for Crux in MySQL:
mysql> create database crux;
mysql> use crux;
Create the schema by running the crux/db/schema.sql file at the MySQL prompt:
mysql> source ${CRUX_HOME}/db/schema.sql
This creates the schema required for saving the report definitions.
b. Build Crux (see the instructions to build Crux with Maven), or download the tar appropriate for your HBase version from the GitHub downloads link.
c. Copy crux.jar to ${HBASE_HOME}/lib, or edit ${HBASE_HOME}/conf/hbase-env.sh and add the jar's location to the file.
For example,
# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=
export HBASE_CLASSPATH="/home/crux/target/crux.jar"
Restart HBase. Go to the HBase home bin directory and run start-hbase.sh:
$ ${HBASE_HOME}/bin/start-hbase.sh
Then start the HBase shell:
$ ${HBASE_HOME}/bin/hbase shell
The jar must be on the HBase classpath because Crux has built-in filters that run on the server side to select the data you choose.
d. Drop the war into tomcat/webapps and start Tomcat by running startup.sh from the Tomcat home bin directory:
$ ${CATALINA_HOME}/bin/startup.sh
Alternatively, just run
CRUX_HOME$ mvn jetty:run
e. Go to http://localhost:8080/crux and define your connection, mapping and report.
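Steps a and c above can also be scripted. The following is a minimal sketch, not part of Crux itself: the function names setup_crux_db and add_crux_jar are hypothetical, the MySQL user defaults to root as an assumption, and the classpath line is appended to any existing HBASE_CLASSPATH value rather than overwriting it.

```shell
#!/bin/sh
# Hypothetical helpers for steps a and c; names and defaults are assumptions.

# Step a: create the "crux" database and load the schema.
#   $1: Crux checkout directory (CRUX_HOME)
#   $2: MySQL user (assumed default: root); mysql prompts for the password
setup_crux_db() {
    schema="$1/db/schema.sql"
    mysql_user="${2:-root}"
    if [ ! -f "$schema" ]; then
        echo "schema file not found: $schema" >&2
        return 1
    fi
    mysql -u "$mysql_user" -p -e "CREATE DATABASE IF NOT EXISTS crux;" || return 1
    mysql -u "$mysql_user" -p crux < "$schema"
}

# Step c: append crux.jar to HBASE_CLASSPATH in hbase-env.sh, once.
#   $1: path to hbase-env.sh
#   $2: path to crux.jar
add_crux_jar() {
    env_file="$1"
    jar="$2"
    if [ ! -f "$env_file" ]; then
        echo "hbase-env.sh not found: $env_file" >&2
        return 1
    fi
    # Skip if the jar is already mentioned, so reruns do not duplicate the line.
    if grep -qF -- "$jar" "$env_file"; then
        return 0
    fi
    # Single quotes are intentional: the ${...} text is written literally, so
    # any existing HBASE_CLASSPATH is kept when hbase-env.sh is later sourced.
    printf 'export HBASE_CLASSPATH="${HBASE_CLASSPATH:+${HBASE_CLASSPATH}:}%s"\n' "$jar" >> "$env_file"
}
```

With these functions loaded, a typical call would be setup_crux_db "${CRUX_HOME}" followed by add_crux_jar "${HBASE_HOME}/conf/hbase-env.sh" "${CRUX_HOME}/target/crux.jar". HBase still has to be restarted afterwards, as described in step c.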
Instructions to build Crux with Maven
- Update hibernate.properties (in crux/) with your MySQL host, port, dbname, testDbName, user and password.
- Download struts2-fullhibernatecore-plugin-2.2.2-GA.jar from http://code.google.com/p/full-hibernate-plugin-for-struts2/downloads/detail?name=struts2-fullhibernatecore-plugin-2.2.2-GA.jar&can=2&q= and add it to your local repository by running the command below, replacing /path/to with the download location.
mvn install:install-file -DgroupId=com.google.code -DartifactId=struts2-fullhibernatecore-plugin -Dversion=2.2.2-GA \
    -Dpackaging=jar -Dfile=/path/to/struts2-fullhibernatecore-plugin-2.2.2-GA.jar
- Crux can be built against HBase 0.90.3 (default), HBase 0.90.6 or HBase 0.92.1. The Crux artifacts crux.war and crux.jar are created in crux/target/.
To build and create the war against 0.90.3, go to the base directory where pom.xml is located and run
mvn install -DskipTests
to skip the tests, or
mvn install
to run the tests and create the war.
For CDH4
CRUX_HOME$ mvn -Dcdh4 install
For CDH3
CRUX_HOME$ mvn -Dcdh3 install
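The build variants above can be wrapped in a small helper so the right flags are picked by name. This is only a sketch: crux_mvn_args is a hypothetical function, and combining -DskipTests with the CDH flags is an assumption based on standard Maven behaviour rather than something the instructions above state. The HBase 0.90.6 and 0.92.1 builds are left out because their exact flags are not given here.

```shell
#!/bin/sh
# Hypothetical helper: print the Maven arguments for a Crux build.
#   $1: target, one of "default" (HBase 0.90.3), "cdh3" or "cdh4"
#   $2: optional "skip-tests" to add -DskipTests
crux_mvn_args() {
    target="${1:-default}"
    case "$target" in
        default) args="install" ;;
        cdh3)    args="-Dcdh3 install" ;;
        cdh4)    args="-Dcdh4 install" ;;
        *)
            echo "unknown target: $target (expected default, cdh3 or cdh4)" >&2
            return 1
            ;;
    esac
    # Assumption: -DskipTests can be combined with any of the targets.
    if [ "${2:-}" = "skip-tests" ]; then
        args="$args -DskipTests"
    fi
    echo "$args"
}
```

From the base directory where pom.xml is located, the helper could be used as mvn $(crux_mvn_args cdh4 skip-tests). The command substitution is deliberately unquoted so that each flag reaches Maven as a separate argument.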
Instructions to run test cases of Crux with Maven
CRUX_HOME$ mvn test
(To run tests against 0.92.1, set umask to 0022 and run the tests against the hbase0.92 profile.)
Instructions to set up the dev environment in Eclipse
Crux Limitations
- Crux is an HBase application, so the schema and the querying has to be designed accordingly.
- As far as possible, create row filters using equals, greater than or equals, and less than comparisons, so as to leverage HBase's Get and Range Scan operations.
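To see why such row filters are cheap, it can help to look at the HBase shell operations they correspond to. The sketch below only prints HBase shell commands; it does not show how Crux issues queries (Crux uses the Java client API). The function names, the table name "stocks" and the row keys in the usage note are hypothetical, and row keys containing single quotes are not escaped.

```shell
#!/bin/sh
# Hypothetical helpers that print HBase shell commands for row key lookups.

# An "equals" row filter corresponds to a single-row get.
#   $1: table name, $2: row key
hbase_get_cmd() {
    printf "get '%s', '%s'\n" "$1" "$2"
}

# A "greater than or equals" / "less than" pair corresponds to a range scan.
# STARTROW is inclusive and STOPROW is exclusive.
#   $1: table name, $2: start row key, $3: stop row key
hbase_range_scan_cmd() {
    printf "scan '%s', {STARTROW => '%s', STOPROW => '%s'}\n" "$1" "$2" "$3"
}
```

For example, hbase_range_scan_cmd stocks A B prints a scan bounded by row keys A and B, and the output can be piped into ${HBASE_HOME}/bin/hbase shell to try it against a table of your own.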
Sample data
- Crux comes with sample data. Refer to testData/BseStock/README.txt for instructions on downloading BSE stock data for a given list of scrips and populating HBase with it.