This project is a rough prototype that I wrote to analyze large collections of JSON documents and discover their Apache Hive schema. I've used it to analyze githubarchive.org's log data.
To build the project, use Maven (3.0.x) from http://maven.apache.org/.
Build the jar:
% mvn package
Run the program:
% bin/find-json-schema *.json.gz
I've uploaded the discovered schema for githubarchive.org to https://gist.github.com/omalley/5125691.
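
For readers who want a feel for what "discovering a schema" from JSON documents can involve, here is a minimal Python sketch. It is not this project's code and makes no claim about how the prototype works internally. It assumes one JSON document per line, infers a type for each value, and merges the types across documents. The function names (infer, merge, to_hive, discover) and the merge rules (integers widen to double, any other conflict falls back to string, null-only fields render as string) are assumptions made for illustration only.

```python
import json


def infer(value):
    """Return a type descriptor for one parsed JSON value."""
    if value is None:
        return "void"  # placeholder until a real type is seen
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = "void"
        for item in value:
            elem = merge(elem, infer(item))
        return ("array", elem)
    if isinstance(value, dict):
        return ("struct", {k: infer(v) for k, v in value.items()})
    raise TypeError("unexpected JSON value: %r" % (value,))


def merge(a, b):
    """Combine two type descriptors into one that covers both."""
    if a == b:
        return a
    if a == "void":
        return b
    if b == "void":
        return a
    if isinstance(a, str) and isinstance(b, str):
        if {a, b} == {"bigint", "double"}:
            return "double"  # assumption: widen integers to double
        return "string"  # assumption: other primitive conflicts become string
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        if a[0] == "array":
            return ("array", merge(a[1], b[1]))
        fields = dict(a[1])
        for name, t in b[1].items():
            fields[name] = merge(fields[name], t) if name in fields else t
        return ("struct", fields)
    return "string"  # assumption: incompatible shapes fall back to string


def to_hive(t):
    """Render a type descriptor using Hive's type syntax."""
    if isinstance(t, str):
        return "string" if t == "void" else t  # assumption: null-only -> string
    kind, inner = t
    if kind == "array":
        return "array<" + to_hive(inner) + ">"
    parts = [name + ":" + to_hive(inner[name]) for name in sorted(inner)]
    return "struct<" + ",".join(parts) + ">"


def discover(lines):
    """Merge the inferred types of every non-blank JSON line."""
    schema = "void"
    for line in lines:
        if line.strip():
            schema = merge(schema, infer(json.loads(line)))
    return schema


if __name__ == "__main__":
    sample = [
        '{"id": 1, "actor": {"login": "a"}, "tags": ["x"]}',
        '{"id": 2.5, "actor": {"login": "b", "id": 7}, "tags": []}',
    ]
    print(to_hive(discover(sample)))
```

For gzipped input like the *.json.gz files above, the lines could be read with Python's gzip.open(path, "rt") and passed to discover in the same way.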