jrae is a re-implementation of the semi-supervised recursive autoencoder in Java. This package also contains code to demonstrate its usage.
More details are available at http://www.socher.org/index.php/Main/Semi-SupervisedRecursiveAutoencodersForPredictingSentimentDistributions
Also read http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/ for a neat explanation of recursive deep representations.
In short, the semi-supervised recursive autoencoder is a feature learning algorithm that learns an encoding for text data, which can then be used for classification. The jrae package is fairly comprehensive: it includes code for learning the features as well as for performing basic classification, and it is parallelized to run on a multi-core machine.
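To make the idea of "learning an encoding" concrete, here is a minimal sketch of a single recursive autoencoder step: two child vectors are merged into one parent vector, the parent is decoded back, and the reconstruction error is measured. This is not code from jrae. The class RaeStepSketch and all of its method names are hypothetical, plain arrays are used instead of jblas to keep the example self-contained, a linear decoder is assumed, and the supervised (label prediction) part of the algorithm is left out.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the jrae package.
class RaeStepSketch {

    // Computes m * x + bias, applying tanh element-wise when squash is true.
    static double[] affine(double[][] m, double[] bias, double[] x, boolean squash) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            double sum = bias[i];
            for (int j = 0; j < x.length; j++) {
                sum += m[i][j] * x[j];
            }
            y[i] = squash ? Math.tanh(sum) : sum;
        }
        return y;
    }

    // Stacks two vectors into one: [a; b].
    static double[] concat(double[] a, double[] b) {
        double[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Encoder: parent = tanh(W * [c1; c2] + b).
    static double[] encode(double[][] w, double[] b, double[] c1, double[] c2) {
        return affine(w, b, concat(c1, c2), true);
    }

    // Decoder (assumed linear here): [c1'; c2'] = Wd * parent + bd.
    static double[] decode(double[][] wd, double[] bd, double[] parent) {
        return affine(wd, bd, parent, false);
    }

    // Reconstruction error: 0.5 * || [c1; c2] - [c1'; c2'] ||^2.
    static double reconstructionError(double[] children, double[] reconstructed) {
        double sum = 0.0;
        for (int i = 0; i < children.length; i++) {
            double d = children[i] - reconstructed[i];
            sum += d * d;
        }
        return 0.5 * sum;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional "word vectors"; the values are made up.
        double[] c1 = {1.0, 0.0};
        double[] c2 = {0.0, 2.0};

        // Toy parameters; in practice these are learned by minimizing the error.
        double[][] w = {{0.5, 0.0, 0.0, 0.0}, {0.0, 0.0, 0.0, 0.5}};
        double[] b = {0.0, 0.0};
        double[][] wd = {{1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 1.0}};
        double[] bd = {0.0, 0.0, 0.0, 0.0};

        double[] parent = encode(w, b, c1, c2);
        double[] reconstructed = decode(wd, bd, parent);

        System.out.println("parent = " + Arrays.toString(parent));
        System.out.println("error  = " + reconstructionError(concat(c1, c2), reconstructed));
    }
}
```

Applying this step repeatedly, with each parent becoming a child of the next merge, yields a single vector for a whole sentence. That vector is the kind of feature a classifier can then consume.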
The package includes a demo of movie review classification, on which the algorithm attains state-of-the-art results. Please use the rc3 release (https://github.com/sancha/jrae/releases/tag/rc3) for your experiments, and use the master branch only for contributions. The master branch includes some unsupported code.
The core feature of the recursive autoencoder is learning a representation of words and sentences. Google recently released a similar tool; you are encouraged to try out the word2vec project at http://code.google.com/p/word2vec/
Stanford has an official code package integrated into Stanford CoreNLP; please check http://nlp.stanford.edu/sentiment/code.html for updates.
The RAE package requires the jblas package for linear algebra operations. This dependency is included in the lib directory.
Including the jblas jar file may not be sufficient: JBLAS requires either LAPACK or ATLAS. Check out https://github.com/mikiobraun/jblas if you run into trouble. If you are running Ubuntu, run:

    sudo apt-get install libgfortran3
If you encounter any bugs, please report them on GitHub.