Natural Language Processing
Implementations of selected machine learning algorithms for natural language processing in Go. The package focuses on the statistical semantics of plain-text documents, supporting semantic analysis and the retrieval of semantically similar documents.
Built upon the Gonum package for linear algebra and scientific computing, with some inspiration taken from Python's scikit-learn and Gensim.
Check out the companion blog post or the Go documentation page for full usage and examples.
Features
Planned
- Expanded persistence support
- Stemming to treat words with a common root as the same, e.g. "go" and "going"
- Clustering algorithms, e.g. hierarchical, K-means, etc.
- Classification algorithms e.g. SVM, KNN, random forest, etc.
References
- Rosario, Barbara. Latent Semantic Indexing: An overview. INFOSYS 240 Spring 2000
- Latent Semantic Analysis, a scholarpedia article on LSA written by Tom Landauer, one of the creators of LSA.
- Thomo, Alex. Latent Semantic Analysis (Tutorial).
- Latent Semantic Indexing. Stanford NLP Course
- Charikar, Moses S. "Similarity Estimation Techniques from Rounding Algorithms" in Proceedings of the thirty-fourth annual ACM symposium on Theory of computing - STOC ’02, 2002, p. 380.
- M. Bawa, T. Condie, and P. Ganesan, “LSH forest: self-tuning indexes for similarity search,” Proc. 14th Int. Conf. World Wide Web - WWW ’05, p. 651, 2005.
- A. Gionis, P. Indyk, and R. Motwani, “Similarity Search in High Dimensions via Hashing,” VLDB ’99 Proc. 25th Int. Conf. Very Large Data Bases, vol. 99, no. 1, pp. 518–529, 1999.
- Kanerva, Pentti, Kristoferson, Jan and Holst, Anders (2000). Random Indexing of Text Samples for Latent Semantic Analysis
- Rangan, Venkat. Discovery of Related Terms in a corpus using Reflective Random Indexing
- Vasuki, Vidya and Cohen, Trevor. Reflective random indexing for semi-automatic indexing of the biomedical literature
- QasemiZadeh, Behrang and Handschuh, Siegfried. Random Indexing Explained with High Probability
- Foulds, James; Boyles, Levi; Dubois, Christopher; Smyth, Padhraic; Welling, Max (2013). Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation