weka511 / nlp

My experiments with Natural Language Processing. I've created a few programs to try out concepts.
GNU General Public License v3.0
machine-learning natural-language-processing python3 word2vec

NLP

My experiments with Natural Language Processing

Code

File Description
bt.py Explore variability of Bradley-Terry
cluster.py Find clusters in word vectors
corpora.py Library for reading text from corpora in various formats
plot.py Plot learning curves for a single corpus
rnn.py Sean Robertson's NLP demo: Classifying Names
rnn2.py Sean Robertson's NLP demo: Generating Names with a Character-Level RNN
seq2seq.py Sean Robertson's NLP demo: Translation with a Sequence to Sequence Network and Attention
skipgram.py This program has been written to test my understanding of word2vec. It includes code for training the weights explicitly, using a hand-coded stochastic gradient optimizer and loss function.
template.py Template for new code with command line interface
template-test.py Template for new code using Python unittest
tfidf.py Implementation of the tf-idf algorithm
tfidf-harness.py Test harness for the tf-idf algorithm
tokenizer.py Prepare text for processing
transformer.py An attempt to understand Transformers, based on Arun Mohan's demo
word2vec.py This program has been written to test my understanding of word2vec. The code was originally based on Mateusz Bednarski's article, Implementing word2vec in PyTorch.
word2vec2.py Test harness for skipgram.py. It builds examples, trains weights, and includes some test code.
nlp.wpr Wing IDE Project file
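
As a rough illustration of the technique that tfidf.py is concerned with, here is a minimal sketch of tf-idf weighting. It is not the code in this repository: the function name tf_idf, the toy documents, and the choice of tf = count / document length with idf = log(N / document frequency) are all assumptions made for the example, and other variants of the weighting exist.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already split into tokens
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

In this toy corpus "the" occurs in every document, so its weight is zero, while "mat" occurs in only one document and gets the highest weight in the first.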

Data folder

File Description
64317-0.txt The Great Gatsby
blogs.zip The Blog Authorship Corpus
gatsby*.txt Chapters of The Great Gatsby used for training
unigram_freq.csv The 333,333 most frequent words in English, after Rachel Tatman