My experiments with Natural Language Processing
File | Description |
---|---|
bt.py | Explore variability of Bradley-Terry |
cluster.py | Find clusters in word vectors |
corpora.py | Library for reading text from corpora in various formats |
plot.py | Plot learning curves for a single corpus |
rnn.py | Sean Robertson's NLP demo: Classifying Names |
rnn2.py | Sean Robertson's NLP demo: Generating Names with a Character-Level RNN |
seq2seq.py | Sean Robertson's NLP demo: Translation with a Sequence to Sequence Network and Attention |
skipgram.py | Written to test my understanding of word2vec. It trains the weights explicitly, using a hand-coded stochastic gradient optimizer and loss function. |
template.py | Template for new code with command line interface |
template-test.py | Template for new code using python unittest |
tfidf.py | Implementation of the tf-idf algorithm |
tfidf-harness.py | Test harness for the tf-idf algorithm |
tokenizer.py | Prepare text for processing |
transformer.py | An attempt to understand Transformers, based on Arun Mohan's demo |
word2vec.py | Written to test my understanding of word2vec. The code was originally based on Mateusz Bednarski's article "Implementing word2vec in PyTorch" |
word2vec2.py | Test harness for skipgram.py. It builds examples, trains weights, and includes some test code. |
nlp.wpr | Wing IDE Project file |

Data files:

File | Description |
---|---|
64317-0.txt | The Great Gatsby |
blogs.zip | The Blog Authorship Corpus |
gatsby*.txt | Chapters of The Great Gatsby used for training |
unigram_freq.csv | The 333,333 most frequent words in English, from Rachel Tatman's dataset |