

Learning NLP


Fork this repo, clone your fork onto your local machine, then navigate into it from the command line:

cd learning-nlp-py/

Create and/or activate a Python 3.7 virtual environment:

conda create -n learning-nlp-env python=3.7 # (first time only)
conda activate learning-nlp-env

Install package dependencies:

pip install -r requirements.txt # (first time only)

Download the data:

Download the spaCy language models:

python -m spacy download en_core_web_md
python -m spacy download en_core_web_lg
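
To confirm that both downloads succeeded, one option is to check whether the model packages can be found by Python. This is a minimal sketch that relies on spaCy models being installed as regular Python packages; the helper name `missing_models` is hypothetical and not part of this repo.

```python
import importlib.util

# Model names taken from the download commands above.
MODELS = ("en_core_web_md", "en_core_web_lg")


def missing_models(names=MODELS):
    """Return the names that cannot be found as installed packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    print("All models installed." if not missing else f"Missing models: {missing}")
```

Run it inside the activated `learning-nlp-env` environment, since packages installed there are not visible to other interpreters.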

Download NLTK data, such as stopwords, from a Python shell:


>>> import nltk
>>> nltk.download() # opens the interactive downloader (optional)
>>> nltk.download("stopwords")
>>> nltk.download("movie_reviews")
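
If you would rather script this step, the sketch below downloads only the corpora that are not already present. It assumes both resources live under NLTK's `corpora/` directory; the function name `ensure_corpora` and its `find`/`download` parameters are hypothetical, added so the logic can be exercised without NLTK installed.

```python
def ensure_corpora(names=("stopwords", "movie_reviews"), find=None, download=None):
    """Download each named corpus only if NLTK cannot already locate it.

    `find` and `download` default to nltk.data.find and nltk.download.
    Returns the list of names that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the defaults are only needed when used
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for name in names:
        try:
            find(f"corpora/{name}")  # raises LookupError when the corpus is absent
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched


if __name__ == "__main__":
    print("Downloaded:", ensure_corpora())
```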


Run some example code:

# MOD 1:
python -m app.tokenizer

# MOD 2:
python -m app.vectorizer
python -m app.word_distances

# MOD 3:
python -m app.grid_searcher
python -m app.amzn_reviews_classifier
python -m app.imdb_reviews_classifier
python -m app.whiskey_reviews_classifier

# MOD 4:
python -m app.novels
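
Each command above runs a file inside the `app` package as a module. The sketch below shows the general shape such a module could take; it is not the repo's actual `app/tokenizer.py`, and the `tokenize` function and the tiny stopword set are illustrative assumptions.

```python
import re

# A tiny stand-in stopword list; a real pipeline might use NLTK's stopwords instead.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOPWORDS]


if __name__ == "__main__":
    # Runs under `python -m app.tokenizer`, but not when the module is imported.
    print(tokenize("The quick brown fox is a friend of the lazy dog."))
```

Running with `-m` from the repo root keeps imports such as `from app.tokenizer import tokenize` working in other modules and in tests.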

Start working from scratch in your own clean space:

python -m app.playground # MOD 1
python -m app.playground2 # MOD 2
python -m app.playground3 # MOD 3
python -m app.playground4 # MOD 4


Run the tests:

pip install pytest # (first time only)
pytest --disable-pytest-warnings -s # run the whole suite
pytest test/parser_test.py --disable-pytest-warnings -s # run a single test file
pytest test/parser_test.py --disable-pytest-warnings -s -k 'test_tokenize' # run tests whose names match
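
For reference, here is a hedged sketch of what a test selected by `-k 'test_tokenize'` might look like. The `tokenize` function is a stand-in defined inline so the example is self-contained; a real test file would import the function under test from the `app` package instead.

```python
# Hypothetical contents for a file such as test/parser_test.py.

def tokenize(text):
    """Stand-in for the function under test: lowercase and split on whitespace."""
    return text.lower().split()


def test_tokenize():
    assert tokenize("Hello NLP World") == ["hello", "nlp", "world"]


def test_tokenize_empty_string():
    assert tokenize("") == []
```

Because `-k` matches on substrings of test names, `-k 'test_tokenize'` would select both functions above.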