Some comments/feedback on your data_cleaning.ipynb notebook:
Make sure function names are consistent. For example, getAdjectives() should be renamed to get_adjectives().
Save your processed/cleaned data to the data/processed/ directory.
Then split your notebook into multiple notebooks:
data_cleaning.ipynb -> Cleans the data and saves the final processed dataset to disk.
modeling.ipynb -> Loads the data from disk and builds the models.
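A rough sketch of how the hand-off between the two notebooks could look. It assumes the cleaned data lives in a pandas DataFrame; the file name cleaned_data.csv and the helper names save_processed and load_processed are made up for illustration, so adapt them to whatever your notebook actually uses.

```python
from pathlib import Path

import pandas as pd

# Directory taken from the feedback above; the file name is a hypothetical choice.
PROCESSED_FILE = Path("data/processed") / "cleaned_data.csv"


def save_processed(df: pd.DataFrame, path: Path = PROCESSED_FILE) -> Path:
    """Last step of data_cleaning.ipynb: write the cleaned dataset to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/processed/ if missing
    df.to_csv(path, index=False)
    return path


def load_processed(path: Path = PROCESSED_FILE) -> pd.DataFrame:
    """First step of modeling.ipynb: read the cleaned dataset back in."""
    return pd.read_csv(path)
```

With this split, modeling.ipynb never has to re-run the cleaning steps. It only needs the file on disk.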
It might be good to look at the precision-recall curve as well as the ROC curve. You might not see a huge difference since your dataset is already balanced, but there's no harm in looking at both.
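As a starting point, something like the following computes both curves with scikit-learn. The labels and scores here are toy values, not your results. In your notebook y_score would presumably come from the model's predict_proba or decision_function output on the test set.

```python
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

# Toy values for illustration only; replace with your test labels and model scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Points on each curve, one per score threshold.
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)

# Single-number summaries of the two curves.
ap = average_precision_score(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"average precision = {ap:.3f}, ROC AUC = {auc:.3f}")
```

If you have scikit-learn 1.0 or later, PrecisionRecallDisplay.from_predictions and RocCurveDisplay.from_predictions should plot the same curves directly.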
You can also save the word cloud image to an img/ directory.
Make sure you push the .py version of the code as well.
Look through the false positive and false negative examples. Are there particular kinds of text that the model consistently gets wrong?
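One simple way to pull those examples out for reading is sketched below. The helper name split_errors and the toy sentences are invented for illustration, and it assumes label 1 means "depressive".

```python
def split_errors(texts, y_true, y_pred, positive=1):
    """Return (false_positives, false_negatives) as two lists of texts."""
    false_pos, false_neg = [], []
    for text, truth, pred in zip(texts, y_true, y_pred):
        if pred == positive and truth != positive:
            false_pos.append(text)  # predicted positive, actually negative
        elif pred != positive and truth == positive:
            false_neg.append(text)  # predicted negative, actually positive
    return false_pos, false_neg


# Made-up examples; in the notebook these would be the test texts,
# their true labels, and the model's predictions.
texts = [
    "I feel hopeless lately",
    "This traffic is killing me",
    "Had a great day at the beach",
    "I'm fine, just tired of everything",
]
y_true = [1, 0, 0, 1]
y_pred = [1, 1, 0, 0]

false_pos, false_neg = split_errors(texts, y_true, y_pred)
print("False positives:", false_pos)
print("False negatives:", false_neg)
```

Reading the two lists side by side often makes patterns easier to spot, for example figurative language in the false positives or understated wording in the false negatives.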
Have you tried playing around with your model to see what it can and cannot predict correctly? For example, type a few sentences you expect the model to classify as depressive or non-depressive, check what it actually predicts, and see if you can observe any patterns.
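A small helper along these lines can make that kind of probing repeatable. Everything here is a hypothetical sketch: probe is a made-up name, the sentences are invented, and keyword_predict is only a stand-in so the snippet runs on its own. You would pass your trained model's predict instead, assuming it accepts a list of raw strings (for example a scikit-learn pipeline that includes the vectorizer).

```python
def probe(predict, sentences):
    """Print expected vs. predicted label per sentence; return the mismatches."""
    misses = []
    for sentence, expected in sentences:
        predicted = predict([sentence])[0]
        flag = "" if predicted == expected else "  <-- mismatch"
        print(f"expected={expected} predicted={predicted}  {sentence!r}{flag}")
        if predicted != expected:
            misses.append(sentence)
    return misses


# Stand-in for a real model (NOT your classifier): flags one keyword only.
def keyword_predict(batch):
    return [1 if "hopeless" in s.lower() else 0 for s in batch]


# Hand-written sentences with the label you expect (1 = depressive, 0 = not).
sentences = [
    ("I feel hopeless and can't get out of bed", 1),
    ("Looking forward to the weekend hike", 0),
    ("Nothing feels worth doing anymore", 1),
]

misses = probe(keyword_predict, sentences)
```

Keeping the probe sentences in a list means you can re-run the same checks after every model change and see whether the mismatches move.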
Great work so far!