samjmolyneux / eppi-text-classification

Classifying papers by their abstracts and titles.

Experiment with faster lemmatization/word processing #32

Open samjmolyneux opened 1 month ago

samjmolyneux commented 1 month ago

At a minimum, try swapping in a different tokenization method to see how well it works, how fast it is, and how the total number of tokens changes by the time the text reaches the TF-IDF vectorizer.
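A rough sketch of how that comparison could be set up, using only the standard library. Everything here is an assumption for illustration: the two tokenizers, the `benchmark` helper, and the sample texts are hypothetical and not taken from the repository. The vocabulary size is used as a stand-in for the number of features a TF-IDF vectorizer would end up with.

```python
import re
import time
from typing import Callable

# Hypothetical sample data; real abstracts and titles would come from the dataset.
SAMPLE_TEXTS = [
    "Randomised trials of screening programmes were reviewed.",
    "We reviewed screening trials and the programmes they evaluated.",
]

_WORD_RE = re.compile(r"[a-z0-9]+")


def whitespace_tokenize(text: str) -> list[str]:
    """Baseline candidate: lowercase and split on whitespace (punctuation stays attached)."""
    return text.lower().split()


def regex_tokenize(text: str) -> list[str]:
    """Alternative candidate: lowercase and keep only runs of letters and digits."""
    return _WORD_RE.findall(text.lower())


def benchmark(tokenize: Callable[[str], list[str]], texts: list[str]) -> dict:
    """Time a tokenizer over all texts and count total tokens and distinct tokens."""
    start = time.perf_counter()
    tokenized = [tokenize(text) for text in texts]
    elapsed = time.perf_counter() - start

    total_tokens = sum(len(doc) for doc in tokenized)
    vocabulary = {token for doc in tokenized for token in doc}
    return {
        "seconds": elapsed,
        "total_tokens": total_tokens,
        "vocabulary_size": len(vocabulary),
    }


if __name__ == "__main__":
    for name, fn in [("whitespace", whitespace_tokenize), ("regex", regex_tokenize)]:
        print(name, benchmark(fn, SAMPLE_TEXTS))
```

The same `benchmark` call could wrap the existing lemmatizer, so that all candidates are measured on identical input. Timings on two sentences are meaningless, so a real run would need a sizeable slice of the dataset.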