Use word embeddings instead of tokenization for text extraction

mozilla / webcompat-ml

Webcompat machine learning models

Mozilla Public License 2.0


Use word embeddings instead of tokenization for text extraction #4

Open johngian opened 5 years ago

johngian commented 5 years ago

Currently we use a simple CountVectorizer to extract features from text. It might be a good idea to experiment with word embeddings instead. The rationale is that count vectorization only records how often each token occurs and ignores the context around it, so it can miss semantic information that is important for the triaging model.
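
To make the difference concrete, here is a minimal, stdlib-only sketch. It is not the project's pipeline: `count_vector` is a simplified stand-in for what CountVectorizer computes, and `TOY_EMBEDDINGS` is a hypothetical, hand-picked table used only for illustration. Real embeddings (word2vec, GloVe, fastText or similar) would be learned from a corpus. Averaging word vectors is just one simple way to turn embeddings into a fixed-size feature vector.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Words with related meanings are deliberately given nearby vectors.
TOY_EMBEDDINGS = {
    "page": [0.9, 0.1, 0.0],
    "site": [0.8, 0.2, 0.0],
    "crashes": [0.0, 0.9, 0.1],
    "freezes": [0.1, 0.8, 0.2],
    "video": [0.0, 0.1, 0.9],
    "plays": [0.1, 0.0, 0.8],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def count_vector(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (simplified CountVectorizer stand-in)."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]


def mean_embedding(text, embeddings):
    """Average the embeddings of all known tokens; zero vector if none are known."""
    vectors = [embeddings[tok] for tok in tokenize(text) if tok in embeddings]
    if not vectors:
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = sorted(TOY_EMBEDDINGS)
report_a = "page crashes"
report_b = "site freezes"

# The two reports share no tokens, so their count vectors are orthogonal.
count_similarity = cosine(
    count_vector(report_a, vocabulary), count_vector(report_b, vocabulary)
)

# With the toy embeddings, the averaged vectors point in almost the same direction.
embedding_similarity = cosine(
    mean_embedding(report_a, TOY_EMBEDDINGS), mean_embedding(report_b, TOY_EMBEDDINGS)
)

print(f"count similarity:     {count_similarity:.3f}")
print(f"embedding similarity: {embedding_similarity:.3f}")
```

With this toy table, the two reports look unrelated under counts but close under averaged embeddings, which is the kind of signal the triaging model currently cannot see. Whether that holds on real webcompat reports would need to be measured against the existing CountVectorizer baseline.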