tiriplicamihai closed this 8 years ago
Using bag of words for the project's next milestone sounds good to me.
"Given a website from history, classify it." I think this would be too much responsibility for one module. We should have a second module that takes URLs and returns a dict mapping each website to its content, with the stop words removed. This module will also be used to build the training data, and the classifier should use it.
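A minimal sketch of the proposed URL-to-content module, assuming plain HTML pages and a small hypothetical stop-word list. The names `build_corpus`, `clean_text`, `fetch_html` and `STOP_WORDS` are illustrative, not taken from the project. The fetch function is a parameter so the module can be exercised without network access.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable
from urllib.request import urlopen

# Hypothetical, deliberately tiny stop-word list; a real module would use a fuller one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"})


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        return " ".join(self._chunks)


def fetch_html(url: str) -> str:
    """Download a page; kept separate so tests can substitute a fake fetcher."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def clean_text(html: str) -> str:
    """Extract visible text, lowercase it, and drop stop words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", parser.text().lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def build_corpus(urls: Iterable[str], fetch: Callable[[str], str] = fetch_html) -> Dict[str, str]:
    """Map each URL to its page text with stop words removed."""
    return {url: clean_text(fetch(url)) for url in urls}
```

The same dict can serve both as training data (once labeled) and as the input the classifier sees at query time, which is the separation of responsibilities suggested above.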
This is done.
Using the data collected in #3, we should train a classifier. In this version it can be a simple baseline, for example one based on bag of words. This module will also receive queries: given a website from history, classify it.
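One possible bag-of-words baseline is a multinomial Naive Bayes classifier with add-one smoothing. This is a sketch under the assumption that pages arrive as cleaned, space-separated text (as produced by the module from #3) with one label per page; the class name `BagOfWordsClassifier` and its methods are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


class BagOfWordsClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        """examples: (cleaned page text, label) pairs."""
        for text, label in examples:
            words = text.split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def classify(self, text: str) -> Optional[str]:
        """Return the label with the highest log-posterior for the given text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for word in text.split():
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Example usage, with made-up labels:

```python
clf = BagOfWordsClassifier()
clf.train([
    ("rome empire senate", "history"),
    ("python code compiler", "programming"),
    ("caesar rome legion", "history"),
    ("compiler bug code", "programming"),
])
clf.classify("rome senate")  # "history"
```

Naive Bayes is chosen here only because it is the simplest classifier that works directly on word counts; any other bag-of-words model could replace it behind the same `train`/`classify` interface.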
The methods for opening a link and getting the text from it should be available from #3.