Closed by NickShahML 8 years ago
This might be more appropriate for the keras-users google group, but a few observations:
Most tasks like the one you describe reduce to factorizing some co-occurrence matrix: either word-word counts within a context window (which gives you word2vec-style embeddings) or word-document counts across the whole corpus (which gives you a topic model).
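To make the factorization view concrete, here's a minimal sketch: build a word-word co-occurrence matrix from a toy corpus, then factorize it with SVD to get dense word vectors. The corpus and window size are illustrative assumptions, not anything from the thread.

```python
from collections import Counter

import numpy as np

# Toy corpus; in practice this would be your books
corpus = [
    "the dog chased the cat".split(),
    "the cat chased the mouse".split(),
    "the truck passed the car".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/-2 word context window
window = 2
counts = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if i != j:
                counts[(idx[w], idx[sent[j]])] += 1

M = np.zeros((len(vocab), len(vocab)))
for (i, j), c in counts.items():
    M[i, j] = c

# Rank-k factorization via SVD; rows of U * S serve as dense word vectors
U, S, Vt = np.linalg.svd(M)
k = 2
vectors = U[:, :k] * S[:k]
```

Methods like word2vec can be seen as implicitly performing a (reweighted) factorization of this kind of matrix.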
I would recommend trying something like latent Dirichlet allocation (LDA) before neural network methods. Here's a neat in-browser demo that runs on State of the Union addresses: http://mimno.infosci.cornell.edu/jsLDA/
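As a quick way to try LDA before reaching for neural nets, here's a hedged sketch using scikit-learn's `LatentDirichletAllocation` (the tiny document list and topic count are just placeholders):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder documents; swap in your real corpus
docs = [
    "dog cat dog pet",
    "cat dog animal pet",
    "truck car road engine",
    "car truck engine wheel",
]

# LDA works on raw word counts, not tf-idf
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # rows are per-document topic distributions
```

Each row of `doc_topics` sums to 1, and `lda.components_` gives per-topic word weights you can inspect for word groupings.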
Thank you @jmhessel , I will repost this on the keras users group. I appreciate the plugin.
Good to know about LDA vs. neural net methods. Definitely going to check out that plugin!
Can anyone implement an RBM using Keras and share the code?
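Keras has no built-in RBM layer (RBMs are undirected models trained with contrastive divergence, which doesn't map cleanly onto Keras's backprop-based API). As a stopgap, scikit-learn ships a `BernoulliRBM`; here's a minimal sketch on random binary data (the data and hyperparameters are illustrative assumptions):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Random binary data standing in for a real binarized dataset
rng = np.random.default_rng(0)
X = (rng.random((100, 20)) > 0.5).astype(float)

# Trained with persistent contrastive divergence under the hood
rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=10, random_state=0)
H = rbm.fit_transform(X)  # hidden-unit activation probabilities
```

`H` has one row per sample and one column per hidden unit, and can be fed into a downstream model.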
Hey everyone,
I've been trying to figure out a way to cluster words based upon similarity.
Suppose you read many books whose combined vocabulary totals 100k distinct words. It would be great to form ~1000 clusters of roughly 100 words each, where the words within a cluster are similar to one another: "dog" and "cat" in one cluster, "truck" and "car" in another.
I saw that there's the well-made skipgram word-embedding script example: https://github.com/fchollet/keras/blob/master/examples/skipgram_word_embeddings.py
And I also saw that word2vec has made word clusters: https://code.google.com/p/word2vec/
I know that they usually apply k-means on top of the word vectors created. I thought it would be good to start a discussion about this in case other Keras users are interested in the same thing.
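The k-means-on-word-vectors idea can be sketched like this with scikit-learn. The random vectors below are a stand-in for embeddings you'd actually train (e.g. with the Keras skipgram example script or word2vec); the vocabulary and cluster count are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vocab = ["dog", "cat", "truck", "car"]

# Stand-in for trained skipgram embeddings (one 50-d vector per word)
vectors = rng.normal(size=(len(vocab), 50))

# Cluster the vectors; with a real vocabulary you'd use ~1000 clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

# Group words by their assigned cluster label
clusters = {}
for word, label in zip(vocab, km.labels_):
    clusters.setdefault(int(label), []).append(word)
```

With real embeddings, semantically similar words land near each other in vector space, so each cluster tends to collect related words.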
Thanks!