titipata / yelp_dataset_challenge

Play around with Yelp dataset in Python (in progress and very messy repo)
http://www.yelp.com/dataset_challenge

Test simple model of review generation based on word2vec #7

Closed: daniel-acuna closed this issue 8 years ago

daniel-acuna commented 8 years ago

This simple model would work as follows:

  1. For each sentence in the dataset, compute the average of the word2vec vectors of the words it contains.
  2. Construct a nearest-neighbor structure that allows searching for sentences that are close in word2vec space.
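A minimal sketch of these two indexing steps, assuming the word vectors are available as a plain dict mapping each word to a NumPy array (for example, exported from a trained word2vec model). The names `sentence_vector`, `build_index` and `nearest_neighbors` and the toy 2-d embeddings are hypothetical. For simplicity it uses a brute-force Euclidean search in place of a dedicated nearest-neighbor structure such as a KD-tree.

```python
import numpy as np

def sentence_vector(tokens, embeddings, dim):
    """Average the word2vec vectors of the tokens in one sentence.

    Tokens missing from `embeddings` are skipped; a sentence with no known
    tokens maps to the zero vector.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def build_index(sentences, embeddings, dim):
    """Stack one mean vector per sentence into an (n_sentences, dim) matrix."""
    return np.vstack([sentence_vector(s, embeddings, dim) for s in sentences])

def nearest_neighbors(index, query, k):
    """Brute-force search for the k rows of `index` closest to `query`.

    Uses Euclidean distance; returns (indices, distances) sorted from
    closest to farthest.
    """
    dists = np.linalg.norm(index - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Toy 2-d "embeddings" standing in for a trained word2vec model (hypothetical).
embeddings = {
    "great": np.array([1.0, 0.0]),
    "food": np.array([0.0, 1.0]),
    "awful": np.array([-1.0, 0.0]),
    "service": np.array([0.0, -1.0]),
}
sentences = [["great", "food"], ["awful", "service"], ["great", "service"]]
index = build_index(sentences, embeddings, dim=2)
query = sentence_vector(["great", "great", "food"], embeddings, dim=2)
idx, d = nearest_neighbors(index, query, k=2)
print(idx.tolist())  # → [0, 2]
```

For a large review corpus, the brute-force search would typically be swapped for a tree-based or approximate index; the averaging step stays the same.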

For generating reviews:

  1. Sample an initial sentence from the data.
  2. Compute the mean word2vec vector of that sentence.
  3. Find the nearest neighbors of that sentence.
  4. Sample the next sentence from those neighbors, using the word2vec distance to define a probability distribution.
  5. Repeat.
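A sketch of the generation loop under similar assumptions: `vectors` holds one precomputed mean word2vec vector per sentence, and neighbors are found by brute force. The issue does not say how a distance becomes a probability, so the weighting exp(-d / temperature) (closer sentences more likely), the parameters `k` and `temperature`, and the helper names `distance_weights` and `generate_review` are all hypothetical choices.

```python
import numpy as np

def distance_weights(dists, temperature=1.0):
    """Map distances to probabilities so that closer means more likely.

    Uses exp(-d / temperature), normalized to sum to 1. This mapping is an
    assumption; the issue only says to use the distance as a distribution.
    """
    logits = -np.asarray(dists, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def generate_review(sentences, vectors, n_sentences=3, k=5, temperature=1.0, seed=None):
    """Chain sentences by sampling among their word2vec nearest neighbors.

    sentences: list of sentence strings.
    vectors: (len(sentences), dim) array of each sentence's mean word2vec vector.
    Returns the generated review as a list of sentences.
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to sample from")
    k = min(k, len(sentences) - 1)            # the current sentence is never a candidate
    rng = np.random.default_rng(seed)
    current = rng.integers(len(sentences))    # 1. sample an initial sentence
    review = [sentences[current]]
    for _ in range(n_sentences - 1):          # 5. repeat
        query = vectors[current]              # 2. its mean word2vec vector
        dists = np.linalg.norm(vectors - query, axis=1)
        dists[current] = np.inf               # exclude the sentence itself
        neighbors = np.argsort(dists)[:k]     # 3. nearest neighbors
        probs = distance_weights(dists[neighbors], temperature)
        current = rng.choice(neighbors, p=probs)  # 4. sample the next sentence
        review.append(sentences[current])
    return review

# Toy sentences and 2-d vectors (hypothetical stand-ins for real Yelp data).
sentences = ["Great food.", "Loved the pasta.", "Service was slow.", "Waiter ignored us."]
vectors = np.array([[1.0, 0.9], [0.9, 1.0], [-1.0, -0.8], [-0.9, -1.0]])
print(" ".join(generate_review(sentences, vectors, n_sentences=3, k=2, seed=0)))
```

Excluding the current sentence keeps the chain from immediately repeating itself; lowering `temperature` concentrates the probability on the closest neighbor, while raising it makes the choice among the `k` neighbors closer to uniform.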

daniel-acuna commented 8 years ago

Closed by 7bede85afcfda68f24f12bbb52e35e1353ed8e8b