yoonkim / CNN_sentence

CNNs for sentence classification

set sentence max length automatically #28

Open pexmar opened 7 years ago

pexmar commented 7 years ago

If you are using a dataset with sentences longer than 65 words, you have to set the max_l variable manually. You can fix this little issue by replacing the second-to-last line in process_data.py with:

cPickle.dump([revs, W, W2, word_idx_map, vocab, max_l], open("mr.p", "wb"))

and in conv_net_sentence.py after loading the pickled file:

revs, W, W2, word_idx_map, vocab, max_l = x[0], x[1], x[2], x[3], x[4], x[5]

Now you only have to replace the make_idx_data_cv function call with:

make_idx_data_cv(revs, word_idx_map, i, max_l=max_l, k=300, filter_h=5)
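To make the round trip concrete, here is a minimal sketch of the idea: derive max_l from the data, store it next to the other objects, and read it back. It assumes each entry of revs is a dict with a "num_words" field, uses Python 3's pickle in place of cPickle, writes to a temporary directory, and fills W, W2, word_idx_map and vocab with placeholders. Treat it as an illustration rather than the repo's actual code.

```python
import os
import pickle
import tempfile

# Toy stand-ins for the real data (assumed layout: one dict per sentence).
revs = [
    {"text": "a short review", "num_words": 3},
    {"text": " ".join(["word"] * 70), "num_words": 70},
]
W, W2, word_idx_map, vocab = None, None, {}, {}  # placeholders only

# Derive the longest sentence length from the data instead of hard-coding it.
max_l = max(rev["num_words"] for rev in revs)

# Write max_l alongside the other objects, as in the changed dump line.
path = os.path.join(tempfile.mkdtemp(), "mr.p")
with open(path, "wb") as f:
    pickle.dump([revs, W, W2, word_idx_map, vocab, max_l], f)

# Read everything back, including max_l, as in the changed loading line.
with open(path, "rb") as f:
    x = pickle.load(f)
revs, W, W2, word_idx_map, vocab, max_l = x
```

The loaded max_l can then be passed straight to make_idx_data_cv, so the value always matches whatever dataset was processed.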

It drove me crazy to work out that the max sentence length limit was the cause of this error:

ValueError: setting an array element with a sequence.
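For anyone hitting the same message, here is a small sketch of what I believe is going on. It assumes the index-building step zero-pads each sentence up to a fixed width derived from max_l and never truncates longer ones. The helper pad_indices and the toy word_idx_map are made up for illustration and are not the repo's code.

```python
def pad_indices(words, word_idx_map, max_l, filter_h=5):
    """Hypothetical helper: map words to indices and zero-pad to max_l + 2 * pad."""
    pad = filter_h - 1
    row = [0] * pad  # leading padding
    row += [word_idx_map[w] for w in words if w in word_idx_map]
    # Short sentences are padded up to the target width; long ones are NOT cut.
    while len(row) < max_l + 2 * pad:
        row.append(0)
    return row


word_idx_map = {"a": 1, "b": 2, "c": 3}

# With max_l=2 the target width is 2 + 2 * 4 = 10.
short = pad_indices(["a", "b"], word_idx_map, max_l=2)
long_ = pad_indices(["a", "b", "c", "a", "b", "c", "a"], word_idx_map, max_l=2)

# The over-long sentence produces a longer row than the rest.
row_lengths = (len(short), len(long_))
```

Once the rows have different lengths, they no longer form a rectangular matrix, and converting them into an integer NumPy array is where the ValueError comes from. With max_l taken from the data, every row ends up the same width.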

yoonkim commented 7 years ago

cool, feel free to send a pull request!