idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Help with determining window size and min count #22

Open vondiplo opened 8 years ago

vondiplo commented 8 years ago

I'm trying to train a model that includes certain topics. With the default parameters, the topic somehow ends up missing from the model. I was thinking of changing the window size to 5 and the min count to 5 to get more granular results. However, I don't actually know what effect changing these parameters would have. Could someone please shed some light on their impact?
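For readers unfamiliar with the two parameters, here is a minimal sketch of what they control in word2vec-style training in general. It is not wiki2vec's actual code: the function names, the toy corpus, and the `DBPEDIA_ID/Visual_cortex` token are hypothetical, and real implementations add further details (for example, sampling a smaller effective window per word) that are left out here.

```python
from collections import Counter

def filter_by_min_count(sentences, min_count):
    """Drop every token that occurs fewer than min_count times in the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [[tok for tok in sent if counts[tok] >= min_count] for sent in sentences]

def context_pairs(sentence, window):
    """Return (center, context) pairs whose positions are at most `window` apart."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

# Hypothetical toy corpus; the entity token below is illustrative only.
corpus = [
    ["DBPEDIA_ID/Visual_cortex", "processes", "visual", "information"],
    ["the", "brain", "processes", "visual", "information"],
]

# Tokens seen only once (including the entity token) are removed.
kept = filter_by_min_count(corpus, min_count=2)

# With window=1, each word is paired only with its immediate neighbours.
pairs = context_pairs(corpus[1], window=1)
```

With `min_count=2`, the entity token occurs only once and is dropped before training, so it gets no vector at all. A higher min count therefore removes rare topics, while a lower one keeps them at the cost of noisier vectors. A larger window pairs each word with more distant neighbours, which tends to capture broader topical similarity rather than close syntactic context.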

dav009 commented 8 years ago

Can you add some info on how many annotations the topics you are looking for have? I think the annotation stats are available here: http://spotlight.sztaki.hu/downloads/latest_data/

Presumably, if Spotlight can return it, it has enough counts.

vondiplo commented 8 years ago

@dav009 - It doesn't appear there at all (I downloaded the English tar, then ran a grep on the unzipped folder for the concept I was looking for, 'visual_cortex'). On the other hand, neither does Barak_Obama, yet it surely appears in both Spotlight's annotations and the wiki2vec vectors.
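One thing worth ruling out is capitalization: DBpedia identifiers usually capitalize the first letter (e.g. `Visual_cortex`), and a plain grep is case-sensitive. The sketch below is one way to repeat the search case-insensitively. It assumes the unzipped data consists of plain-text files, which has not been verified here, and `count_matches` is a hypothetical helper, not part of Spotlight or wiki2vec.

```python
import os

def count_matches(root, needle, ignore_case=True):
    """Return {file path: number of lines containing needle} for files under root."""
    target = needle.lower() if ignore_case else needle
    hits = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            n = 0
            # errors="replace" keeps the scan going on bytes that are not valid UTF-8.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    hay = line.lower() if ignore_case else line
                    if target in hay:
                        n += 1
            if n:
                hits[path] = n
    return hits
```

Calling `count_matches("path/to/unzipped", "visual_cortex")` (the path is a placeholder) lists every file with at least one matching line; passing `ignore_case=False` reproduces the behaviour of a plain case-sensitive grep for comparison.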

vondiplo commented 8 years ago

Hi @dav009, is there any update regarding this?

dav009 commented 8 years ago

Sorry for the late reply,