idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Add post-processing for cleaner output corpora, support for entity min_count in gensim model creation, redirects resolution #8

Closed phdowling closed 6 years ago

phdowling commented 9 years ago

These are some changes I made that seem to improve corpus and model quality. I also added a script that converts the vectors to a CSV format that Spotlight can use to create a vector store.
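For readers unfamiliar with this kind of conversion, here is a minimal sketch of writing vectors out as CSV. It is not the script from this PR: the function name `write_vectors_csv`, the row layout (key first, then one column per vector component), and the example keys are all assumptions made for illustration, and the layout Spotlight actually expects may differ.

```python
import csv
import io


def write_vectors_csv(vectors, out):
    """Write one CSV row per entry: the key, then each vector component.

    `vectors` is any mapping from a string key to a sequence of numbers.
    This row layout is an assumption, not the format defined by the PR.
    """
    writer = csv.writer(out)
    for key, vec in vectors.items():
        writer.writerow([key] + [float(x) for x in vec])


# Hypothetical keys and values, for illustration only.
example = {
    "entity/Berlin": [0.5, -0.25],
    "berlin": [1.0, 2.0],
}

buf = io.StringIO()
write_vectors_csv(example, buf)
rows = buf.getvalue().splitlines()
```

With a trained gensim model, the same function could in principle be fed a mapping built from the model's vocabulary and vectors; how to build that mapping depends on the gensim version in use.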

dav009 commented 8 years ago

Looks good :+1: Sorry for the late comment. Could you please squash some of the commits (for example: local fixes x 3)?

phdowling commented 8 years ago

Ah, this is awful. I tried to squash and rebase, but it seems to have just added more commits.

Edit: looking better now. @dav009 feel free to review again and merge when ready.

dav009 commented 8 years ago

Oops. For squashing, use: git rebase -i HEAD~20

phdowling commented 8 years ago

Is it looking okay now?

phdowling commented 7 years ago

@dav009 Just came across this again. Is there any reason you don't want to merge? I think this improves the quality of the data a fair bit.

tgalery commented 7 years ago

Hi @phdowling, @dav009 is in Japan atm. This repo is not very well maintained, as we might rewrite it from scratch. That being said, I'll take a look at it when I have time, and probably merge.