idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Support jsonpedia as input source #24

Open keynmol opened 8 years ago

keynmol commented 8 years ago

Much of the logic for cleaning Wikipedia markup is already implemented in json-wikipedia. Its output is also much easier to work with, because annotations are specified explicitly and kept separate from the text of the article.

We should add an option to use jsonpedia directly, without pre-processing the XML dump.
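
To illustrate why separate annotations make this easier, here is a minimal sketch of turning one JSON article record into a corpus line with entity tokens. The record layout (`text`, plus `annotations` with `start`, `end`, and `target`), the `DBPEDIA_ID/` token prefix, and the function name `article_to_corpus_line` are all assumptions made for illustration; the real field names would have to be checked against the actual jsonpedia / json-wikipedia output.

```python
import json

# Assumed prefix for entity tokens in the training corpus (hypothetical here).
ENTITY_PREFIX = "DBPEDIA_ID/"


def article_to_corpus_line(record):
    """Replace each annotated span in the article text with an entity token.

    `record` is a dict with a hypothetical schema:
      {"text": str, "annotations": [{"start": int, "end": int, "target": str}]}
    Offsets are character positions into `text`, with `end` exclusive.
    """
    text = record["text"]
    # Process annotations left to right so offsets into the original text stay valid.
    annotations = sorted(record.get("annotations", []), key=lambda a: a["start"])

    pieces = []
    cursor = 0
    for ann in annotations:
        if ann["start"] < cursor:
            continue  # skip spans that overlap one already replaced
        pieces.append(text[cursor:ann["start"]])
        pieces.append(ENTITY_PREFIX + ann["target"].replace(" ", "_"))
        cursor = ann["end"]
    pieces.append(text[cursor:])

    # Collapse runs of whitespace so the result is a single corpus line.
    return " ".join("".join(pieces).split())


sample = json.loads(
    '{"text": "Paris is the capital of France.",'
    ' "annotations": ['
    '{"start": 0, "end": 5, "target": "Paris"},'
    '{"start": 24, "end": 30, "target": "France"}]}'
)
print(article_to_corpus_line(sample))
```

With input like this there is no markup to strip: the text is already clean, and the link targets come from the annotation list rather than from parsing the XML dump.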