zhq2009 closed this issue 8 years ago
The relevant information this package offers is the set of vectors generated from Wikipedia annotations. A Wikipedia annotation is a link that users add to a Wikipedia article pointing to another article (e.g. a link to Barack Obama in the article about US Politics).
In that sense, the file mentioned in the readme includes:
Vectors for words (e.g. Tea, Apple, Obama) found in the Wikipedia text.
Vectors for Wikipedia entities (DBPEDIA_ID/<wikititle>), e.g.:
DBPEDIA_ID/Barack_Obama
DBPEDIA_ID/Apple
(regardless of whether they are multi-word or not)
The vectors for single words differ from the vectors of Wikipedia entities in that an occurrence of DBPEDIA_ID/Barack_Obama is counted every time an annotation pointing to Barack_Obama is found in Wikipedia text, regardless of its anchor (e.g. the anchor could have been B. Obama, Barack O., Barry Obama, or President of the USA).
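To make the anchor point concrete, here is a toy sketch, not the package's actual preprocessing. It assumes annotations are written in the usual wiki link syntax, [[Target]] or [[Target|anchor]], and the helper names entity_token and replace_annotations are made up for illustration. Every link to the same article collapses to the same DBPEDIA_ID/<wikititle> token, whatever the anchor text was.

```python
import re

# Matches wiki-style links: [[Target]] or [[Target|anchor text]].
# Assumption: this is only a simplified stand-in for real wiki markup parsing.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def entity_token(title):
    """Build an entity key of the form DBPEDIA_ID/<wikititle> (hypothetical helper)."""
    return "DBPEDIA_ID/" + title.strip().replace(" ", "_")


def replace_annotations(text):
    """Replace every link with the token of its target article, ignoring the anchor."""
    return WIKI_LINK.sub(lambda m: entity_token(m.group(1)), text)


samples = [
    "[[Barack Obama|B. Obama]] gave a speech",
    "[[Barack Obama|Barry Obama]] gave a speech",
    "[[Barack Obama]] gave a speech",
]

# Each sentence starts with the same entity token despite different anchors.
tokens = [replace_annotations(s).split() for s in samples]
for t in tokens:
    print(t)
```

In a corpus processed this way, a plain-text mention of Obama and an annotated mention would be different tokens, which is why the word vector and the entity vector differ.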
Thank you for your help.
We are trying to use the DBpedia vectors available at https://github.com/idio/wiki2vec#prebuilt-models (English Wikipedia (Feb 2015) 1000 dimension - No stemming - 10skipgram).
How can we see the vectors of a few multi-word entities (or at least the beginning of each vector), e.g. Barack_obama; White_house; Artificial_Intelligence; Computer_science; Natural_language_processing and so on?
We tried to open en.model directly from Ubuntu and got an "unknown file type" error. Running "cat en.model" in the terminal only prints unreadable binary output. Is there a way to open en.model so that we can see the DBpedia vectors?
Yeah, those are gensim models. You have to use Python and gensim to load them. Check this gensim word2vec tutorial.
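A minimal sketch of what that could look like, assuming gensim is installed and en.model sits in the current directory. The helper names are made up for illustration. The model.wv attribute is how recent gensim releases expose the vectors; a model saved in 2015 may need an older gensim version to load, so treat this as a starting point rather than a guaranteed recipe.

```python
def load_vectors(path):
    """Load a saved gensim Word2Vec model and return its keyed vectors."""
    # Imported here so vector_prefix below can be used without gensim.
    from gensim.models import Word2Vec

    model = Word2Vec.load(path)
    # Assumption: recent gensim, where the vectors live under model.wv.
    return model.wv


def vector_prefix(vectors, key, n=10):
    """Return the first n components of the vector for key, or None if absent."""
    if key not in vectors:
        return None
    return [float(x) for x in vectors[key][:n]]


def show_entities(path, keys, n=10):
    """Print the beginning of each requested vector (hypothetical helper)."""
    vectors = load_vectors(path)
    for key in keys:
        print(key, vector_prefix(vectors, key, n))


# Example call, left commented out because it needs the downloaded model:
# show_entities("en.model", ["DBPEDIA_ID/Barack_Obama", "DBPEDIA_ID/Apple"])
```

Entity keys follow the DBPEDIA_ID/<wikititle> form described in this thread. Lookups are presumably case-sensitive, so Barack_obama and Barack_Obama would be different keys.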
We are trying to use the DBpedia vectors available at https://github.com/idio/wiki2vec#prebuilt-models (English Wikipedia (Feb 2015) 1000 dimension - No stemming - 10skipgram).
Would you mind letting us know whether the vectors include multi-word entities (e.g. Barack_Obama) or only single words? Thanks.