predict-idlab / pyRDF2Vec

🐍 Python Implementation and Extension of RDF2Vec
https://pyrdf2vec.readthedocs.io/en/latest/
MIT License

TypeError: __init__() got an unexpected keyword argument 'label_predicates' #42

Closed: nnadine25 closed this issue 3 years ago

nnadine25 commented 3 years ago

Hi, I create a KG object from the DBpedia SPARQL endpoint:

label_predicates = [
    'http://www.w3.org/2000/01/rdf-schema#comment',
    'http://www.w3.org/2000/01/rdf-schema#label',
    'http://www.w3.org/2000/01/rdf-schema#seeAlso',
    'http://www.w3.org/2002/07/owl#sameAs',
    'http://www.w3.org/2003/01/geo/wgs84_pos#geometry',
    'http://dbpedia.org/ontology/wikiPageRedirects',
    'http://www.w3.org/2003/01/geo/wgs84_pos#lat',
    'http://www.w3.org/2003/01/geo/wgs84_pos#long',
    'http://www.w3.org/2004/02/skos/core#exactMatch',
    'http://www.w3.org/ns/prov#wasDerivedFrom',
    'http://xmlns.com/foaf/0.1/depiction',
    'http://xmlns.com/foaf/0.1/homepage',
    'http://xmlns.com/foaf/0.1/isPrimaryTopicOf',
    'http://xmlns.com/foaf/0.1/name',
    'http://dbpedia.org/property/website',
    'http://dbpedia.org/property/west',
    'http://dbpedia.org/property/wordnet_type',
    'http://www.w3.org/2002/07/owl#differentFrom',
]

kg = KG("https://dbpedia.org/sparql", is_remote=True, label_predicates=[rdflib.URIRef(x) for x in label_predicates])

I get this error:

    kg = KG("https://dbpedia.org/sparql", is_remote=True,
TypeError: __init__() got an unexpected keyword argument 'label_predicates'

rememberYou commented 3 years ago

The error is explicit. It tells you that the keyword argument label_predicates no longer exists.

Since pyRDF2Vec 0.2.0, label_predicates has been renamed to skip_predicates and no longer takes a list, but a set of predicates to exclude.

This is what your code should look like:

from pyrdf2vec import KG

skip_predicates = {
    "http://dbpedia.org/ontology/wikiPageRedirects",
    "http://dbpedia.org/property/website",
    "http://dbpedia.org/property/west",
    "http://dbpedia.org/property/wordnet_type",
    "http://www.w3.org/2000/01/rdf-schema#comment",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "http://www.w3.org/2002/07/owl#differentFrom",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2003/01/geo/wgs84_pos#geometry",
    "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "http://www.w3.org/2003/01/geo/wgs84_pos#long",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://www.w3.org/ns/prov#wasDerivedFrom",
    "http://xmlns.com/foaf/0.1/depiction",
    "http://xmlns.com/foaf/0.1/homepage",
    "http://xmlns.com/foaf/0.1/isPrimaryTopicOf",
    "http://xmlns.com/foaf/0.1/name",
}

kg = KG(
    "https://dbpedia.org/sparql",
    skip_predicates=skip_predicates,
    is_remote=True,
)

NOTE: you no longer need to explicitly cast each predicate into a URI with RDFLib.
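
If you already have the old list (with or without the RDFLib cast), a minimal sketch of the conversion could look like the following. It uses only plain Python; the label_predicates list here is a shortened stand-in for yours, and it assumes, as stated above, that skip_predicates accepts plain strings.

```python
# Shortened stand-in for the list that used to be passed as `label_predicates`.
label_predicates = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2000/01/rdf-schema#label",  # duplicates collapse in a set
]

# str() also works on rdflib.URIRef objects, because URIRef subclasses str,
# so the same line handles a list that was already cast with RDFLib.
skip_predicates = {str(predicate) for predicate in label_predicates}

print(sorted(skip_predicates))
```

The resulting set can then be passed as skip_predicates=skip_predicates, as in the snippet above.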

Take the time to read the documentation and examples provided.

nnadine25 commented 3 years ago

I run into the same kind of problem after upgrading gensim for Word2Vec:

if __name__ == '__main__':
    transformer = RDF2VecTransformer(
        Word2Vec(workers=1, size=200),
        [RandomWalker(1, 200, random_state=42)],
    )
    embeddings = transformer.fit_transform(
        KG(location="http://dbpedia.org/sparql", is_remote=True),
        ["http://dbpedia.org/resource/Brussels"],
    )
    print(embeddings)

    self._model = W2V(**self.kwargs)
TypeError: __init__() got an unexpected keyword argument 'size'

nnadine25 commented 3 years ago

I have model = Word2Vec(sentences=common_texts, window=5, min_count=1, workers=1), and in pyRDF2Vec there is Word2Vec(iter=10). I can't understand the difference.

rememberYou commented 3 years ago

gensim 4.0.0 renamed the size hyperparameter to vector_size, and the Word2Vec class of pyRDF2Vec uses { size=500, min_count=0, negative=20 } as its default keyword argument dictionary. As the gensim release notes point out, some APIs can be impacted, which is the case for pyRDF2Vec.

No matter which hyperparameters you pass to the Word2Vec class of pyRDF2Vec, the size hyperparameter is injected into the Word2Vec class of gensim. Unfortunately, gensim decided not to keep the size hyperparameter around for a few more releases, which would have avoided this situation.
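
To make the failure concrete, here is a minimal sketch of how such a default dictionary ends up in the gensim call. EmbedderSketch and fake_gensim4_word2vec are hypothetical stand-ins written for illustration, not the actual pyRDF2Vec or gensim code; the only facts taken from this thread are the default values and the rename of size to vector_size.

```python
def fake_gensim4_word2vec(vector_size=100, min_count=5, negative=5, workers=3):
    """Hypothetical stand-in for a gensim >= 4.0.0 constructor: no `size`."""
    return {
        "vector_size": vector_size,
        "min_count": min_count,
        "negative": negative,
        "workers": workers,
    }


class EmbedderSketch:
    """Hypothetical wrapper that merges user kwargs over fixed defaults."""

    def __init__(self, **kwargs):
        # Defaults quoted in this thread; user kwargs override them.
        self.kwargs = {"size": 500, "min_count": 0, "negative": 20, **kwargs}

    def build(self):
        return fake_gensim4_word2vec(**self.kwargs)


# Even without passing `size`, the default dictionary injects it.
embedder = EmbedderSketch(workers=1)
try:
    embedder.build()
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

Passing size=200 explicitly only overrides the default value; the size key still reaches the constructor, so the call fails either way.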

In this scenario, I recommend two things:

  1. Use a gensim version lower than 4.0.0 until a minor release of pyRDF2Vec includes the fix for this bug.
  2. Use a set, not a list, as the data structure for the predicates to skip.
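
For the first recommendation, one defensive option is to check the installed gensim version at start-up. This is only a sketch: is_pre_gensim4 is a hypothetical helper, and the parsing looks at nothing but the leading major number.

```python
from importlib import metadata


def is_pre_gensim4(version: str) -> bool:
    """Return True when the version string has a major number below 4."""
    major = int(version.split(".")[0])
    return major < 4


try:
    installed = metadata.version("gensim")
except metadata.PackageNotFoundError:
    installed = None  # gensim is not installed in this environment

if installed is None:
    print("gensim is not installed")
elif is_pre_gensim4(installed):
    print(f"gensim {installed}: the `size` keyword is still accepted")
else:
    print(f"gensim {installed}: `size` was renamed to `vector_size`")
```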

NOTE: if you would like to use gensim 4.0.0, nothing prevents you from reimplementing the Word2Vec class of pyRDF2Vec by removing this default dictionary.
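
As one possible shape for such a reimplementation, the sketch below translates old-style keyword arguments before they reach gensim instead of hard-coding a default dictionary. adapt_w2v_kwargs is a hypothetical helper, not part of pyRDF2Vec; it relies on gensim 4.0.0 having renamed size to vector_size and iter to epochs.

```python
# Keyword arguments renamed by gensim 4.0.0 (old name -> new name).
RENAMED_IN_GENSIM_4 = {"size": "vector_size", "iter": "epochs"}


def adapt_w2v_kwargs(kwargs: dict, gensim_major: int) -> dict:
    """Return a copy of kwargs using the names the given gensim major expects."""
    if gensim_major < 4:
        return dict(kwargs)
    adapted = {}
    for name, value in kwargs.items():
        adapted[RENAMED_IN_GENSIM_4.get(name, name)] = value
    return adapted


old_style = {"size": 200, "iter": 10, "workers": 1}
print(adapt_w2v_kwargs(old_style, gensim_major=3))
print(adapt_w2v_kwargs(old_style, gensim_major=4))
```

The adapted dictionary could then be unpacked into gensim.models.Word2Vec(**adapted) in your own embedder class.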

nnadine25 commented 3 years ago

Thank you so much.