stanford-oval / wikidata-emnlp23

WikiSP, a semantic parser for Wikidata. WikiWebQuestions, a SPARQL-annotated dataset on Wikidata
https://arxiv.org/abs/2305.14202

Domain names dataset #2

Closed: screemix closed this issue 3 months ago

screemix commented 3 months ago

First of all, thank you for your work! Really good job.

However, I have a question. In your paper, you mention using 'instance of' properties:

"Unlike relational databases and Freebase, Wikidata has no predefined domains or types. Any entity can have an arbitrary set of properties. However, even though Wikidata is property-based, all named entities have one or more instance of properties to some domain entity; domain entities are organized into a hierarchy with the subclass of property."

Could you please share the dataset that divides entities into domains, or at least provide details on obtaining the so-called domains for entities?

george1459 commented 3 months ago

Hi, thank you for reaching out!

The domain entities already exist in Wikidata, and named entities are attached to them through instance of (P31) statements. We did not create our own dataset dividing entities into domains.

For example, Barack Obama (Q76) is an "instance of" human (Q5); this is the first statement listed on his Wikidata page.
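To look up an entity's domains programmatically, one option is to read its P31 statements from Wikidata's per-entity JSON export. The sketch below assumes the public Special:EntityData endpoint and its `claims` / `mainsnak` layout; `fetch_entity` and `instance_of` are hypothetical helper names, not code from this repository.

```python
import json
import urllib.request

# Public per-entity JSON export; calling fetch_entity requires network access.
ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid: str) -> dict:
    """Download one entity's JSON record and return the entity object."""
    req = urllib.request.Request(
        ENTITY_URL.format(qid=qid),
        # Wikimedia's User-Agent policy asks clients to identify themselves.
        headers={"User-Agent": "domain-lookup-sketch/0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["entities"][qid]


def instance_of(entity: dict) -> list[str]:
    """Return the QIDs of all 'instance of' (P31) values, in listed order."""
    qids = []
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim["mainsnak"]
        # 'novalue' and 'somevalue' snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            qids.append(snak["datavalue"]["value"]["id"])
    return qids
```

For example, `instance_of(fetch_entity("Q76"))` should return a list containing `"Q5"` (human), although the exact statements depend on the live data.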

You can also refer to this page and check out the sections on instance of (P31) and subclass of (P279) for more information.
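Because domains form a hierarchy through subclass of (P279), all the domains of an entity can be collected by walking upward from its P31 classes. This is a minimal breadth-first sketch under that assumption; `domain_ancestors` and the `parents` callback are hypothetical names. In practice `parents` would read P279 statements (for example from the entity JSON), or the whole walk could be replaced by the SPARQL property path `wdt:P31/wdt:P279*` on the Wikidata Query Service.

```python
from collections import deque
from typing import Callable, Iterable


def domain_ancestors(start: Iterable[str],
                     parents: Callable[[str], Iterable[str]]) -> list[str]:
    """Breadth-first walk up 'subclass of' (P279) edges.

    `start` holds an entity's direct 'instance of' (P31) classes, and
    `parents(qid)` returns the P279 superclasses of one class. Returns every
    reachable class, nearest first, each listed once; the `seen` set guards
    against any cycles in the class graph.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(start)
    while queue:
        qid = queue.popleft()
        if qid in seen:
            continue
        seen.add(qid)
        order.append(qid)
        queue.extend(p for p in parents(qid) if p not in seen)
    return order
```

With a made-up toy graph such as `{"a": ["b"], "b": ["c"], "c": []}`, calling `domain_ancestors(["a"], lambda q: toy.get(q, []))` walks `a`, then `b`, then `c`.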

During training, we convert all domain entities from their QIDs into their names. In this example, the domain entity wd:written_work (https://www.wikidata.org/wiki/Q47461344) has been converted from its QID to its name.
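The QID-to-name conversion can be approximated with a regular-expression pass over the SPARQL text. This is a hypothetical sketch, not the authors' preprocessing code: the lowercase-and-underscore normalization in `label_to_token`, and passing an explicit mapping of which QIDs count as domain entities, are assumptions for illustration.

```python
import re

# Matches entity references such as wd:Q47461344 (but not wdt:P31).
QID_PATTERN = re.compile(r"\bwd:(Q\d+)\b")


def label_to_token(label: str) -> str:
    """Hypothetical normalization: 'written work' -> 'written_work'."""
    return re.sub(r"\s+", "_", label.strip().lower())


def names_for_domains(sparql: str, domain_labels: dict[str, str]) -> str:
    """Rewrite wd:Q... as wd:<name> for QIDs listed in domain_labels.

    QIDs missing from the mapping (e.g. named entities) are left untouched.
    """
    def repl(match: re.Match) -> str:
        qid = match.group(1)
        if qid not in domain_labels:
            return match.group(0)
        return "wd:" + label_to_token(domain_labels[qid])

    return QID_PATTERN.sub(repl, sparql)
```

Only QIDs present in the mapping are rewritten, so a named entity such as wd:Q76 keeps its ID; label collisions between different domain entities would need extra handling that this sketch omits.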

Let us know if you have further questions!

george1459 commented 3 months ago

Hey, I will be closing the issue for now. Feel free to re-open it if you have follow-up questions.