scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Option to grab the Wikidata lexemes for queried words #101

Closed wkyoshida closed 6 months ago

wkyoshida commented 6 months ago

Languages

ALL

Description

This issue is for the implementation of one of the ideas brought up in #59, in particular:

  • For nouns, verbs, and prepositions, this is likely the Wikidata lexemes.

andrewtavis commented 6 months ago

For someone with interest in working on this, this should be as simple as adding ?lexeme as the first element in the select statement for all .sparql files :)

oreotamish commented 6 months ago

@andrewtavis Can I work on this?

wkyoshida commented 6 months ago

Hey @oreotamish 👋 @mhmohona actually DM'ed earlier today expressing interest in this issue as well. Sorry about this, but is there another issue that you might be interested in picking up?

oreotamish commented 6 months ago

Sure

wkyoshida commented 6 months ago

Thank you for understanding, @oreotamish! 🙏😁 I did realize later, though, that I had simply assumed your interest was due to our participation in GSoC. Apologies if that was an incorrect assumption! 🥺 If GSoC is indeed how you found us, I also realize that we're running low on available issues, given the amount of traffic that we're getting. I'll try to create some more issues soon. If you're interested, feel free to take a look at the good first issues we have throughout the organization.

@mhmohona, please add a comment here when you can as well, as we can only assign an issue once someone is active in it.

mhmohona commented 6 months ago

@wkyoshida, I would like to work on this issue.

andrewtavis commented 6 months ago

@mhmohona, just a quick note here that came from a discussion with @henrikth93: the way we'd like to get the lexeme IDs for this would be the following:

SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
  OTHER_COLUMNS_LIKE_NOUN_OR_VERB_ETC

The ?lexemeID line gets us just the LID, not the wd:LID that comes from returning ?lexeme on its own :)
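
To make the difference concrete, here is a minimal Python sketch of what the REPLACE(STR(?lexeme), ...) expression does to each returned value. The function name `lexeme_id_from_uri` and the sample ID `L123` are illustrative assumptions and are not taken from Scribe-Data; the sketch also uses a literal string replacement, whereas SPARQL's REPLACE treats its pattern as a regular expression, so it only mirrors the query for well-formed entity URIs.

```python
# Illustrative only: mirrors
# REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "")
# for well-formed Wikidata entity URIs.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def lexeme_id_from_uri(lexeme_uri: str) -> str:
    """Strip the Wikidata entity prefix, leaving just the lexeme ID."""
    # A value that does not contain the prefix is returned unchanged.
    return lexeme_uri.replace(WIKIDATA_ENTITY_PREFIX, "")


# "L123" is a made-up sample ID used only for demonstration.
print(lexeme_id_from_uri("http://www.wikidata.org/entity/L123"))  # L123
```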

andrewtavis commented 6 months ago

Adding this to all the current queries would close this issue 😊
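
If someone prefers to script the change rather than edit each query by hand, a rough Python sketch might look like the following. This is an assumption about one possible approach, not how the change was actually made in the repository: the helper name `add_lexeme_id_column` is hypothetical, and the sketch assumes the first SELECT keyword in the text is the query's projection (a comment containing "select" before it would confuse the sketch).

```python
import re

# The projection suggested in the thread, indented to sit under SELECT.
LEXEME_ID_COLUMN = (
    '  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)'
)

# First SELECT keyword, optionally followed by DISTINCT.
SELECT_PATTERN = re.compile(r"\bSELECT\b(\s+DISTINCT\b)?", re.IGNORECASE)


def add_lexeme_id_column(query: str) -> str:
    """Insert the ?lexemeID projection right after the first SELECT."""
    if "?lexemeID" in query:
        return query  # already present, so leave the query untouched
    # A function replacement avoids any escape handling in the inserted text.
    return SELECT_PATTERN.sub(
        lambda match: match.group(0) + "\n" + LEXEME_ID_COLUMN,
        query,
        count=1,
    )


# Hypothetical minimal query text, used only to demonstrate the helper.
sample_query = "SELECT\n  ?noun\nWHERE {\n  ?lexeme a ontolex:LexicalEntry .\n}\n"
print(add_lexeme_id_column(sample_query))
```

The helper works on query text only, so it could be applied to the contents of each .sparql file in turn; the early return keeps it safe to run more than once on the same query.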