scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Extract Lexeme IDs in SPARQL Queries for Language Totals #110

Closed: mhmohona closed this 6 months ago

mhmohona commented 6 months ago

Contributor checklist


Description

The SELECT statement in the query has been updated to include (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") as ?lexemeID) as its first expression. With this change the query returns the lexeme ID alongside the word category and its counts, as required in the issue discussion and following the approach Andrew suggested.
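For readers less familiar with SPARQL string functions, here is a rough Python sketch of what the REPLACE(STR(?lexeme), ...) expression is meant to do: turn a full Wikidata entity URI into a bare lexeme ID. The function name and the sample ID L1234 are made up for illustration and are not part of the PR. Note also that SPARQL's REPLACE treats its pattern as a regular expression, while this sketch uses a plain literal replacement.

```python
# Prefix taken from the SELECT expression described above.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def lexeme_id_from_uri(lexeme_uri: str) -> str:
    """Strip the Wikidata entity prefix, leaving only the ID (hypothetical helper)."""
    return lexeme_uri.replace(WIKIDATA_ENTITY_PREFIX, "")


# "L1234" is an illustrative placeholder, not a lexeme referenced in this PR.
print(lexeme_id_from_uri("http://www.wikidata.org/entity/L1234"))
```

A value that does not contain the prefix is returned unchanged, which matches how a non-matching REPLACE behaves.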

Related issue

github-actions[bot] commented 6 months ago

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. It'd be great to have you!

Maintainer checklist

andrewtavis commented 6 months ago

Thanks for this, @mhmohona! Will look to review it as soon as I can :)

andrewtavis commented 6 months ago

Hey @mhmohona 👋 I'm realizing the directions here weren't quite what they should have been. The three queries you edited are actually the only three that don't need this change :) Specifically, if you go into the extract_transform/languages directory, you'll find queries like extract_transform/languages/French/nouns/query_nouns.sparql. These are the files we want to add this line to 😊

Can you go through and remove the edits to the current files and send along versions of all instances of query_nouns.sparql, query_verbs.sparql and query_prepositions.sparql that have the lexemeID line included?
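In case it helps with tracking down every file, here is a minimal sketch of one way to list them. It assumes the script is run from the repository root and that the queries live under extract_transform/languages as described above. The function name is hypothetical and nothing like it is claimed to exist in Scribe-Data.

```python
from pathlib import Path

# The three query file names mentioned in this thread.
TARGET_QUERY_NAMES = {
    "query_nouns.sparql",
    "query_verbs.sparql",
    "query_prepositions.sparql",
}


def find_target_queries(languages_dir):
    """Return sorted paths of the target query files under languages_dir (hypothetical helper)."""
    root = Path(languages_dir)
    if not root.is_dir():
        # Nothing to search, e.g. when run outside the repository root.
        return []
    return sorted(p for p in root.rglob("*.sparql") if p.name in TARGET_QUERY_NAMES)


# Assumed location, taken from the path given in the comment above.
for query_path in find_target_queries("extract_transform/languages"):
    print(query_path)
```

This only lists the files. Adding the lexemeID line to each SELECT would still be a manual (or separately scripted) edit.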

mhmohona commented 6 months ago

Oops! 🫠 Let me update it.

mhmohona commented 6 months ago

@andrewtavis this PR is up for review now!