scribe-org / Scribe-Data

Wikidata and Wikipedia language data extraction
GNU General Public License v3.0

Edit and expand Arabic data processes #115

Closed andrewtavis closed 3 months ago

andrewtavis commented 5 months ago


Description

This issue would check and expand the queries and related data processes found in the scribe_data/extract_transform/languages/Arabic directory. It would be great to start by expanding the queries in the nouns and verbs directories, and from there we can discuss a formatting process :) The query_nouns.sparql and query_verbs.sparql files can be expanded using the similar queries for other languages. Work on formatting should wait until the general formatting process is expanded to focus on individual lexemes.

Contribution

Happy to support someone who has interest in working on this! 😊

andrewtavis commented 5 months ago

CC @mrbazzan, who expressed interest in this! Comment on the issue and I'll assign it to you. Also let me know if there are any questions :)

mrbazzan commented 5 months ago

I would like to start with this one.

andrewtavis commented 5 months ago

Fantastic, @mrbazzan! Let me know if there's anything we can do to help :)

mrbazzan commented 5 months ago

All I can think of right now is to copy some of the OPTIONAL blocks from German's query_nouns.sparql into Arabic's, but I don't really understand what they do.

Is there anything else to be considered?

andrewtavis commented 5 months ago

Look into some of the Arabic lexemes and see if there are other properties or statements that could be added. There might not be though, so feel free to send along the basic files converted from German to Arabic!
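For reference, an OPTIONAL block in a Wikidata SPARQL query adds columns when its pattern matches but keeps the lexeme in the results when it doesn't, so nouns without a recorded plural or gender still appear. A minimal sketch for Arabic nouns, assuming the German query's pattern carries over: Q13955 (Arabic) is as in the verb query in this thread, Q1084 is noun, Q146786 is the plural grammatical feature, and P5185 is grammatical gender. The column names are illustrative, and whether Arabic lexemes actually carry these forms and statements should be checked on the lexemes first.

```sparql
# All Arabic (Q13955) nouns with their plurals and genders, where available.
# Enter this query at https://query.wikidata.org/.

SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") as ?lexemeID)
  ?noun
  ?plural
  ?gender

WHERE {
  ?lexeme a ontolex:LexicalEntry ;
    dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q1084 ;
    wikibase:lemma ?noun .

  # Plural form: the noun is still returned if no plural form exists.
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralForm .
    ?pluralForm ontolex:representation ?plural ;
      wikibase:grammaticalFeature wd:Q146786 .
  }

  # Grammatical gender (P5185) statement on the lexeme itself.
  OPTIONAL {
    ?lexeme wdt:P5185 ?nounGender .
  }

  # Human-readable labels for the gender items.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?nounGender rdfs:label ?gender .
  }
}
```

If a noun has more than one plural form or more than one gender statement, it is returned once per combination, which can make the results look duplicated.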

mrbazzan commented 5 months ago

Look into some of the Arabic lexemes and see if there are other properties or statements that could be added.

What do you mean? I tried a couple of queries and they contain a lot of duplicate data.

andrewtavis commented 5 months ago

Duplicate how, @mrbazzan? So take the query below for Arabic verbs:

# All Arabic (Q13955) verbs.
# Enter this query at https://query.wikidata.org/.

SELECT
  ?lexeme
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") as ?lexemeID)
  ?verb

WHERE {
  ?lexeme a ontolex:LexicalEntry ;
    dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q24905 ;
    wikibase:lemma ?verb .
}

From there you can check out a lexeme like this one that also has statements for verb conjugations. Could you expand the query, referencing the verb queries for other languages, so that it also gets the conjugations for the verbs?
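One way to see which conjugation data Arabic verbs actually carry, before writing one column per form as the other languages' verb queries do, is to list every form together with its grammatical features. A minimal sketch, assuming the prefixes the Wikidata Query Service predefines; the column names are illustrative, not those of the Scribe-Data query files. Grouping by form collapses a form tagged with several features (for example person, number, and gender) into a single row instead of one row per feature.

```sparql
# All forms of Arabic (Q13955) verbs with their grammatical features.
# Enter this query at https://query.wikidata.org/.

SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") as ?lexemeID)
  ?verb
  ?conjugation
  (GROUP_CONCAT(DISTINCT ?featureLabel; separator=", ") AS ?features)

WHERE {
  ?lexeme a ontolex:LexicalEntry ;
    dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q24905 ;
    wikibase:lemma ?verb ;
    ontolex:lexicalForm ?form .

  # Each form's written representation and every feature it is tagged with.
  ?form ontolex:representation ?conjugation ;
    wikibase:grammaticalFeature ?feature .

  # English labels fetched directly, since the label service is applied
  # after aggregation and would not feed GROUP_CONCAT.
  ?feature rdfs:label ?featureLabel .
  FILTER(LANG(?featureLabel) = "en")
}

# One row per form, with its features joined into a single string.
GROUP BY ?lexeme ?verb ?form ?conjugation
```

Once the feature combinations Arabic verbs use are known, each one could become its own OPTIONAL block with fixed wikibase:grammaticalFeature values, following the pattern of the other languages' query_verbs.sparql files.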

mrbazzan commented 5 months ago

Hello @andrewtavis, sorry for the late reply (holidays). I'll submit a draft PR to show what I've been working on.

andrewtavis commented 3 months ago

Closing this, as the Arabic queries are in quite good shape after #127 and other changes. Thanks for this, @mrbazzan!