scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0
30 stars, 69 forks

Data process for Greek nouns #58

Closed andrewtavis closed 8 months ago

andrewtavis commented 12 months ago

Terms

Languages

Greek

Description

This issue would create the base noun and verb SPARQL queries for Scribe-Data so that we can then add them to Scribe-iOS :) These would be added to the extract transform directory.

NikosLikomitros commented 12 months ago

Joined from the Greek community.

andrewtavis commented 12 months ago

Great to have you, @NikosLikomitros! As discussed in the call, I'll get the base SPARQL queries up and it'd be great if you can check them :)

NikosLikomitros commented 12 months ago

Yes, I can help with this. If this helps my language become more present in digital editing systems, I can do it.

wkyoshida commented 9 months ago

Breaking this issue into separate issues for the data processes for the Greek nouns and verbs.

This will become the issue for the SPARQL query for Greek nouns.

The WIKIDATAGUIDE.md has some good info on Wikidata and SPARQL :blush:

As reference, this is the query used for the Italian nouns .../Italian/nouns/query_nouns.sparql

Jk40git commented 8 months ago

@wkyoshida, the link for the Italian nouns reference query is broken. Could you take a look, please?

wkyoshida commented 8 months ago

Hey @Jk40git :wave: There was a recent change to put all the language processing files under a parent languages/ directory, so that file's location just moved a bit to here:

Jk40git commented 8 months ago

I am not getting any results for the Greek nouns. The query returns "no matching records found".

wkyoshida commented 8 months ago

Hey @Jk40git! There definitely are Greek nouns on Wikidata, so it could just be something with the query. Would you like to share what you tried in a gist and we could take a look perhaps?

I would also make sure to go over the WIKIDATAGUIDE.md if you haven't yet. Wikidata queries can be tough to work with :sweat_smile:
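
One way to narrow down an empty result, sketched here in Python under a few assumptions: build a small count query for a given language item and run it in the Wikidata Query Service. The helper name `build_count_query` is hypothetical and not part of Scribe-Data, and the prefixes `ontolex:`, `dct:`, `wikibase:` and `wd:` are assumed to be predefined, as they are on query.wikidata.org. A count of zero would suggest that the language item, rather than the rest of the query, is the problem.

```python
import re


# Hypothetical helper for illustration; not part of Scribe-Data.
def build_count_query(language_qid: str, category_qid: str = "Q1084") -> str:
    """Return a SPARQL query that counts lexemes for a language and lexical category."""
    for qid in (language_qid, category_qid):
        # Wikidata item IDs are "Q" followed by digits.
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"Not a Wikidata item ID: {qid!r}")
    return (
        "SELECT (COUNT(DISTINCT ?lexeme) AS ?count) WHERE {\n"
        "  ?lexeme a ontolex:LexicalEntry ;\n"
        f"    dct:language wd:{language_qid} ;\n"
        f"    wikibase:lexicalCategory wd:{category_qid} .\n"
        "}"
    )


if __name__ == "__main__":
    # The two language items that come up in this thread.
    for qid in ("Q9129", "Q36510"):
        print(build_count_query(qid))
        print()
```

Pasting each printed query into the Query Service and comparing the counts shows quickly whether a given language item has any noun lexemes attached to it.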

Jk40git commented 8 months ago

@wkyoshida here is the gist

andrewtavis commented 8 months ago

Hey there @Jk40git :) Makes sense that this was a bit tough. Specifically you were using Q9129 in the query for Greek, but then there are multiple kinds of Greek languages including Q36510 (Modern Greek) and Q35497 (Ancient Greek).

Here's the correct way of going about this using Q36510:

SELECT DISTINCT ?singular ?plural ?gender WHERE {

  # Nouns (Q1084) and proper nouns (Q147276).
  VALUES ?nounTypes { wd:Q1084 wd:Q147276 }
  ?lexeme a ontolex:LexicalEntry ;
    dct:language wd:Q36510;
    wikibase:lexicalCategory ?noun .
  FILTER(?noun = ?nounTypes)

  # Optional selection of singular forms.
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?singularForm .
    ?singularForm ontolex:representation ?singular ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q110786 ;
  } .

  # Optional selection of plural forms.
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?pluralForm .
    ?pluralForm ontolex:representation ?plural ;
      wikibase:grammaticalFeature wd:Q131105 ;
      wikibase:grammaticalFeature wd:Q146786 ;
  } .

  # Optional selection of genders.
  OPTIONAL {
    ?lexeme wdt:P5185 ?nounGender .
    FILTER NOT EXISTS { ?lexeme wdt:P31 wd:Q48277}
  } .

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE]".
    ?nounGender rdfs:label ?gender .
  }
}
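
For checking the query's output outside the Query Service UI, here is a rough Python sketch of reading rows from the standard SPARQL JSON results shape (`results.bindings`). The function name `rows_from_bindings` and the sample payload are made up for illustration and are not real query output; Scribe-Data's own formatting scripts may work differently. Because the singular, plural and gender parts are OPTIONAL, a variable can be missing from a binding, so the sketch falls back to an empty string.

```python
# Hypothetical helper for illustration; not Scribe-Data's actual code.
def rows_from_bindings(payload, variables=("singular", "plural", "gender")):
    """Flatten SPARQL JSON results into plain dicts, one per result row."""
    rows = []
    for binding in payload["results"]["bindings"]:
        # OPTIONAL patterns leave unbound variables out of the binding entirely.
        rows.append({var: binding.get(var, {}).get("value", "") for var in variables})
    return rows


# Made-up sample payload in the SPARQL JSON results shape (not real query output).
SAMPLE = {
    "head": {"vars": ["singular", "plural", "gender"]},
    "results": {
        "bindings": [
            {
                "singular": {"type": "literal", "xml:lang": "el", "value": "βιβλίο"},
                "plural": {"type": "literal", "xml:lang": "el", "value": "βιβλία"},
                "gender": {"type": "literal", "xml:lang": "en", "value": "neuter"},
            },
            # A row where only the singular form was matched.
            {"singular": {"type": "literal", "xml:lang": "el", "value": "νερό"}},
        ]
    },
}

if __name__ == "__main__":
    for row in rows_from_bindings(SAMPLE):
        print(row)
```

Rows with empty fields are expected here and do not necessarily mean the query is wrong, since not every lexeme has all forms or a gender recorded.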

Jk40git commented 8 months ago

Okay, thanks for the info. I appreciate it!

andrewtavis commented 8 months ago

Closed via #93 😊 Thanks for the help here, @Jk40git!