hugolpz / Sparql2Data

Given SPARQL queries, saves their responses daily into corresponding persistent files, served via GitHub Pages and quickly accessible via XHR.
https://hugolpz.github.io/Sparql2Data/
MIT License

Fix `LL-LanguagesGenderData.json` to include languages with one gender being empty #17

Closed: hugolpz closed this issue 1 year ago

hugolpz commented 1 year ago

Hello @felixknds ,

Do you think you could look at this issue a few days before Wikimania?

UI

For now, the UI displays speakersFemales + speakersOthers, so the 1 in the gender split actually means 0 (females) + 1 (others).

[Screenshot, 2023-06-03]

Faulty SPARQL

I suspect this SPARQL query excludes every language where either the male or the female count is 0, which is why only 100 languages are returned.
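A minimal Python sketch of the suspected failure mode, not the actual query: if per-gender counts come from separate subqueries that are joined as required patterns, a language with no female recordings has no row on one side and drops out of the result. A left join that defaults the missing count to 0 (roughly what OPTIONAL plus COALESCE gives in SPARQL) keeps it. Language names and counts are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language counts, as two separate subqueries might return them.
# A language with no female recordings simply has no row in the female table.
male_counts: Dict[str, int] = {"LangA": 120, "LangB": 40}
female_counts: Dict[str, int] = {"LangA": 80}


def inner_join(left: Dict[str, int], right: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Keep only languages present on both sides, like two required patterns."""
    return [(lang, left[lang], right[lang]) for lang in sorted(left) if lang in right]


def left_join(left: Dict[str, int], right: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Keep every language from the left side and default a missing right-hand
    count to 0, roughly what OPTIONAL plus COALESCE(?count, 0) gives in SPARQL."""
    return [(lang, left[lang], right.get(lang, 0)) for lang in sorted(left)]


if __name__ == "__main__":
    print(inner_join(male_counts, female_counts))  # LangB is dropped
    print(left_join(male_counts, female_counts))   # LangB is kept, with 0 females
```

In practice the left-hand side would be the full list of languages rather than the male table, so that a language missing from either gender table survives.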

LLQS

hugolpz commented 1 year ago

@felixknds There are still some oddities.

The languages below have known male leading contributors, yet their recordings wrongly appear as recordsOthers: [Screenshot]

See also

Given language names, return their LLQids

SELECT (SUBSTR(STR(?languageId),32) AS ?languageQid) ?languageName
WHERE {
  VALUES ?languageName { "Afrikaans" "Breton" "Esperanto" "Gascon" "Occitan" "Salentino" "Sicilian" } # Target values
  BIND ( STRLANG(?languageName, "en") AS ?languageLabel ) # Turn each name into an English-tagged literal; BIND must come before any use of ?languageLabel
  ?languageId 
    prop:P2 entity:Q4 ;          # Filter: P2 'instance of' is Q4 'language' AND
    rdfs:label ?languageLabel .  # label equals ?languageLabel
}
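The SUBSTR(STR(?x),32) idiom used in these queries keeps everything after the first 31 characters of the entity URI. The Python sketch below assumes LinguaLibre entity URIs look like https://lingualibre.org/entity/Q209 (inferred from the offset, not verified here) and contrasts that fixed-offset slice with a split on the last '/', which does not depend on the prefix length. The function names are hypothetical.

```python
# Assumed entity URI prefix (31 characters). SPARQL's SUBSTR(str, 32) is
# 1-indexed, so it corresponds to Python's str[31:].
ENTITY_PREFIX = "https://lingualibre.org/entity/"


def qid_by_offset(uri: str) -> str:
    """Mirror SUBSTR(STR(?x), 32): drop the first 31 characters."""
    return uri[31:]


def qid_by_split(uri: str) -> str:
    """Take whatever follows the last '/', independent of the prefix length."""
    return uri.rsplit("/", 1)[-1]


if __name__ == "__main__":
    uri = ENTITY_PREFIX + "Q209"
    print(qid_by_offset(uri), qid_by_split(uri))  # Q209 Q209
```

Inside SPARQL itself, STRAFTER(STR(?languageId), "/entity/") is a standard SPARQL 1.1 alternative that avoids the hard-coded offset.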

[Screenshot]

Given languages, return speakers with genders

SELECT (SUBSTR(STR(?language),32) AS ?languageQid) ?languageLabel (SUBSTR(STR(?speaker),32) AS ?speakerQid) ?speakerLabel ?genderLabel
WHERE {
  VALUES ?language { entity:Q209 }
  ?speaker prop:P2 entity:Q3 .  # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .  # P4 'language' is Q209 'Esperanto'
  ?speaker prop:P8 ?gender .    # P8 'gender'
  # Labels
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
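To turn the rows of such a query into per-gender totals, here is a small Python sketch. It assumes the results are fetched in the standard SPARQL 1.1 JSON results format (head/vars plus results/bindings); the sample bindings and labels are invented. Rows where ?genderLabel is unbound are counted under a separate "unknown" key instead of being silently dropped.

```python
from collections import Counter
from typing import Any, Dict


def count_genders(results: Dict[str, Any]) -> Counter:
    """Count rows per ?genderLabel in a SPARQL 1.1 JSON results document.

    Rows where ?genderLabel is unbound are counted under "unknown".
    """
    counts: Counter = Counter()
    for row in results["results"]["bindings"]:
        # Unbound variables are simply absent from a binding row.
        label = row.get("genderLabel", {}).get("value", "unknown")
        counts[label] += 1
    return counts


# Invented sample shaped like a SPARQL 1.1 JSON response.
sample = {
    "head": {"vars": ["speakerQid", "genderLabel"]},
    "results": {"bindings": [
        {"speakerQid": {"type": "literal", "value": "Q1"},
         "genderLabel": {"type": "literal", "value": "male"}},
        {"speakerQid": {"type": "literal", "value": "Q2"},
         "genderLabel": {"type": "literal", "value": "female"}},
        {"speakerQid": {"type": "literal", "value": "Q3"},
         "genderLabel": {"type": "literal", "value": "male"}},
        {"speakerQid": {"type": "literal", "value": "Q4"}},
    ]},
}

if __name__ == "__main__":
    print(count_genders(sample))
```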

Given speaker names, return their Qids and genders

SELECT ?speakerName (SUBSTR(STR(?speakerId),32) AS ?speaker)  ?genderLabel
WHERE {
  VALUES ?speakerName { "Lepticed7" "XANA000" "ThonyVezbe" "Davidgrosclaude" } # Assign value: one or multiple values
  # Turn each name into an English-tagged literal so it can match rdfs:label
  BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )
  # Grammar note: ';' chains several predicate-object pairs on the same subject
  ?speakerId prop:P2 entity:Q3 ;         # Filter: P2 'instance of' is Q3 'speaker'
             rdfs:label ?speakerLabel ;  # Filter by value: label equals ?speakerLabel
             prop:P8 ?gender .           # P8 'gender'
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
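The VALUES lists in these queries are typed by hand. A hypothetical Python helper could build them from a list of names instead. This sketch escapes only backslashes and double quotes, which covers names like the ones above but not every character SPARQL requires escaping in a string literal (newlines, for example).

```python
from typing import Iterable


def sparql_string(value: str) -> str:
    """Quote a string as a SPARQL literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def values_clause(var: str, names: Iterable[str]) -> str:
    """Build a 'VALUES ?var { "a" "b" ... }' line from a list of names."""
    literals = " ".join(sparql_string(name) for name in names)
    return f"VALUES ?{var} {{ {literals} }}"


if __name__ == "__main__":
    print(values_clause("speakerName", ["Lepticed7", "XANA000"]))
    # VALUES ?speakerName { "Lepticed7" "XANA000" }
```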

[Screenshot]

Given language Qids and speaker Qids, return the number of recordings per language and speaker

SELECT (SUBSTR(STR(?language),32) AS ?languageId) ?languageLabel (SUBSTR(STR(?speaker),32) AS ?speakerId)  ?speakerLabel (COUNT(?audio) AS ?audios)
WHERE {
  VALUES ?language { entity:Q25 entity:Q209 entity:Q259 entity:Q311 entity:Q930 }  # Assign values: target language LLQids into ?language
  VALUES ?speaker { entity:Q51319 entity:Q584098 entity:Q687891 entity:Q1976 }     # Assign values: target speaker LLQids into ?speaker
  ?audio prop:P5 ?speaker .   # Filter: P5 'speaker' is one of the ?speaker values
  ?audio prop:P4 ?language .  # Filter: P4 'language' is one of the ?language values
  ?audio prop:P2 entity:Q2 .  # Filter: P2 'instance of' is Q2 'record'
  # Add labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
GROUP BY ?language ?languageLabel ?speaker ?speakerLabel  # One row (and one COUNT) per language-speaker pair
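For reference, COUNT(?audio) with GROUP BY ?language ?speaker produces one count per language-speaker pair; GROUP BY groups but does not sort, so an ORDER BY clause is needed for sorted output. The Python sketch below reproduces that aggregation on made-up (audio, language, speaker) tuples; the identifiers are placeholders, not real LLQids.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical recordings: (audio id, language id, speaker id).
SAMPLE: List[Tuple[str, str, str]] = [
    ("a1", "L1", "S1"),
    ("a2", "L1", "S1"),
    ("a3", "L2", "S2"),
]


def count_per_language_speaker(recordings: Iterable[Tuple[str, str, str]]) -> Counter:
    """Python analogue of COUNT(?audio) ... GROUP BY ?language ?speaker."""
    return Counter((language, speaker) for _audio, language, speaker in recordings)


if __name__ == "__main__":
    # Sorting is a separate step here, as ORDER BY is in SPARQL.
    for (language, speaker), n in sorted(count_per_language_speaker(SAMPLE).items()):
        print(language, speaker, n)
```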

[Screenshot]

Conclusion

Yet these languages show no speakersMale in LL-LanguagesRecordingsGender.json or in the language gallery; their male recordings appear as recordingsOthers instead.

hugolpz commented 1 year ago

After further observation, it seems that whenever recordsFemale is null, all male recordings (recordsMale) are counted as recordsOthers.
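A possible way to spot this pattern in the generated JSON: flag languages where recordsMale and recordsFemale are both null or 0 while recordsOthers equals the total. This Python sketch assumes each row is a dict using the same keys as the table later in this thread (languageLabel, records, recordsMale, recordsFemale, recordsOthers); the sample rows are invented. A flagged language is only a candidate for manual review, since a language whose speakers genuinely have no declared gender would match too.

```python
from typing import Any, Dict, List

# Invented rows, shaped like the table later in this thread.
SAMPLE_ROWS: List[Dict[str, Any]] = [
    {"languageLabel": "LangA", "records": 500, "recordsMale": None,
     "recordsFemale": None, "recordsOthers": 500},
    {"languageLabel": "LangB", "records": 300, "recordsMale": 200,
     "recordsFemale": 90, "recordsOthers": 10},
]


def suspect_rows(rows: List[Dict[str, Any]]) -> List[str]:
    """Return languages where no male or female record is reported and every
    record is counted as 'others' (the symptom described above)."""
    flagged = []
    for row in rows:
        # Treat null or missing counts as 0.
        records = row.get("records") or 0
        male = row.get("recordsMale") or 0
        female = row.get("recordsFemale") or 0
        others = row.get("recordsOthers") or 0
        if records > 0 and male == 0 and female == 0 and others == records:
            flagged.append(row["languageLabel"])
    return flagged


if __name__ == "__main__":
    print(suspect_rows(SAMPLE_ROWS))  # ['LangA']
```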

felixknds commented 1 year ago

It seems that OPTIONAL { INCLUDE %females } does not do quite what I thought it would, so I moved the OPTIONAL clause from the outer query into the subqueries.

| languageLabel | wikidata | iso | records | recordsMale | recordsFemale | recordsOthers |
|---|---|---|---|---|---|---|
| Breton | Q12107 | bre | 1391 | 1350 | 0 | 41 |
| Esperanto | Q143 | epo | 33842 | 29917 | 0 | 3925 |
| Gascon | Q35735 | gsc | 4887 | 4887 | 0 | 0 |
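A quick sanity check on the table above: records should equal recordsMale + recordsFemale + recordsOthers for each language. This assumes the three buckets are mutually exclusive, which the column names suggest but the thread does not state outright. The rows below are copied from the table.

```python
from typing import List, Tuple

# (languageLabel, records, recordsMale, recordsFemale, recordsOthers)
ROWS: List[Tuple[str, int, int, int, int]] = [
    ("Breton", 1391, 1350, 0, 41),
    ("Esperanto", 33842, 29917, 0, 3925),
    ("Gascon", 4887, 4887, 0, 0),
]


def inconsistent(rows: List[Tuple[str, int, int, int, int]]) -> List[str]:
    """Return labels whose gender buckets do not add up to the total."""
    return [label for label, total, male, female, others in rows
            if male + female + others != total]


if __name__ == "__main__":
    print(inconsistent(ROWS))  # [] (every row adds up)
```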

Check out 5d4b641

hugolpz commented 1 year ago

It's working ! Thank you so much @felixknds 🌹🌻🌼😍

[Three screenshots, 2023-06-04]

https://hugolpz.github.io/LanguagesGallery/