airon90 opened 2 years ago
Yes, it would help a lot too. Since you are doing queries, I guess a language picker that changes something in the endpoint the game uses would be enough.
As all the game data is currently loaded from a single file at start, I think the best approach might be to provide language-specific versions of this file.
Approach 0: Instead of having a language-specific file, fetch the data of the Wikidata item each time a card is shown to see if Wikidata (at the moment) contains the desired translations. I'm not sure which endpoints can be accessed directly by the game in the browser, but e.g. these would seem to work: https://www.wikidata.org/wiki/Special:EntityData/Q42.json and https://query.wikidata.org/bigdata/ldf?subject=wd:Q42
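As a rough sketch of Approach 0, using the Special:EntityData endpoint mentioned above (the function names and the returned card shape are made up for illustration, not the game's actual code):

```python
import json
from urllib.request import urlopen

def entity_data_url(qid: str) -> str:
    """URL of the Special:EntityData JSON dump for one item; the .json
    suffix selects the JSON output format."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def localize(entity_json: dict, qid: str, lang: str):
    """Pull the label and description in `lang` from a parsed EntityData
    payload. Returns None when the label is missing, so the caller can
    fall back to English."""
    entity = entity_json["entities"][qid]
    label = entity.get("labels", {}).get(lang)
    desc = entity.get("descriptions", {}).get(lang)
    if label is None:
        return None
    return {"label": label["value"],
            "description": desc["value"] if desc else None}

# Hypothetical usage (one network call per card shown):
# data = json.load(urlopen(entity_data_url("Q42")))
# print(localize(data, "Q42", "ro"))
```

The obvious trade-off is one extra request per card at play time, in exchange for always-fresh translations.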
Approach 1: For each card (Wikidata item) in the original data file, replace the original label, description and Wikipedia article title (in English) by ones in the desired language from the same Wikidata item. However, they might not be available or they might be unsuitable (contain the answer or have a mistake).
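Approach 1 might look roughly like this, with made-up field names and a naive "contains the answer" check:

```python
def translate_card(card: dict, entity: dict, lang: str):
    """Approach 1 sketch: replace an English card's label, description and
    Wikipedia article title with versions in `lang` from the same Wikidata
    entity JSON. The card field names are assumptions, not the game's real
    schema. Returns None when the translation is missing or would leak the
    answer, so the caller can skip the card or keep the English one."""
    label = entity.get("labels", {}).get(lang)
    desc = entity.get("descriptions", {}).get(lang)
    sitelink = entity.get("sitelinks", {}).get(f"{lang}wiki")
    if not (label and desc and sitelink):
        return None  # translation unavailable for this item
    year = str(card.get("year", ""))
    if year and (year in label["value"] or year in desc["value"]):
        return None  # unsuitable: the text contains the answer
    translated = dict(card)
    translated.update(label=label["value"],
                      description=desc["value"],
                      wikipedia_title=sitelink["title"])
    return translated
```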
Approach 2: Generate a new set of cards appropriate in the desired language e.g. by tweaking https://github.com/tom-james-watson/wikitrivia-generator.
EDIT: Approach 3: Generate a new set of cards dynamically from the frontend by calling a suitable SPARQL endpoint such as QLever. https://qlever.cs.uni-freiburg.de/wikidata/
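For Approach 3, the endpoint's standard SPARQL JSON results could be mapped to cards along these lines (the variable names `?item`, `?label`, `?date` and the card shape are assumptions about what such a query would select, not the game's actual schema):

```python
def results_to_cards(srj: dict) -> list:
    """Map a standard application/sparql-results+json payload to simple
    card dicts, one per result row."""
    cards = []
    for row in srj["results"]["bindings"]:
        cards.append({
            # ".../entity/Q42" -> "Q42"
            "id": row["item"]["value"].rsplit("/", 1)[-1],
            "label": row["label"]["value"],
            "date": row["date"]["value"],
        })
    return cards

# Hypothetical frontend flow: send the SPARQL query to the endpoint,
# parse the JSON response, then: cards = results_to_cards(response_json)
```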
I like Approach 2 the most. Approaches 0 and 1 are for me:
I'll try Approach 2 in Romanian to see how it goes.
Edit: I take back liking Approach 2 after seeing the 73GB data source. I will still give it a try, but don't have high hopes.
@nicolaes :+1: Perhaps we can find the necessary people to make this happen together. To make Approach 2 easier, I found some initial discussion on reimplementing it based on queries against a SPARQL endpoint. In my experience, the official SPARQL endpoint does not have the performance needed, but QLever (and/or Virtuoso) might be able to answer all the queries we need. Here's a quick test that finds about 9000 results that might be suitable for Romanian cards: https://qlever.cs.uni-freiburg.de/wikidata/30kMrq?exec=true
See also: tom-james-watson/wikitrivia-generator#6 and tom-james-watson/wikitrivia-generator#8
@tuukka Thanks for the idea. I appreciate the effort to put together the Romanian version. The quick test with 9000 entries is very relevant; the current English database has 10k entries.
I don't know SPARQL, so I am playing around with the link you provided. My plan is to find a reasonably fast query that provides at least 5000 results, then put it together with the wikitrivia app.
I gave QLever a few tries, then dropped it. I ran a query with all year types (created, discovered, invented, born, etc.) and lost backend connectivity, probably due to a lack of optimization. Here is the code: https://qlever.cs.uni-freiburg.de/wikidata/aFFkcp
I made progress on processing the raw data source, and now have ~1000 usable entries for Romanian. I'm not yet sure if Approaches 0 and 1 are viable, but it might be worth trying them out. My steps to get the Romanian entities were:
1. `wikibase-dump-filter` - 150k entries in 9h (should be faster for more popular languages)
2. `wikitrivia-generator` parser (translate filter words, change `en` to `ro`, adjust viewcounts) - 250 entries / hour

Since I don't have many cards, I will account for the scenario where there are no relevant cards to show. Then I will put this live and see if Romanians actually use it.
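The per-language parser tweaks could be sketched like this (the entry shape, the translated filter words, and the threshold value are placeholders for illustration, not wikitrivia-generator's actual code):

```python
# Placeholder translated filter words (the real generator filters on an
# English word list; these Romanian equivalents are illustrative only).
FILTER_WORDS_RO = ["secol", "deceniu"]

def keep_entry(entry: dict, lang: str = "ro",
               viewcount_threshold: int = 1000) -> bool:
    """Sketch of the three adjustments: require a label in `lang`, drop
    entries whose label matches a translated filter word, and apply a
    viewcount threshold scaled down for the smaller wiki."""
    label = entry.get("labels", {}).get(lang)
    if label is None:
        return False
    if any(word in label.lower() for word in FILTER_WORDS_RO):
        return False
    return entry.get("viewcount", 0) >= viewcount_threshold
```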
@nicolaes I hadn't thought of the possibility of creating a set of cards dynamically based on a SPARQL query. I've added it as "Approach 3" in my original list. At a glance, an advantage would be that the data would update automatically, but a disadvantage would be that two games couldn't be guaranteed to use the same set of cards.
I have reported the QLever crash to its developers - I hope it's something they can easily fix as QLever is very performant in general.
Do you know why you got just 10% of the amount of cards compared to English? For example, is it because the Romanian labels are missing, the filter words match more often, or the viewcounts are lower?
Update: here's a query for QLever that returns all suitable Wikidata items and their required attributes, sorted by sitelink count (pageview counts are not available in queries). You can change `"en"` to any other language code: https://qlever.cs.uni-freiburg.de/wikidata/OycBUK
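For illustration, the language substitution could look like this simplified template (this is a minimal stand-in, not the linked QLever query; prefix declarations are omitted for brevity):

```python
# Minimal stand-in for a language-parameterised query: items with a label
# in the chosen language, ordered by sitelink count.
QUERY_TEMPLATE = """
SELECT ?item ?label ?sitelinks WHERE {
  ?item rdfs:label ?label .
  ?item wikibase:sitelinks ?sitelinks .
  FILTER(LANG(?label) = "%s")
}
ORDER BY DESC(?sitelinks)
"""

def query_for_language(lang: str) -> str:
    """Swap the language code into the query template."""
    return QUERY_TEMPLATE % lang
```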
Some really interesting discussion here!
@nicolaes - yeah, unfortunately the wikitrivia-generator process as it stands is slow. I think SPARQL is definitely the future. Also, the example @tuukka has worked on shows how easy the SPARQL approach would make it to internationalize.
The discussion of how to work out the details of the SPARQL approach should be kept to https://github.com/tom-james-watson/wikitrivia-generator/issues/6.
@tuukka sorry for the late reply, I messed up my notifications. I appreciate the time you invested in the SPARQL query. I was able to download the 10k sample you prepared without any QLever issues.
About the low Romanian entity count: it's because not all pages are translated, and because I didn't adjust the viewcount thresholds correctly (e.g. I reduced them by 40x compared to English, while there are 60x fewer Romanian speakers).
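The threshold mismatch can be shown with simple arithmetic (the English threshold value here is hypothetical; only the 40x and 60x ratios come from the discussion):

```python
# Worked numbers for the threshold-scaling mismatch described above.
english_threshold = 6000  # hypothetical English viewcount threshold
used_ro_threshold = english_threshold / 40          # what was actually used
proportional_ro_threshold = english_threshold / 60  # matching 60x fewer speakers
# The threshold used was higher than a proportional one would be, so
# fewer Romanian entries cleared the bar.
assert used_ro_threshold > proportional_ro_threshold
```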
PS: top hit from SPARQL query in Romanian is the wiki of Russia 🤔
Can you support other languages? You can get the correct labels from Wikidata.