mood-mapping-muppets / repo

COVID-19 European news text analysis dashboard for the Embeddia hackathon
http://covidmoodmap.rahtiapp.fi/
Apache License 2.0
Build Python constant based "database" of language <-> tld <-> country + overrides #5

Closed frankier closed 3 years ago

frankier commented 3 years ago

The idea is that, given a language and a domain, we should be able to get the country.

- ccTLD -> country
- gTLD -> have to try something else
- Language mostly only spoken in one country: language -> country
- Otherwise overrides, which can be built systematically using Alexa top sites for European countries, e.g. theguardian.com needs an override
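The lookup described above could be sketched roughly as follows. This is only a minimal illustration: the three tables hold a few hand-picked sample entries, and the names `CCTLD_TO_COUNTRY`, `LANGUAGE_TO_COUNTRY`, `DOMAIN_OVERRIDES` and `guess_country` are hypothetical. Checking overrides first is an assumption made here so that a known site always wins over the generic rules.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical sample data; a real table would be generated, not hand-written.
CCTLD_TO_COUNTRY = {"fi": "FI", "de": "DE", "fr": "FR", "uk": "GB", "ee": "EE"}
# Only languages that are mostly spoken in a single country belong here.
LANGUAGE_TO_COUNTRY = {"fi": "FI", "et": "EE", "lv": "LV"}
# Sites on gTLDs whose language does not pin down a country.
DOMAIN_OVERRIDES = {"theguardian.com": "GB"}


def hostname_of(domain_or_url: str) -> str:
    """Return the lowercased hostname without a leading 'www.'."""
    text = domain_or_url.strip().lower()
    if "://" not in text:
        text = "//" + text  # lets urlparse treat a bare domain as a host
    host = urlparse(text).hostname or ""
    return host[4:] if host.startswith("www.") else host


def guess_country(language: str, domain_or_url: str) -> Optional[str]:
    """Guess an ISO 3166-1 alpha-2 country code, or None if unknown."""
    parts = hostname_of(domain_or_url).split(".")
    # 1. Overrides, matched against the host and each parent domain.
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DOMAIN_OVERRIDES:
            return DOMAIN_OVERRIDES[candidate]
    # 2. ccTLD -> country.
    if parts[-1] in CCTLD_TO_COUNTRY:
        return CCTLD_TO_COUNTRY[parts[-1]]
    # 3. gTLD: fall back to language -> country.
    return LANGUAGE_TO_COUNTRY.get(language.lower())
```

With this sample data, `guess_country("en", "theguardian.com")` resolves through the override table, while an English-language site on an unlisted `.com` domain yields `None`.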

Here are existing Python "module of constants" style databases to build on:

https://github.com/flyingcircusio/pycountry
https://pypi.org/project/pycountry-convert/
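As a rough sketch of how the ccTLD half of the table might be derived from such a database: most ccTLDs are simply the lowercased ISO 3166-1 alpha-2 code, with a few exceptions such as `.uk` for GB. The helper names and the `CCTLD_EXCEPTIONS` table below are hypothetical, and the exception list is deliberately incomplete.

```python
from typing import Dict, Iterable, Optional

# Known case where the ccTLD differs from the alpha-2 code (not exhaustive).
CCTLD_EXCEPTIONS = {"uk": "GB"}


def build_cctld_table(alpha2_codes: Iterable[str]) -> Dict[str, str]:
    """Map each ccTLD (without the dot) to an ISO 3166-1 alpha-2 code."""
    table = {code.lower(): code.upper() for code in alpha2_codes}
    table.update(CCTLD_EXCEPTIONS)
    return table


def country_for_tld(tld: str, table: Dict[str, str]) -> Optional[str]:
    """Look up a TLD such as 'fi' or '.FI'; gTLDs return None."""
    return table.get(tld.lower().lstrip("."))


# With pycountry installed, the codes could come from its country list:
#   import pycountry
#   table = build_cctld_table(c.alpha_2 for c in pycountry.countries)
```

Generating the table once and committing it as a module of constants would keep pycountry out of the runtime dependencies, if that is desired.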

This lets us get top domains for a country code: https://github.com/alirrreza/Alexa-Top-Sites-By-Country

frankier commented 3 years ago

We might also want this together with a list of country names in different languages (available from Wikidata).

frankier commented 3 years ago

This is an example of querying Wikidata: https://people.wikimedia.org/~bearloga/notes/wdqs-python.html
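For the country-name list specifically, a standard-library sketch against the Wikidata Query Service might look like the following. It assumes that selecting every item with an ISO 3166-1 alpha-2 code (property P297) is a good enough definition of "country"; the function names and the User-Agent string are placeholders.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(languages):
    """SPARQL for labels of all items with an ISO 3166-1 alpha-2 code."""
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    return f"""
SELECT ?code ?label WHERE {{
  ?country wdt:P297 ?code ;
           rdfs:label ?label .
  FILTER(LANG(?label) IN ({lang_list}))
}}
"""


def parse_bindings(result):
    """Turn a SPARQL JSON result into {country code: {language: name}}."""
    names = defaultdict(dict)
    for row in result["results"]["bindings"]:
        label = row["label"]
        names[row["code"]["value"]][label["xml:lang"]] = label["value"]
    return dict(names)


def fetch_country_names(languages):
    """Run the query over HTTP (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": build_query(languages), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "country-names-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_country_names(["fi", "en"])` would return a nested dict keyed by country code; the result could be dumped into the same constants module as the other tables.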