mledoze / countries

World countries in JSON, CSV, XML and YAML. Any help is welcome!
https://mledoze.github.io/countries/
Open Data Commons Open Database License v1.0

What data to add next? #6

mledoze closed this issue 9 months ago

mledoze commented 10 years ago

I would like to discuss here the data that should be added to this repository.

A similar project, OxJS [1], contains much more data, such as the land area or the latitude/longitude coordinates of each country.

Is it interesting/useful to have this kind of data too?

Data that can be added:

What would you like to be added?

Please let me know in the comments.

[1] http://oxjs.org/#doc/Ox.COUNTRIES
[2] source: http://opengeocode.org/
[3] source: https://oxjs.org/#doc/Ox.COUNTRIES


From the comments

scento commented 10 years ago

It might be useful to provide the country name in the native language of the country itself (e.g. {"name": "Germany", "name_native": "Deutschland"}).

scento commented 10 years ago

The CLDR database of the Unicode project contains territory-to-language data, including the percentage of speakers: http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html

mledoze commented 10 years ago

It might be useful to provide the country name in the native language of the country itself

The native name of Germany is already in 'alt-spellings'. I recognize that 'alt-spellings' isn't a good name, since it contains both alternative spellings and the native name of the country. So there are two solutions here:

Initially, I created this dataset with a country selector in mind [1], but it would make more sense to be able to get the native names separately. So I would choose the second option.

But the second option raises the question of how to write the native name of the country. German uses Latin characters, so it's easy to tell that it's Germany, but what about Armenia, for example, which is written Հայաստան in Armenian [2]? For some people it might be difficult to know that it's Armenia.

What do you think?

I know that alternative spellings and native names are missing for many countries, I'm currently working on adding them. Also, I'll add the native/official language(s) of each country.

[1] https://github.com/JamieAppleseed/selectToAutocomplete
[2] http://en.wikipedia.org/wiki/Armenia

scento commented 10 years ago

Not all people speak English, so they might be confused when selecting their locale. It would be useful to see the English and native versions of the country name side by side in the selector.

I would recommend providing both versions for different use cases.
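A minimal sketch of the side-by-side idea, assuming the hypothetical `name` / `name_native` fields from the example earlier in this thread (not the dataset's actual schema). `selectorLabel` is an illustrative helper that appends the native name in parentheses only when it differs from the English one.

```javascript
// Hypothetical record shape from the suggestion above:
// { name: "Germany", name_native: "Deutschland" }.
// Build a selector label such as "Armenia (Հայաստան)".
function selectorLabel(country) {
  const english = country.name;
  const native = country.name_native;
  // Skip the native part when it is missing or identical to the English name.
  if (!native || native === english) {
    return english;
  }
  return `${english} (${native})`;
}

const sample = [
  { name: "Germany", name_native: "Deutschland" },
  { name: "Armenia", name_native: "Հայաստան" },
  { name: "Malta", name_native: "Malta" },
];

for (const country of sample) {
  console.log(selectorLabel(country));
}
// Germany (Deutschland)
// Armenia (Հայաստան)
// Malta
```

Dropping the duplicate keeps labels short for countries whose English and native names are spelled the same.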

mledoze commented 10 years ago

Right, that's a valid point for non-English speakers.

If you want, feel free to start working on adding the native names as I'll be off for a few days.

stephenpaulger commented 10 years ago

I think it would be great to have a way to make countries hierarchical and to have metadata describing whether they are countries or sovereign states.

For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland".

The full name of the UK is "The United Kingdom of Great Britain and Northern Ireland". It is not a country, it is a sovereign state.

Great Britain also isn't a country, it's an island.

There are three countries in Great Britain: England, Scotland and Wales.

So the types I think are needed are: Country, State, Sovereign State, and potentially Nation and Union as well.

Then it would be good to have a way to specify that England is within the UK and, if you also have unions, that it is within the EU.

Another nice feature would be to list what land borders a country has. So you could specify that England borders Scotland and Wales for example.
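One way the hierarchy and land-border ideas could be modelled is sketched below. The `type`, `parent` and `borders` fields are hypothetical (not part of the dataset), subdivisions are keyed by their ISO 3166-2 codes, and the data is trimmed to a few entries for illustration.

```javascript
// Hypothetical records: "type", "parent" and "borders" are illustrative
// fields, not part of the dataset. Subdivision keys use ISO 3166-2 codes.
const entities = {
  GBR: { name: "United Kingdom", type: "Sovereign State", parent: null, borders: ["IRL"] },
  "GB-ENG": { name: "England", type: "Country", parent: "GBR", borders: ["GB-SCT", "GB-WLS"] },
  "GB-SCT": { name: "Scotland", type: "Country", parent: "GBR", borders: ["GB-ENG"] },
  "GB-WLS": { name: "Wales", type: "Country", parent: "GBR", borders: ["GB-ENG"] },
};

// Follow "parent" links upward and collect every enclosing entity.
function ancestors(code) {
  const chain = [];
  let current = entities[code] ? entities[code].parent : null;
  while (current) {
    chain.push(current);
    current = entities[current] ? entities[current].parent : null;
  }
  return chain;
}

// List the direct children of an entity, optionally filtered by type.
function children(code, type) {
  return Object.keys(entities).filter(
    (key) => entities[key].parent === code && (!type || entities[key].type === type)
  );
}

// Borders may be recorded on only one side, so check both directions.
function shareLandBorder(a, b) {
  const bordersOf = (code) => (entities[code] ? entities[code].borders : []);
  return bordersOf(a).includes(b) || bordersOf(b).includes(a);
}

console.log(ancestors("GB-ENG").join(", "));          // GBR
console.log(children("GBR", "Country").join(", "));   // GB-ENG, GB-SCT, GB-WLS
console.log(shareLandBorder("GB-SCT", "GB-WLS"));     // false
```

A single `parent` link keeps each record small; a union such as the EU could be expressed the same way or as a separate membership list, which is left open here.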

fayderflorez commented 10 years ago

From https://github.com/ProGNOMmers

It would be wonderful if it would be possible to retrieve regions, provinces and cities.

Something like:

// Regions of country
// /rest/alpha2/it/regions ->
{ regions:  [ "Abruzzi e Molise",
              "Basilicata",
              "Calabria",
              "Campania",
              "Emilia-Romagna",
              "Friuli-Venezia Giulia",
              "Lazio",
              "Liguria",
              "Lombardia",
              "Marche",
              "Piemonte",
              "Puglia",
              "Sardegna",
              "Sicilia",
              "Toscana",
              "Trentino-Alto Adige",
              "Umbria",
              "Valle d'Aosta",
              "Veneto" ] }

// Provinces of region
// /rest/alpha2/it/regions/Veneto/provinces ->
{ provinces: [ "Verona", "Venezia", ... ] }

// Cities of province
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] }, 
            { name: "Chioggia", zip_codes: [ "30015" ] },
            { name: "San Donà di Piave", zip_codes: [ "30027" ] }, 
            ... ] }

// Cities of country by name
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] }, 
            { name: "Chioggia", zip_codes: [ "30015" ] },
            { name: "San Donà di Piave", zip_codes: [ "30027" ] }, 
            ... ] }

Cities could have metadata such as postal (ZIP) codes, which are very useful.

It is a huge amount of work, because recording and maintaining the whole list of regions, provinces and cities for every country in the world is hard, but it is a good goal for an open source project.
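The proposed routes map naturally onto a nested object walked key by key. A minimal sketch, assuming a hypothetical in-memory structure filled only with entries that appear in the example responses above; `lookup`, `listRegions`, `listProvinces` and `listCities` are illustrative names, not an existing API.

```javascript
// Hypothetical nested data, filled in only with entries from the examples above.
const geo = {
  it: {
    regions: {
      Veneto: {
        provinces: {
          Verona: { cities: [] },
          Venezia: {
            cities: [
              { name: "Chioggia", zip_codes: ["30015"] },
              { name: "San Donà di Piave", zip_codes: ["30027"] },
            ],
          },
        },
      },
    },
  },
};

// Own-property lookup, so a segment such as "toString" is not found on
// Object.prototype; returns undefined for missing nodes.
function own(obj, key) {
  return obj && Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

// Walk a list of keys from the root; any missing segment yields undefined.
function lookup(path) {
  return path.reduce((node, key) => own(node, key), geo);
}

// One function per proposed route; each returns null for unknown paths.
function listRegions(alpha2) {
  const regions = lookup([alpha2.toLowerCase(), "regions"]);
  return regions ? { regions: Object.keys(regions) } : null;
}

function listProvinces(alpha2, region) {
  const provinces = lookup([alpha2.toLowerCase(), "regions", region, "provinces"]);
  return provinces ? { provinces: Object.keys(provinces) } : null;
}

function listCities(alpha2, region, province) {
  const cities = lookup([alpha2.toLowerCase(), "regions", region, "provinces", province, "cities"]);
  return cities ? { cities } : null;
}

console.log(listProvinces("IT", "Veneto").provinces.join(", ")); // Verona, Venezia
console.log(listRegions("FR"));                                  // null
```

The hard part the comment points out is the data itself; the lookup logic stays this small whatever the depth.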

mledoze commented 10 years ago

@stephenpaulger

I think it would be great to have a way to make Countries Hierarchical and have meta data describing whether they are countries or sovereign states.

I agree, I'll add this to the to-do list. I know that many entries in the dataset are not actual countries. I wanted to provide simple and factual data about world countries, but I understand that more accuracy is needed.

mledoze commented 10 years ago

@fayder

It would be wonderful if it would be possible to retrieve regions, provinces and cities.

Yes, it is a huge amount of work. First I want to continue adding more data at the country level (native and official names, official language, etc.) and add the master file as soon as possible (#12) to ease contributions.

Thank you for your help/feedback, I appreciate it!

mledoze commented 10 years ago

For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland".

@stephenpaulger in commit bd22b4a97f30ead3ae55f68d2c3e9b86ba784ba7 I removed most of the names from altSpellings; now it's just GB,UK,Great Britain.

mledoze commented 10 years ago

We can also add time zone data from http://timezonedb.com/download.

shanti2530 commented 10 years ago

It would be really nice if there were also a list of states per country, such as the states of the United States: http://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States

mledoze commented 10 years ago

@shanti2530 yes, this has been suggested https://github.com/mledoze/countries/issues/6#issuecomment-27620009 but it has not been done yet because the work is pretty huge. Do you know a source where we can find the states for every country?

shanti2530 commented 10 years ago

@mledoze I don't know if this is what you were looking for: http://vikku.info/programming/geodata/geonames-get-country-state-city-hierarchy.htm

mledoze commented 10 years ago

@shanti2530 this seems very good, thank you. I'll create an issue for this. Would you like to work on this?

gerbenjacobs commented 10 years ago

GeoJSON outlines of the countries: https://github.com/datasets/geo-boundaries-world-110m

mledoze commented 10 years ago

@gerbenjacobs yes good idea, I'll add this to the to-do

oriolfg commented 10 years ago

I agree with @gerbenjacobs's idea of GeoJSON outlines of the countries.
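One typical use of country outlines is computing a bounding box, e.g. to zoom a map to a country. A minimal sketch, assuming Features laid out per the GeoJSON specification (Polygon or MultiPolygon geometry, positions as [longitude, latitude]); `boundingBox` is a hypothetical helper and the feature below is a toy shape, not data from the linked repository.

```javascript
// Bounding box [minLon, minLat, maxLon, maxLat] of a GeoJSON Feature whose
// geometry is a Polygon or MultiPolygon (positions are [longitude, latitude]).
// Shapes that cross the antimeridian are not handled in this sketch.
function boundingBox(feature) {
  const { type, coordinates } = feature.geometry;
  // Normalize both types to a list of polygons, each a list of linear rings.
  let polygons;
  if (type === "Polygon") polygons = [coordinates];
  else if (type === "MultiPolygon") polygons = coordinates;
  else throw new Error(`Unsupported geometry type: ${type}`);

  let minLon = Infinity, minLat = Infinity;
  let maxLon = -Infinity, maxLat = -Infinity;
  for (const polygon of polygons) {
    for (const ring of polygon) {
      for (const [lon, lat] of ring) {
        minLon = Math.min(minLon, lon);
        minLat = Math.min(minLat, lat);
        maxLon = Math.max(maxLon, lon);
        maxLat = Math.max(maxLat, lat);
      }
    }
  }
  return [minLon, minLat, maxLon, maxLat];
}

// Toy two-part feature for illustration only.
const example = {
  type: "Feature",
  properties: { name: "Example" },
  geometry: {
    type: "MultiPolygon",
    coordinates: [
      [[[0, 0], [10, 0], [10, 5], [0, 5], [0, 0]]],
      [[[12, -2], [14, -2], [14, 1], [12, 1], [12, -2]]],
    ],
  },
};

console.log(boundingBox(example).join(", ")); // 0, -2, 14, 5
```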

matiassingers commented 10 years ago

@mledoze I don't know if it's in the scope of this project, but I would love to see financial information like GDP, GDP per capita, GNI, etc. The problem with this, of course, is that these numbers change every year.

mledoze commented 10 years ago

@matiassingers no, it's not really in the scope of this project. I prefer to stick with static data that does not change. The dataset currently contains population data, which is out of scope, and I would like to remove it in the near future.

Although it does not currently contain GDP data, you should check out this project, https://github.com/tinata/tinatapi, which contains other financial data.

mledoze commented 10 years ago

@dalu the postal prefixes is a good idea!

mledoze commented 10 years ago

@dalu you are saying that postal services want the native country name instead of the country postal prefix?

mledoze commented 10 years ago

I would like to let you know that I am about to remove the population data, because it requires frequent updates to stay relevant.

I recently added a CONTRIBUTING file explaining the contribution rules of this project. Population data does not follow these rules.

fayderflorez commented 10 years ago

acknowledged

tdegrunt commented 10 years ago

How about the address format, from the page mentioned above: http://en.wikipedia.org/wiki/Address_(geography)

This may be fairly difficult to do, as it requires some kind of templating language; for the US, say: "addressFormat": "{{name}}\n{{houseNumber}} {{street}}\n{{locality}}\n{{city}}\n{{postalCode}}"

And would need agreement on the labels used...
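The templating part can be small. A minimal sketch, assuming `{{label}}` placeholders and the US labels from the example above (illustrative, not an agreed standard); `formatAddress` is a hypothetical helper that drops a line when all of its placeholders come out empty, such as a missing locality.

```javascript
// Fill a "{{label}}" address template. A line whose placeholders all come out
// empty is dropped; lines with no placeholders (fixed text) are kept.
function formatAddress(template, fields) {
  const placeholder = /\{\{(\w+)\}\}/g;
  const lines = [];
  for (const line of template.split("\n")) {
    let placeholders = 0;
    let filled = 0;
    const rendered = line.replace(placeholder, (match, key) => {
      placeholders += 1;
      const raw = fields[key];
      const value = raw === undefined || raw === null ? "" : String(raw);
      if (value !== "") filled += 1;
      return value;
    });
    if (placeholders === 0 || filled > 0) lines.push(rendered.trim());
  }
  return lines.join("\n");
}

const usTemplate = "{{name}}\n{{houseNumber}} {{street}}\n{{locality}}\n{{city}}\n{{postalCode}}";

// Made-up address data for illustration; "locality" is deliberately omitted.
console.log(
  formatAddress(usTemplate, {
    name: "Jane Doe",
    houseNumber: "123",
    street: "Main St",
    city: "Springfield",
    postalCode: "12345",
  })
);
// Jane Doe
// 123 Main St
// Springfield
// 12345
```

The open question from the comment remains: every country's template would need the same agreed set of labels for this to work across the dataset.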

wires commented 10 years ago

Hi, nice project! Thanks.

Something that would be useful to me is to know if a country is in the European Union. (https://en.wikipedia.org/wiki/Member_state_of_the_European_Union)

This information is needed when you are a company in the EU dealing with international customers: whether you charge VAT depends on whether your customer is in the EU.

If you are interested in including this information, I could set up a pull request.

mledoze commented 10 years ago

@tdegrunt yes it is indeed a difficult task to do, but @hexorx managed to do it in his countries repository: https://github.com/hexorx/countries/blob/master/lib/data/countries.yaml

mledoze commented 10 years ago

@0x01 yes I'm interested in including this information. Could you please add it as extra data in the data folder using [cca3].json file names?

Thank you!

wires commented 10 years ago

@mledoze: I can do that (in the data folder), but I think it makes more sense to put it into the main file. There is very little data added: basically a boolean for whether or not it's an EU member state (and I'll leave out the field if it's false).

mledoze commented 10 years ago

@0x01 you are right that this represents little data, but it would be useful for only about 10% of the countries (28 EU member states out of 251 "countries" in the dataset).

Moreover, the EU is categorized as a supranational union, and there are many other unions in the world (see [1]). So as not to add many booleans to the main file, I prefer to add this data in separate files.

[1] http://en.wikipedia.org/wiki/Political_union#Supranational_and_continental_unions
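Keeping union membership in separate per-country files means it has to be merged back in when loading. A minimal sketch, assuming hypothetical extra records keyed by cca3 (inlined here instead of read from `[cca3].json` files) with an assumed `unions` field; `withExtras` and `isMemberOf` are illustrative names.

```javascript
// Main dataset records, trimmed to two fields for the sketch.
const countries = [
  { cca3: "DEU", name: "Germany" },
  { cca3: "NOR", name: "Norway" },
];

// Hypothetical extra data, as it might be read from separate per-country
// [cca3].json files; the "unions" field name is an assumption.
const extras = {
  DEU: { unions: ["EU"] },
};

// Merge the extras onto each record. Countries without extra data get an
// empty list rather than one false boolean per union.
function withExtras(list, extraByCca3) {
  return list.map((country) => {
    const extra = extraByCca3[country.cca3];
    return { ...country, unions: extra && extra.unions ? extra.unions : [] };
  });
}

function isMemberOf(country, union) {
  return country.unions.includes(union);
}

const merged = withExtras(countries, extras);
console.log(merged.filter((c) => isMemberOf(c, "EU")).map((c) => c.name).join(", ")); // Germany
```

A list of union codes scales to other unions without adding a new boolean field for each one.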

wires commented 10 years ago

Hm, true. I withdraw my offer of a pull request, as EU membership is indeed much more subtle. Should I ever need to sort this out properly, I'll come back with a pull request, but for now a simple list of country names is enough to get a rough indication, which is good enough for my purposes. For example, this Node.js code does the trick:

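// Load the dataset from a local file; a bare 'countries.json' would be
// looked up in node_modules rather than next to this script.
const rawCountries = require('./countries.json');

// EU member states at the time of writing, matched by English name.
const EU = [
    "Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
    "Czech Republic", "Denmark", "Estonia", "Finland", "France", "Germany",
    "Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania",
    "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal",
    "Romania", "Slovakia", "Slovenia", "Spain", "Sweden", "United Kingdom"
];

const countries = rawCountries.map(function (country) {
    // Older versions of the dataset store the name as a string;
    // newer versions use an object with a "common" field.
    const name = typeof country.name === 'string' ? country.name : country.name.common;
    return { // for example
        code: country.cca3,
        name: name,
        eu_member: EU.includes(name) // plain-JS replacement for underscore's _(EU).contains(...)
    };
});
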
pelegm commented 9 years ago

I'd add coastline length (CIA World Factbook field 2060).

yackermann commented 9 years ago

How about population?

mledoze commented 9 years ago

@herrniemand population data was already added to this dataset (https://github.com/mledoze/countries/commit/81fa9f68215d92fba2a850c272d019f539cf30ad) but later removed (see https://github.com/mledoze/countries/issues/6#issuecomment-42322804).

yackermann commented 9 years ago

I've seen you were discussing official names, and I got an idea:

{
        "name": {
            "common": "Afghanistan",
            "native": "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646",
            "official": "Islamic Republic of Afghanistan"
        }
}

Otherwise there are too many "name%insert type here%" fields.

mledoze commented 9 years ago

@herrniemand this is a very good idea. I also want to add the official name in its native language, so we could have something like this:

{
        "name": {
            "common": "Afghanistan",
            "official": "Islamic Republic of Afghanistan",
            "native": {
                "common" : "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646",
                "official": "\u062f \u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646 \u0627\u0633\u0644\u0627\u0645\u064a \u062c\u0645\u0647\u0648\u0631\u06cc\u062a"
            }
        }
}

What do you think?
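With the nested structure, a consumer picks one of four variants. A minimal sketch, assuming the shape proposed above (the Afghanistan strings are copied from it); `getName` is a hypothetical helper that falls back to the English names when native ones are absent.

```javascript
// Record in the nested shape proposed above; the strings are copied from it.
const afghanistan = {
  name: {
    common: "Afghanistan",
    official: "Islamic Republic of Afghanistan",
    native: {
      common: "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646",
      official: "\u062f \u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646 \u0627\u0633\u0644\u0627\u0645\u064a \u062c\u0645\u0647\u0648\u0631\u06cc\u062a",
    },
  },
};

// Select one of the four variants. When native names are requested but
// missing, fall back to the English ones.
function getName(country, { official = false, native = false } = {}) {
  const source = native && country.name.native ? country.name.native : country.name;
  return official ? source.official : source.common;
}

console.log(getName(afghanistan));                    // Afghanistan
console.log(getName(afghanistan, { official: true })); // Islamic Republic of Afghanistan
console.log(getName(afghanistan, { native: true }));   // the native common name
```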

yackermann commented 9 years ago

@mledoze Yea. Awesome.

yackermann commented 9 years ago

@mledoze how about translations? I was thinking something like:

{
        "name": {
            "common": "Afghanistan",
            "official": "Islamic Republic of Afghanistan",
            "native": {
                "common" : "\u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646",
                "official": "\u062f \u0627\u0641\u063a\u0627\u0646\u0633\u062a\u0627\u0646 \u0627\u0633\u0644\u0627\u0645\u064a \u062c\u0645\u0647\u0648\u0631\u06cc\u062a"
            },
            "translations":{
                "ru":...,
                "de":...
            }
        }
}

or should we keep it as it is?

mledoze commented 9 years ago

@herrniemand I prefer to keep the translations as they are for now.

yackermann commented 9 years ago

@mledoze ok.

yackermann commented 9 years ago

@mledoze I've just got stuck on a problem with native names. Some countries, like Afghanistan and the Åland Islands, have more than one official language, so what do we define as native?

mledoze commented 9 years ago

@herrniemand for countries with more than one official language, you should use the language that is listed first in the language property. So for Afghanistan and Åland Islands, it is Pashto and Swedish respectively.
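The "first listed language" rule is easy to apply mechanically. A minimal sketch, assuming a hypothetical record where `languages` is an ordered list of language codes and `nativeNames` maps each code to a name; Belgium and the order shown are illustrative, not taken from the dataset.

```javascript
// Hypothetical record: "languages" is an ordered list of language codes and
// "nativeNames" maps each code to the country's name in that language.
// Both the field names and the order below are illustrative assumptions.
const belgium = {
  name: { common: "Belgium" },
  languages: ["nld", "fra", "deu"],
  nativeNames: { nld: "België", fra: "Belgique", deu: "Belgien" },
};

// The native name is the one in the first listed language; fall back to the
// English common name if that entry is missing.
function primaryNativeName(country) {
  const first = country.languages[0];
  return (first && country.nativeNames[first]) || country.name.common;
}

console.log(primaryNativeName(belgium)); // België
```

Because the rule depends only on list order, reordering `languages` is enough to change which name counts as native.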

yackermann commented 9 years ago

@mledoze ok. Thanks.

yackermann commented 9 years ago

@mledoze Question about translations: which languages do we want country names translated into? I suggest the six official UN languages plus the first 15 from the list of languages by number of native speakers: https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

yackermann commented 9 years ago

@mledoze Another thing: since we changed the structure of the language block, wouldn't it be consistent to change translations to the same style:

"translations":{
    "de": {
        "common":"Russland",
        "official":"Russische Föderation"
    }...
}

?
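With common/official pairs per language, lookup becomes symmetric with the main name block. A minimal sketch, assuming translations keyed by language code at the top level of the record (the thread leaves open whether they sit inside `name`); `translatedName` is a hypothetical helper that falls back to the English name.

```javascript
// Hypothetical record with translations keyed by language code, each holding
// common and official names as proposed above (top-level placement assumed).
const russia = {
  name: { common: "Russia", official: "Russian Federation" },
  translations: {
    de: { common: "Russland", official: "Russische Föderation" },
  },
};

// Look up a translated name ("common" or "official"), falling back to the
// English name when the language or the variant is missing.
function translatedName(country, lang, kind = "common") {
  const entry = country.translations ? country.translations[lang] : undefined;
  return (entry && entry[kind]) || country.name[kind];
}

console.log(translatedName(russia, "de"));             // Russland
console.log(translatedName(russia, "de", "official")); // Russische Föderation
console.log(translatedName(russia, "ja"));             // Russia
```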

ReSpawN commented 9 years ago

Not exactly contributing to this issue, the feature list or the roadmap, but I simply wanted to express my deepest thanks for putting this awesome list up on the web. I've been searching for it for over 2 years, and a simple but epically effective Google search turned this page up.

Thank you for your amazing work and this amazing list of countries!

mledoze commented 9 years ago

@ReSpawN you're very welcome, thank you for your comment, I really appreciate it! This work would not be as it is now without the help of all the contributors.

If you do something with this dataset, don't hesitate to add it to the showcase list in the README.

romsson commented 9 years ago

A field indicating whether or not the country is landlocked: http://en.wikipedia.org/wiki/Landlocked_country

pelegm commented 9 years ago

+1 for @romsson's idea.

yackermann commented 9 years ago

+1 for @romsson's idea :)