Closed: samayo closed this issue 4 months ago.
I think that list is a valid requirement for some people, but not everyone.
I'd add that as a new list within this repo.
Just add another file in src called "recognized-un-country.json" with a 1 or 0 value. This will keep the existing structure and push the responsibility to the person(s) creating their application. Hope this helps.
Thanks, but I don't think it would be good to keep the existing structure. Some src files have more entries than others. The idea is for every file to contain all 193 countries in the same order, so if you want several pieces of data for one country from all the files, it would be very convenient.
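If every file lists the same countries in the same order, one country's data can be collected from all files by position alone. Here is a minimal Python sketch. It assumes each src file is a JSON array of `{"country": name, <field>: value}` objects. The helper name `collect_country` is hypothetical, and the file contents below are hypothetical excerpts.

```python
import json


def collect_country(files: dict[str, list[dict]], index: int) -> dict:
    """Gather every field for the country at position `index` across all files.

    Assumes all files list the same countries in the same order.
    """
    name = None
    merged: dict = {}
    for file_name, records in files.items():
        record = records[index]
        if name is None:
            name = record["country"]
        elif record["country"] != name:
            # The shared-order assumption is broken: fail instead of mixing countries.
            raise ValueError(f"{file_name}: expected {name!r} at {index}, got {record['country']!r}")
        merged.update({k: v for k, v in record.items() if k != "country"})
    return {"country": name, **merged}


# Hypothetical excerpts of two src files.
files = {
    "country-by-capital-city.json": [
        {"country": "Afghanistan", "city": "Kabul"},
        {"country": "Albania", "city": "Tirana"},
    ],
    "country-by-currency-code.json": [
        {"country": "Afghanistan", "currency_code": "AFN"},
        {"country": "Albania", "currency_code": "ALL"},
    ],
}
print(json.dumps(collect_country(files, 1)))
# {"country": "Albania", "city": "Tirana", "currency_code": "ALL"}
```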
Is this data currently scraped from Wikipedia? If so, is it automated?
Yes, it's mostly scraped from Wikipedia. Automating the process has been the goal for a long time, but I haven't found the time, which is why it isn't implemented yet.
I can implement the automation, but I still don't understand the Wikipedia data.
https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
On this Wikipedia page, I don't understand the difference between a numbered country and a country marked with "-". What is it?
Note: A numbered rank is assigned to the 193 member states of the United Nations, plus the two observer states to the United Nations General Assembly. Dependent territories and constituent countries that are parts of sovereign states are not assigned a numbered rank.
So numbered entries are officially recognised countries, and un-numbered ones are disputed, like Taiwan for example.
This repo should focus only on recognised countries.
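Following that note, a scraper could keep only the rows whose rank cell is a number and drop the ones marked "-". This is a sketch that assumes the table has already been parsed into `(rank, country)` string pairs. The function name and the placeholder rows are illustrative, not taken from the page.

```python
def recognised_only(rows: list[tuple[str, str]]) -> list[str]:
    """Keep countries whose rank cell is numeric; "-" marks unranked entries."""
    kept = []
    for rank, country in rows:
        # Ranked rows carry a plain integer; anything else (e.g. "-") is skipped.
        if rank.strip().isdigit():
            kept.append(country)
    return kept


rows = [("1", "Country A"), ("-", "Taiwan"), ("2", "Country B")]
print(recognised_only(rows))  # ['Country A', 'Country B']
```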
@samayo Should this repo include the two observer states?
Yes, I think that would be OK.
- `null` value. Like if the country doesn't have the data, instead of `null` we delete the data to reduce size. The API library can solve this by returning `null` instead of an error when getting the data.
- Wikipedia and Wikimedia. I still can't find some data.
- `1` or `0`
- `{ 3LetterCountryCode: data }` instead of `[{ country: name, data: data }]` to reduce size.
- `GS1Code`
- `Country By Abbreviation` to `ISO3166`. Add 2- and 3-letter codes.
- `Currency code` to `ISO4217`.
- `Domain tld` to `ccTLD` and add new fields.
- `elevation` to `averageElevation`
- `maleToFemaleRatio`, `methodology`, `year`?
- UPC-A compatible - Used to issue restricted circulation numbers within a geographic.
- United States Virgin Islands.
- `ISO4217`: https://en.wikipedia.org/wiki/Currencyregistry, `IDN`, `DNSSEC`, `SLD`, `IPv6`.
- `month`, `day` and `from` fields?
- `officialLanguages`, `regionalLanguage`, `minorityLanguage`, `nationalLanguage`, `widelySpoken` fields?
- World Bank Group (2022) data. `male`, `female`, `all` fields.
- Pew Forum data. Can we add a percentage per country per religion?
- `isOfficial` boolean.

Great points, thanks for all the help so far. You are making this easy, even if I want to implement it myself.
Some notes:
> Maybe we can use object like `{ 3LetterCountryCode: data }` instead of `[{ country: name, data: data }]` to reduce size.
Let's leave the above as is for now, because I don't see a reason to change it.
I'm OK with removing the alphabet letters but not the country names; remember that there are many websites and games that need to display just the country names for some reason.
Other than those remarks, everything else is a great idea.
Btw, I was recently thinking of giving ChatGPT the Wikipedia section that contains the data, asking it to generate Python code that converts the data from HTML to JSON, and running that script every month to look for updates.
The script would be written in Python with Scrapy; I have an unfinished version of it locally.
Once ChatGPT creates the script and it works, we upload it to a server and run it every month with a cron job to scrape the data and send a PR.
That's what I had in mind initially; feel free to build on the idea or propose your own.
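Whatever scraper is used, the monthly job should only send a PR when something actually changed. Here is a minimal Python sketch of that check. It assumes both the old and new data are lists of `{"country": ...}` records. The name `changed_countries` is hypothetical.

```python
def changed_countries(old: list[dict], new: list[dict]) -> list[str]:
    """Return the names of countries whose record differs between two scrapes."""
    old_by_name = {record["country"]: record for record in old}
    new_names = {record["country"] for record in new}
    # New countries, or countries whose record differs in any field.
    changed = [r["country"] for r in new if old_by_name.get(r["country"]) != r]
    # Countries that disappeared from the new scrape also count as changes.
    changed += [name for name in old_by_name if name not in new_names]
    return changed


old = [{"country": "A", "data": 1}, {"country": "B", "data": 2}]
new = [{"country": "A", "data": 1}, {"country": "B", "data": 3}]
print(changed_countries(old, new))  # ['B']
```

An empty list means the job can exit without writing files or opening a PR.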
I haven't used Python since 2020, so I can't implement it in Python.
Now I mostly use TypeScript with Node.js.
Instead of a VPS, we can use GitHub Actions.
For the country names, can we just use an array, like ["A", "B"]?
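Here is a minimal sketch of that conversion. It assumes the current country-name file is a list of `{"country": name}` objects. The helper name is hypothetical.

```python
import json


def names_array(records: list[dict]) -> list[str]:
    """Flatten [{"country": name}, ...] into a plain list of names, keeping order."""
    return [record["country"] for record in records]


records = [{"country": "Afghanistan"}, {"country": "Albania"}]
print(json.dumps(names_array(records)))  # ["Afghanistan", "Albania"]
```

The array form drops the repeated `"country"` key, which is where the size saving comes from. The trade-off is that consumers then look countries up by position, so this relies on every file keeping the same order.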
Can we move this repo to a new organization so I can add an SDK, if possible?
Where did you find the flag SVGs, @samayo?
Wherever they are, they 100% need to go through SVGO or a similar compressor.
I can't find the SVG source to scrape. The image HTML is always unstructured.
It's from Wikipedia. Check each country's flag page; it will have an SVG version.
@samayo https://en.wikipedia.org/wiki/File:Flag_of_the_United_States_(DoS_ECA_Color_Standard).svg Does this still need to go through SVGO?
I don't understand your question. You will find a .svg file on every Wikipedia page, and it must be converted to base64 format. We store a base64 representation of the SVG in this repo.
Isn't SVGO for optimizing SVGs?
You still need to right-click the flag and select "Open image in new tab"; then you will see this URL.
That is the SVG; the one you linked is an HTML page.
What is SVGO for?
That one is actually okay, but some flags are massive files. Try https://jakearchibald.github.io/svgomg/ on the more complex flags.
SVGOMG is just a GUI for SVGO.
OK, so Wikipedia -> SVGO optimize -> base64?
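The last step of that pipeline can be sketched in Python as follows. The sketch assumes the SVG has already been downloaded from the actual file URL (on upload.wikimedia.org, not the `File:` page) and optimized with SVGO. Whether the repo stores a bare base64 string or a `data:` URI is also an assumption here, so both helpers are shown. The function names are hypothetical.

```python
import base64


def svg_to_base64(svg_text: str) -> str:
    """Encode (already optimized) SVG markup as a base64 string."""
    return base64.b64encode(svg_text.encode("utf-8")).decode("ascii")


def svg_to_data_uri(svg_text: str) -> str:
    """Same payload wrapped as a data URI that can be used directly in <img src>."""
    return "data:image/svg+xml;base64," + svg_to_base64(svg_text)


svg = '<svg xmlns="http://www.w3.org/2000/svg" width="3" height="2"/>'
print(svg_to_data_uri(svg))
```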
@samayo Can I write the scraper in TypeScript instead of Python?
I'd strongly suggest Python so that I can contribute too, but it's your decision. Where do you plan to host the script: here or on your own GitHub page?
@samayo If you want to use Python, I can't contribute.
If you don't want TypeScript here, @samayo, I can create a new repo so I can use TypeScript instead.
I think I will give it a shot, and you can also go ahead and try; we can use one or the other, or both. I am happy to get regular PRs from anyone.
OK, I will create a PR later with TypeScript.
@samayo Where can I get the geo coordinates?
@samayo Should the data include the country even when its data is null, like [{ country: 'x', data: null }]? Do we need to include this?
@samayo Any update on my previous question?
> @samayo Where can I get the geo coordinates?
All from Wikipedia.
> @samayo Should the data include the country even when its data is null, like [{ country: 'x', data: null }]? Do we need to include this?
Yes, we should definitely add the country. We can use null, none, false or 0; you can pick any format as long as it is consistent. I prefer null, since 0 could be mistaken for real data.
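Here is a minimal sketch of that rule in Python: every country in the master list gets a record, and Python's `None` serializes to JSON `null` when data is missing. The master list, the scraped values, and the function name are hypothetical. The `data` key follows the example in the question.

```python
import json


def with_all_countries(countries: list[str], found: dict[str, object]) -> list[dict]:
    """One record per country, in master-list order; None (JSON null) when data is missing."""
    return [{"country": name, "data": found.get(name)} for name in countries]


countries = ["A", "B", "C"]
scraped = {"A": 10, "C": 30}  # "B" had no value on the source page
print(json.dumps(with_all_countries(countries, scraped)))
# [{"country": "A", "data": 10}, {"country": "B", "data": null}, {"country": "C", "data": 30}]
```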
> @samayo Where can I get the geo coordinates?
> All from Wikipedia.
Can you add the link to the Wikipedia page? I can't find it.
It seems I was wrong: it is not from Wikipedia, and the way the data is represented is not entirely optimal.
Can you use this instead? https://developers.google.com/public-data/docs/canonical/countries_csv
You can use another source. In any case, this data is unlikely to change, so you can even exclude it.
That data is different from this: https://github.com/samayo/country-json/blob/0c522ea1e7ae88e9a2dd979322fbf8c2814b0de6/src/country-by-geo-coordinates.json
The Google data doesn't have `west`, `south`, `north`, or `east` fields.
It's fine; we can use whatever is close enough, and if there is a need to improve it, we can do that later.
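Here is a minimal sketch of the "close enough" approach. It assumes the Google table has been saved as comma-separated text with the header `country,latitude,longitude,name`; check this against the actual page before relying on it. That source gives one centre point per country, so the sketch leaves the bounding-box fields as null. The `latitude`/`longitude` output keys and the function name are hypothetical, and the sample row is illustrative, not copied from the source.

```python
import csv
import io


def google_rows_to_records(csv_text: str) -> list[dict]:
    """Convert a country-centroid CSV into repo-style records.

    The bounding-box fields used by the existing file are not available
    from this source, so they are set to None (JSON null).
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "country": row["name"],
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "north": None, "south": None, "east": None, "west": None,
        })
    return records


sample = "country,latitude,longitude,name\nXX,10.0,20.0,Example\n"
print(google_rows_to_records(sample))
```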
@samayo I have a problem: the same country has different names on different Wikipedia pages.
For example:
Netherlands
Netherlands, Kingdom of the
We could compare the URLs, but I don't know whether they will still differ. But I'll try.
I don't know about the redirect issue, but if you found a solution, then that's good. About country names differing between pages, I think we have to use custom code logic for that, e.g. `if (countryName === "Netherlands, Kingdom of the") { countryName = "Netherlands"; }`.
If we use ifs like that, the automation will break whenever the Wikipedia page is updated, and there are so many aliases.
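One way to express the custom-logic idea without a chain of if statements is a single alias table. It maps every known spelling to one canonical name and fails loudly on names it has never seen, so a page update surfaces as an error rather than as silently wrong data. Here is a sketch in Python. The canonical set is a hypothetical two-country stand-in, and the only alias entry is the example from this thread.

```python
CANONICAL = {"Netherlands", "Germany"}  # hypothetical subset of the master list

# Every alternative spelling seen on some Wikipedia page -> canonical name.
ALIASES = {
    "Netherlands, Kingdom of the": "Netherlands",
}


def canonical_name(raw: str) -> str:
    """Normalize a scraped country name, or raise if it is unknown."""
    name = raw.strip()
    name = ALIASES.get(name, name)
    if name not in CANONICAL:
        # Surface new spellings so the alias table can be extended by hand.
        raise KeyError(f"unknown country name: {raw!r}")
    return name


print(canonical_name("Netherlands, Kingdom of the"))  # Netherlands
```

A new alias then becomes one line of data rather than new code.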
That's unlikely. I don't know what you are planning to use, but with Python and pandas you can find the table you are looking for very easily.
Take a look at this: https://medium.com/analytics-vidhya/web-scraping-a-wikipedia-table-into-a-dataframe-c52617e1f451
From step 5 on, it is very easy to get all the tables on the page and target the one you need.
So, as far as I can tell, it's unlikely that any changes will break it.
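The linked article uses `pandas.read_html`, which returns one DataFrame per table and can filter tables by text through its `match` argument. If you would rather not depend on pandas plus an HTML parser such as lxml, the same "collect all tables, then pick the right one" idea can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not what the article does. The class and function names are hypothetical, and real Wikipedia tables with nested tables, `rowspan` or `colspan` would need more handling than this.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text.

    Simplified: nested tables, rowspan and colspan are not handled.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace and newlines into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_table(html: str, required_headers: set[str]) -> list[list[str]]:
    """Return the first table whose first row contains all required headers."""
    parser = TableCollector()
    parser.feed(html)
    for table in parser.tables:
        if table and required_headers <= set(table[0]):
            return table
    raise LookupError(f"no table with headers {sorted(required_headers)}")


page = """
<table><tr><th>Notes</th></tr><tr><td>ignore me</td></tr></table>
<table>
  <tr><th>Country</th><th>Calling code</th></tr>
  <tr><td>Example</td><td>+1 787</td></tr>
</table>
"""
print(find_table(page, {"Country", "Calling code"}))
# [['Country', 'Calling code'], ['Example', '+1 787']]
```

Selecting by header names rather than by table index is what keeps the scraper working when other tables are added to or removed from the page.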
I can resolve the redirect issue using this API: https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bredirects
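The linked help page documents the `prop=redirects` query module, which lists the pages that redirect to a given title. Collected for each country article, those titles form a ready-made alias list. Here is a Python sketch that builds such a request and turns a response into an alias map. The sample response is hand-written in the `formatversion=2` shape, not captured from the API, and its redirect title is illustrative. Pagination through the API's `continue` field is not handled.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def redirects_url(titles: list[str]) -> str:
    """Build a prop=redirects query listing the pages that redirect to each title."""
    params = {
        "action": "query",
        "prop": "redirects",
        "titles": "|".join(titles),  # several titles are separated by "|"
        "rdprop": "title",
        "rdlimit": "max",
        "format": "json",
        "formatversion": "2",  # pages come back as a list, not a dict keyed by page id
    }
    return API + "?" + urlencode(params)


def alias_map(response: dict) -> dict[str, str]:
    """Map every redirect title (alias) to the article it points at."""
    aliases = {}
    for page in response["query"]["pages"]:
        for redirect in page.get("redirects", []):
            aliases[redirect["title"]] = page["title"]
    return aliases


# Hand-written response in the formatversion=2 shape (not real API output).
sample = {"query": {"pages": [{"title": "Netherlands", "redirects": [{"title": "The Netherlands"}]}]}}
print(redirects_url(["Netherlands"]))
print(alias_map(sample))  # {'The Netherlands': 'Netherlands'}
```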
@samayo On this Wikipedia page, https://en.wikipedia.org/wiki/List_of_country_calling_codes, some countries have two or more codes.
Which code should we include?
In the old JSON, it just concatenates the 1 with 939, ignoring 787:
```json
{
  "country": "Puerto Rico",
  "calling_code": 1939
}
```
@kennarddh We have to use both, separated by a comma. If you have better ideas, let me know. Thanks!
We can use something like this:
```json
{
  "country": "example",
  "data": [1787, 1938]
}
```
@kennarddh Looks good to me.
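Here is a minimal sketch of turning a multi-code cell into that list of integers. It assumes the cell text looks like `"+1 787, 939"` or `"+1 787, +1 939"`; check the exact formatting on the page. The leading number is taken as the country code, and each area code is appended to it. The helper name is hypothetical.

```python
import re


def parse_calling_codes(cell: str) -> list[int]:
    """Turn e.g. "+1 787, 939" into [1787, 1939]; "+44" alone yields [44]."""
    groups = [g.strip() for g in cell.split(",")]
    first = re.match(r"\+?(\d+)(?:\s+(\d+))?$", groups[0])
    if first is None:
        raise ValueError(f"unexpected calling code format: {cell!r}")
    country_code, area = first.group(1), first.group(2)
    if area is None:
        return [int(country_code)]
    codes = [int(country_code + area)]
    for group in groups[1:]:
        # Later groups may repeat the country code ("+1 939") or omit it ("939").
        parts = group.lstrip("+").split()
        codes.append(int(country_code + parts[-1]))
    return codes


print(parse_calling_codes("+1 787, 939"))  # [1787, 1939]
```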
@samayo What about Russia? It has a range of codes:
```json
{
  "country": "russia",
  "data": [71, 72, 73, 74, 75, 78, 79]
}
```
Is this right?
Hello, this is to discuss a new major change to the repo.
I am trying to remove most countries not recognized by the UN. Currently, there are 248 countries in this repo, but the UN recognizes only 193 of them, so this will be a big change.
Other than that, I will fill in all data for each country (so no null or empty values). All data will also be automated (updated each week whenever something changes in a source like Wikipedia).
Let me know if you would like to keep this repo limited to UN-recognized countries only.
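If the repo is narrowed to UN members with no empty values, a check like the following could run before each automated update. It is a sketch that assumes each src file is a list of `{"country": ...}` records and that a master list of member names exists somewhere; a hypothetical three-name stand-in is used here. The function name is hypothetical too.

```python
def validate(files: dict[str, list[dict]], members: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every file is consistent."""
    problems = []
    for file_name, records in files.items():
        names = [record.get("country") for record in records]
        if names != members:
            # Same countries in the same order is what makes lookups by position work.
            problems.append(f"{file_name}: country list differs from the member list")
        for record in records:
            for key, value in record.items():
                if value is None or value == "" or value == []:
                    problems.append(f"{file_name}: {record.get('country')} has empty {key!r}")
    return problems


members = ["A", "B", "C"]
files = {
    "ok.json": [{"country": "A", "x": 1}, {"country": "B", "x": 2}, {"country": "C", "x": 3}],
    "bad.json": [{"country": "A", "x": 1}, {"country": "C", "x": None}],
}
for problem in validate(files, members):
    print(problem)
```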