sociepy / covid19-vaccination-subnational

🌍💉 Global COVID-19 vaccination data at the regional level.
https://sociepy.org/covid19-vaccination-subnational
GNU General Public License v3.0
61 stars 15 forks

Adding total_vaccinations and population field at a national level #27

Open sanyam-git opened 3 years ago

sanyam-git commented 3 years ago

Currently, the country-wise latest and all APIs have the following structure:

{
    "country": "India",
    "country_iso": "IN",
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Two more fields could be added: total_vaccinations and population.

The updated structure would be:

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "population":1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

lucasrodes commented 3 years ago

Hi @sanyam-git, thanks for your proposal! It could be a nice-to-have feature.

These fields have not been added so far because the https://github.com/owid/covid-19-data project already provides them. Still, we could give it a try so that all this information is available in one API.

Your points about how to obtain the aggregated national values are quite relevant, as simply iterating over the available regional JSON files would not work. Some countries add "Misc" or "Others" entries, which are removed in the process of generating the API.

Data update process

To give you an overview, the data update is performed with the script update_all, which sequentially executes the following steps:

  1. Update country regional data. For each country:
     1.1. Scrape the country's source link and get the raw data.
     1.2. Process the raw data (change column names, standardize region names and ISO codes, etc.).
     1.3. Export the processed data as a CSV file to the data/countries directory.
  2. Merge all generated country CSV files into a single vaccinations.csv file.
  3. Add population-related metrics to the vaccinations.csv file (e.g. total_vaccinations_per_100).
  4. Generate the API files using each country's CSV file.
  5. Update the documentation with the changes (e.g. update README.md).

Note that in step 1.2 all special regions such as "Misc" and "Others" are discarded. Hence, recovering them at step 4 would be quite complex at the moment.
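
One possible way around this would be to record the national total during step 1.2, before the special regions are dropped, and carry it through to step 4. The snippet below is only a minimal sketch of that idea, not the project's actual code: the function name `process_country`, the row layout (one latest cumulative row per region), and the figures are all hypothetical.

```python
# Hypothetical sketch: capture the national total before special regions
# such as "Misc" or "Others" are discarded (step 1.2 of the update process).
SPECIAL_REGIONS = {"Misc", "Others"}  # assumed labels, taken from the discussion


def process_country(raw_rows):
    """Return (regional_rows, national_total) for one country's raw rows."""
    # Sum over *all* rows first, so doses reported under special regions
    # still count towards the national figure.
    national_total = sum(row["total_vaccinations"] for row in raw_rows)
    # Then drop the special regions, as the regional export is described to do.
    regional_rows = [r for r in raw_rows if r["region"] not in SPECIAL_REGIONS]
    return regional_rows, national_total


# Made-up numbers, for illustration only.
raw = [
    {"region": "Region A", "total_vaccinations": 1000},
    {"region": "Region B", "total_vaccinations": 2500},
    {"region": "Misc", "total_vaccinations": 300},
]
regions, total = process_country(raw)
regional_sum = sum(r["total_vaccinations"] for r in regions)
print(total, regional_sum)
```

With these made-up rows, the national total keeps the 300 doses filed under "Misc", whereas summing the exported regions alone would miss them.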

Some ideas:

API proposals

Proposal 1 (yours)

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "population":1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}

Proposal 2

Use total_vaccinations_per_100 instead of population.

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "total_vaccinations_per_100":0.5117,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}
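
For reference, the per-100 value in this example follows directly from the two numbers in Proposal 1. A minimal sketch, assuming the metric is simply doses per 100 inhabitants rounded to four decimals (the rounding is inferred from the example value, and the helper name `per_100` is hypothetical):

```python
def per_100(total_vaccinations, population):
    """Doses administered per 100 people, rounded to four decimals."""
    return round(100 * total_vaccinations / population, 4)


# Figures from the India example in this thread.
print(per_100(7017114, 1371360350))  # 0.5117
```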

Proposal 3

{
    "country": "India",
    "country_iso": "IN",
    "total_vaccinations":7017114,
    "total_vaccinations_per_100":0.5117,
    "population":1371360350,
    "last_update": "2021-02-10",
    "source_url": "https://india-covid19vaccine.github.io",
    "data": [
    ]
}
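
As a rough illustration of how such a national-level document could be assembled in step 4, here is a sketch using only the standard library. The function `build_country_payload` and its parameters are hypothetical and not part of the project; the field names and values come from the examples above, and the four-decimal rounding is an assumption.

```python
import json


def build_country_payload(country, country_iso, total_vaccinations,
                          population, last_update, source_url, data):
    """Assemble a national-level API document following Proposal 3."""
    return {
        "country": country,
        "country_iso": country_iso,
        "total_vaccinations": total_vaccinations,
        # Derived field: doses per 100 inhabitants (rounding is an assumption).
        "total_vaccinations_per_100": round(
            100 * total_vaccinations / population, 4
        ),
        "population": population,
        "last_update": last_update,
        "source_url": source_url,
        "data": data,  # regional records, left empty here as in the examples
    }


payload = build_country_payload(
    "India", "IN", 7017114, 1371360350,
    "2021-02-10", "https://india-covid19vaccine.github.io", [],
)
print(json.dumps(payload, indent=4))
```

Leaving the "population" key out of the returned dictionary would give the Proposal 2 layout instead.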

I would probably go for proposal 2 and leave the population field out. My reasoning is that:

Please let me know what you think! 😄

lucasrodes commented 3 years ago

I'll be adding total_vaccinations_per_100 to the individual region JSON files.

sanyam-git commented 3 years ago

@lucasrodes, thanks for the detailed overview of the project's inner workings. Here's my take:

I think that adding total_vaccinations and total_vaccinations_per_100 at the national level as well would be quite helpful. What do you think? (As you mentioned above, the data is available at OWID, but it would be better to be able to get it all in one place.)

Keep up the good work :) :+1:

lucasrodes commented 3 years ago

Hi @sanyam-git, yes, I recently added the per-100 metrics to the region files. I will think about how to add this information at the national level; it shouldn't be difficult. I will get back to this thread once I have something more concrete.

Thanks for your contribution 😄!