sociepy / covid19-vaccination-subnational

🌍💉 Global COVID-19 vaccination data at the regional level.
https://sociepy.org/covid19-vaccination-subnational
GNU General Public License v3.0

same dates are appearing twice for some states in API for India #34

Closed sanyam-git closed 3 years ago

sanyam-git commented 3 years ago

The same dates occur twice for states such as IN-BR, IN-GJ, and IN-KA in the API for India (not sure about other countries).

Example

{
    "region_iso": "IN-KA",
    "region_name": "Karnataka",
    "data": [
        {
            "date": "2021-01-16",
            "total_vaccinations": 13594,
            "total_vaccinations_per_100": 0.02
        },
        {
            "date": "2021-01-16",
            "total_vaccinations": 13594,
            "total_vaccinations_per_100": 0.02
        },
        {
            "date": "2021-01-17",
            "total_vaccinations": 29504,
            "total_vaccinations_per_100": 0.05
        },
        {
            "date": "2021-01-17",
            "total_vaccinations": 29504,
            "total_vaccinations_per_100": 0.05
        },
        {
            "date": "2021-01-18",
            "total_vaccinations": 66392,
            "total_vaccinations_per_100": 0.11
        },
        {
            "date": "2021-01-18",
            "total_vaccinations": 66392,
            "total_vaccinations_per_100": 0.11
        }
    ]
}
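
A quick way to check other regions for the same problem might look like the sketch below. It assumes each region record has the shape shown in the example (a "data" list whose entries carry a "date" key); the helper name duplicated_dates is hypothetical and not part of the project.

```python
import json
from collections import Counter


def duplicated_dates(region):
    """Return the sorted dates that occur more than once in region["data"]."""
    counts = Counter(entry["date"] for entry in region["data"])
    return sorted(date for date, n in counts.items() if n > 1)


# Shortened copy of the IN-KA record from the example above.
sample = json.loads("""
{
    "region_iso": "IN-KA",
    "data": [
        {"date": "2021-01-16", "total_vaccinations": 13594},
        {"date": "2021-01-16", "total_vaccinations": 13594},
        {"date": "2021-01-17", "total_vaccinations": 29504}
    ]
}
""")

print(duplicated_dates(sample))
```

Running the same helper over every region in a response would show whether countries other than India are affected.
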
lucasrodes commented 3 years ago

Seems to be happening in the corresponding CSV file, too...

lucasrodes commented 3 years ago

I will check this ASAP, thanks for reporting this!

sanyam-git commented 3 years ago

> Seems to be happening in the corresponding CSV file, too...

It didn't seem to happen here: India.csv

lucasrodes commented 3 years ago

You are right, it seems to appear when merging all country information into vaccinations.csv.

lucasrodes commented 3 years ago

I think I got it. Once all countries are merged into a single file (merge_countries.py), the script update_vaccinations_with_population.py is used to add population-related metrics.

This script retrieves the population from the file population.csv, which is generated when running:

bash scripts/update_all.sh --update-population

This script retrieves population information from Wikidata, and it appears that there are two entries for IN-KA for the latest date. I'll add a drop_duplicates in update_vaccinations_with_population.py.
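
To illustrate the mechanism, here is a minimal pandas sketch of how a repeated key in the population table can duplicate rows on merge, and how drop_duplicates removes them. The column names, the placeholder population figures, and the choice of subset columns are assumptions for illustration; this is not the code in update_vaccinations_with_population.py.

```python
import pandas as pd

# Two IN-KA rows taken from the example in this issue.
vaccinations = pd.DataFrame({
    "region_iso": ["IN-KA", "IN-KA"],
    "date": ["2021-01-16", "2021-01-17"],
    "total_vaccinations": [13594, 29504],
})

# Hypothetical population table with two entries for the same region;
# the figures are placeholders, not the real Wikidata values.
population = pd.DataFrame({
    "region_iso": ["IN-KA", "IN-KA"],
    "population": [61_000_000, 61_100_000],
})

# Each vaccination row matches both population rows, so the row count doubles.
merged = vaccinations.merge(population, on="region_iso", how="left")
merged["total_vaccinations_per_100"] = (
    100 * merged["total_vaccinations"] / merged["population"]
).round(2)

# Keep the first row for every (region, date) pair.
deduped = merged.drop_duplicates(subset=["region_iso", "date"]).reset_index(drop=True)

print(len(merged), len(deduped))
```

Deduplicating the population table before the merge would address the cause rather than the symptom. Passing validate="many_to_one" to merge is another option, since pandas then raises an error when the right-hand keys are not unique.
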

sanyam-git commented 3 years ago

Nice work! A really quick diagnosis. Yeah, it seems like there are two different population entries for some states on Wikidata. Maybe you can choose the greater one and drop the other (as the data is all from 2011 and the difference does not seem to be significant).
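
The "keep the greater one" idea could be sketched as follows, assuming the population table has region_iso and population columns. The figures are placeholders, not the real Wikidata values.

```python
import pandas as pd

# Hypothetical population table in which IN-KA appears twice.
population = pd.DataFrame({
    "region_iso": ["IN-KA", "IN-KA", "IN-BR"],
    "population": [61_000_000, 61_100_000, 104_000_000],
})

# Sort ascending by population, then keep the last (largest) entry per region.
population_unique = (
    population.sort_values("population")
    .drop_duplicates(subset="region_iso", keep="last")
    .sort_values("region_iso")
    .reset_index(drop=True)
)

print(population_unique)
```

With one row per region, the later merge with the vaccination data can no longer multiply rows.
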

lucasrodes commented 3 years ago

The population info is currently generated from Wikidata, as I found it the easiest way to retrieve global population data for regions. However, I have noticed that some regions' population figures are a bit outdated, so maybe this process could be rethought.