sociepy / covid19-vaccination-subnational

🌍💉 Global COVID-19 vaccination data at the regional level.
https://sociepy.org/covid19-vaccination-subnational
GNU General Public License v3.0

Cumulative quantities not really cumulative? #24

Closed hef-abd closed 3 years ago

hef-abd commented 3 years ago

Hi - Taking Canada as an example in the screenshot below: the time series is not continuously increasing, as one would expect from a cumulative time series. Nor does it correspond to a daily (or "new") number, as the latest figures are pretty close to the total vaccinations (just below 1 million people).

I observed this for a lot of countries: USA, Italy, Czechia, ...
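
For reference, the expected property can be checked in a few lines of pandas. This is a minimal sketch on made-up numbers (not the repository's data): a cumulative series should never decrease, and its day-to-day differences give the daily counts.

```python
import pandas as pd

# Hypothetical cumulative totals for one region (made-up numbers).
cumulative = pd.Series([100, 250, 400, 380, 600])

# A true cumulative series never decreases from one day to the next.
print(cumulative.is_monotonic_increasing)  # False: 400 -> 380 is a drop

# Daily ("new") counts are the day-to-day differences of the cumulative series.
daily = cumulative.diff()
print(daily.tolist())  # [nan, 150.0, 150.0, -20.0, 220.0]
```

A negative entry in the differences marks a day on which the reported total went down.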

image

lucasrodes commented 3 years ago

Hi @hef-abd, thanks for reporting.

I have recently changed the API and might have introduced a bug or two. I will check this ASAP today, starting with Canada, the USA, Italy, and Czechia.

Thanks again for reporting. I'll get back to you once I have looked into it, and it would be great if you could re-check the numbers then.

lucasrodes commented 3 years ago

Hi @hef-abd, I have been checking and found that some regions do have values that decrease over time, but these are likely due to reporting errors in the source data (I will check this). Also, these regions do not match the ones you mention, except for some states in the US.

Could you verify that you are using up-to-date data? Note that you can load the data directly from the repo URL.

The code below lists the regions whose values decrease at some point. Could you please check?

Input:

import pandas as pd

def check_has_downs(x):
    """Return True if the pandas Series x decreases at any step."""
    return (x.diff() < 0).any()

# Load data
url = "https://github.com/sociepy/covid19-vaccination-subnational/raw/main/data/vaccinations.csv"
df = pd.read_csv(url)
# Get regions whose cumulative total decreases at least once from one row to the next
dfg = df.groupby(["location", "region"]).agg({"total_vaccinations": check_has_downs})
# Keep only the flagged regions
dfg = dfg[dfg["total_vaccinations"]].reset_index()
dfg = dfg.sort_values(["location", "region"])
dfg

Output:

         location                                 region
0       Argentina                                Cordoba
1       Argentina                                Formosa
2       Argentina                                  Jujuy
3       Argentina                                Mendoza
4       Argentina                               Misiones
5          Brazil                                  Bahia
6          Brazil                           Minas Gerais
7          Brazil                                Sergipe
8           Chile                            Antofagasta
9           Chile  Libertador General Bernardo O'Higgins
10          Chile                                  Maule
11        Denmark                                 Others
12        Germany                                 Bayern
13        Germany                 Mecklenburg-Vorpommern
14          India                          Daman and Diu
15  Liechtenstein                                      -
16         Norway                              Innlandet
17         Norway                      Troms og Finnmark
18          Spain                              Catalunya
19          Spain                                  Ceuta
20          Spain                          Illes Balears
21  United States                               Arkansas
22  United States                                 Hawaii
23  United States                         South Carolina
24  United States                                   Utah
25  United States                               Virginia
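
To see where each drop happens, rather than only which regions are affected, something like the following could be used. It is only a sketch: `find_drops` is a hypothetical helper, it assumes the CSV has a `date` column that sorts chronologically, and the sample frame is made up rather than taken from the repository.

```python
import pandas as pd

def find_drops(df, value_col="total_vaccinations"):
    """Return rows where value_col is lower than the previous row of the same region."""
    # Assumption: a "date" column exists and sorts chronologically.
    df = df.sort_values(["location", "region", "date"])
    # Row-to-row change within each (location, region) group
    change = df.groupby(["location", "region"])[value_col].diff()
    return df.assign(change=change)[change < 0]

# Made-up sample, not real data from the repository.
sample = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "region": ["x", "x", "x", "y", "y", "y"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"] * 2,
    "total_vaccinations": [10, 30, 25, 5, 8, 12],
})
drops = find_drops(sample)
print(drops)
```

On the sample, only the 2021-01-03 row of region "x" is returned, with a change of -5. Running `find_drops(df)` on the loaded data would give the dates to compare against the source reports.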

lucasrodes commented 3 years ago

Hi @hef-abd, is the problem still persisting?

lucasrodes commented 3 years ago

Closing due to inactivity. Please feel free to re-open.