vyaduvanshi closed this issue 3 years ago.
Hi @vyaduvanshi, thanks for reporting.
Indeed, there is a substantial difference when we compute the following:
>>> df[df.continent == 'Asia'].new_cases.sum()
41666800.0
>>> df[df.location == 'Asia'].total_cases.iloc[-1]
42490581.0
(The numbers differ from yours because I ran this on a later date.)
In general, I found that some countries do not register the first entry for new_cases (see Thailand, for instance). This leads to a mismatch if you take the cumulative sum of new_cases and compare it with total_cases.
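One way to see which locations cause this kind of mismatch is to compare, per location, the latest total_cases value with the sum of new_cases. Below is a minimal sketch, assuming a long-format DataFrame with location, date, total_cases and new_cases columns; the function name cumulative_gap and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per (location, date).
df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "total_cases": [2.0, 5.0, 9.0, 0.0, 4.0, 10.0],
    # Location A has no new_cases value on its first day.
    "new_cases": [np.nan, 3.0, 4.0, 0.0, 4.0, 6.0],
})


def cumulative_gap(df: pd.DataFrame) -> pd.Series:
    """Latest total_cases minus the summed new_cases, per location.

    A positive gap means some daily increments never made it into new_cases.
    """
    grouped = df.sort_values(["location", "date"]).groupby("location")
    # .last() takes the last non-null value; .sum() skips NaN.
    gap = grouped["total_cases"].last() - grouped["new_cases"].sum()
    return gap.sort_values(ascending=False)


print(cumulative_gap(df))
```

Here location A shows a gap of 2.0 (its missing first-day count) and B shows 0.0; sorting in descending order puts the largest contributors first.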
However, most of the mismatch comes from Turkey. Apparently, no new_cases entry is registered for 2020-12-10. This has its origin here:
I'll investigate whether this bug has been fixed and, if so, whether that line is still needed.
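To locate the specific dates, one can flag rows where new_cases is empty even though total_cases changed. A minimal sketch under the same assumed column layout; missing_increments and the figures below are hypothetical and are not the real Turkey data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location": ["T"] * 4,
    "date": pd.to_datetime(["2020-12-08", "2020-12-09", "2020-12-10", "2020-12-11"]),
    "total_cases": [10.0, 20.0, 810.0, 820.0],
    # The large jump on 2020-12-10 has no matching new_cases entry.
    "new_cases": [10.0, 10.0, np.nan, 10.0],
})


def missing_increments(df: pd.DataFrame) -> pd.DataFrame:
    """Rows where total_cases changed but new_cases is missing."""
    df = df.sort_values(["location", "date"])
    # Day-over-day change of the cumulative series within each location;
    # a location's first row has no previous day, so its total is used instead.
    change = df.groupby("location")["total_cases"].diff().fillna(df["total_cases"])
    mask = df["new_cases"].isna() & (change != 0)
    return df.loc[mask, ["location", "date"]].assign(missing=change[mask])


print(missing_increments(df))
```

Because the first row of each location falls back to its own total, a missing first entry (the Thailand-style case) is flagged the same way as a missing mid-series entry.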
My recommendation: if you need to work with cumulative values, use the total_cases column instead.
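For example, a continent's cumulative count can be built from each country's latest total_cases rather than from summed new_cases. A minimal sketch, assuming (as the df.continent / df.location comparison above suggests) that country rows carry a continent value while aggregate rows such as Asia leave it empty; continent_totals and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "location": ["X", "X", "Y", "Y", "Z", "Z", "Asia", "Asia"],
    "continent": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe", None, None],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 4),
    "total_cases": [1.0, 3.0, 2.0, np.nan, 5.0, 7.0, 3.0, 5.0],
})


def continent_totals(df: pd.DataFrame) -> pd.Series:
    """Sum of each country's latest reported total_cases, per continent."""
    latest = (
        df.dropna(subset=["total_cases"])
        .sort_values("date")
        .groupby("location")
        .tail(1)
    )
    # groupby drops rows whose continent is missing, i.e. the aggregate rows.
    return latest.groupby("continent")["total_cases"].sum()


print(continent_totals(df))
```

Note that taking each country's latest value lets countries contribute figures from different dates (country Y here reports only on the first day), which is a trade-off against comparing everything on one fixed date.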
@lucasrodes Thanks for the extensive look into this. I will resort to using total_cases for cumulative values.
As you can see, there are big discrepancies in the vaccination figures as well (unless I did something terribly wrong): the total_vaccinations of the continents add up to more than the total_vaccinations of World. This seemed worth bringing to your attention.
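The comparison described above can be made on a single date, so that every location contributes a figure from the same day. A minimal sketch; the continent list is an assumption about how the aggregate rows are labelled, and vaccination_gap and the numbers are hypothetical.

```python
import pandas as pd

# Assumed names of the continent aggregate rows.
CONTINENTS = ["Africa", "Asia", "Europe", "North America", "Oceania", "South America"]

df = pd.DataFrame({
    "location": ["Africa", "Asia", "Europe", "North America", "South America", "World"],
    "date": pd.to_datetime(["2021-04-01"] * 6),
    "total_vaccinations": [1.0, 4.0, 3.0, 2.0, 1.0, 12.0],
})


def vaccination_gap(df: pd.DataFrame, date: str) -> tuple[float, list[str]]:
    """Sum of continent totals minus the World total on one date.

    Also returns the continents with no value that day, because pandas'
    sum() silently skips them and would otherwise hide the omission.
    """
    day = df[df["date"] == pd.Timestamp(date)].set_index("location")["total_vaccinations"]
    per_continent = day.reindex(CONTINENTS)
    missing = per_continent[per_continent.isna()].index.tolist()
    return per_continent.sum() - day["World"], missing


gap, missing = vaccination_gap(df, "2021-04-01")
print(gap, missing)
```

In this toy data Oceania has no row on that date, so the continent sum falls short of World and the function reports the missing continent alongside the gap.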
P.S. I read your code, but I don't quite understand how this line helps:
df_c.total_cases.apply(lambda a: min(a, df_c[df_c.total_cases!=0].total_cases.min()))
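What the quoted line computes: each total_cases value is replaced by the smaller of itself and the smallest non-zero total. For a non-decreasing cumulative series, that amounts to capping the series at its first non-zero value, which pandas' Series.clip expresses directly. A small check on hypothetical data (how the result is then used depends on surrounding code not shown here):

```python
import pandas as pd

# Hypothetical cumulative series for one country (df_c in the quoted line).
df_c = pd.DataFrame({"total_cases": [0.0, 0.0, 5.0, 8.0, 12.0]})

# The quoted expression, unchanged.
quoted = df_c.total_cases.apply(
    lambda a: min(a, df_c[df_c.total_cases != 0].total_cases.min())
)

# Vectorised equivalent: cap every value at the smallest non-zero total.
smallest_nonzero = df_c.loc[df_c.total_cases != 0, "total_cases"].min()
clipped = df_c.total_cases.clip(upper=smallest_nonzero)

print(quoted.tolist())  # [0.0, 0.0, 5.0, 5.0, 5.0]
# The day-over-day difference of the capped series is non-zero only
# on the first day with cases.
print(clipped.diff().tolist())
```

The vectorised form also avoids recomputing the minimum once per row, which the lambda does.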
Most of this discrepancy indeed comes from this very large data correction for Turkey in December:
The change is so large that it was distorting the 7-day average of new cases not just for Turkey, but also for Asia and even for World. So we decided to remove that day's difference from new_cases.
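To illustrate why a single large correction matters: a 7-day rolling mean carries a one-day spike into the following six days as well. The sketch below uses hypothetical numbers and simply masks the spike as missing; it is not necessarily how the published averages are computed.

```python
import pandas as pd

dates = pd.date_range("2020-12-05", periods=10, freq="D")
new_cases = pd.Series(10.0, index=dates)
# Hypothetical one-day data correction, far larger than the daily counts.
spike_day = pd.Timestamp("2020-12-10")
new_cases[spike_day] = 800.0

# 7-day average with the spike included: every window touching it is inflated.
raw_avg = new_cases.rolling(7).mean()

# Drop that day's value and average over the days that remain in each window.
cleaned = new_cases.mask(new_cases.index == spike_day)
clean_avg = cleaned.rolling(7, min_periods=1).mean()

print(raw_avg.max(), clean_avg.max())
```

The same spike would also flow into any regional or global series built by summing countries per date, which is why it affected Asia and World too.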
For total_vaccinations, I don't think there's currently an issue. Our latest update shows:
World: 1,370,323,701 total vaccinations
As you can see, a big chunk (823,781) is missing. What is causing this discrepancy?