timriffe / covid_age

COVerAGE-DB: COVID-19 cases, deaths, and tests by age and sex

observed jumps in Maine cases #61

Open mpascariu opened 3 years ago

mpascariu commented 3 years ago

Hi @timriffe,

I am looking at the confirmed cases for the state of Maine and I see periods with significant jumps. I think it is an isolated event, but it might need some attention.

library(tidyverse)

# Confirmed cases in Maine, both sexes combined (Sex == "b"),
# for the age group starting at 60
read_csv(
  file = "data/Output_10_20201208.zip",
  skip = 3) %>%
  mutate(
    Date = as.Date(Date, format = "%d.%m.%Y")) %>%
  filter(Sex == "b",
         Region == "Maine",
         Age == 60) %>%
  arrange(Date) %>%
  ggplot(aes(x = Date, y = Cases)) +
  geom_line(size = 1) +
  labs(title = "Confirmed cases in the 60-70 age group")

[image: Rplot]

Looking at the weekly number of cases per 100k inhabitants, we see this:

[image: C19_Cases_dev_Maine_20201209]

mpascariu commented 3 years ago

In fact it is not an isolated event. I can see this in California and Florida too.

mpascariu commented 3 years ago

Could this be a date formatting issue?
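One way to probe this, as a rough sketch on made-up date strings rather than the actual input sheets: parse each string both as day.month.year and as month.day.year and see which ones are valid under both readings. Those are the dates that would silently move if a collector's sheet switched formats. The vector `raw` and the data frame `check` are hypothetical names used only for illustration.

```r
# Made-up date strings; NOT taken from the database inputs.
raw <- c("08.12.2020", "12.08.2020", "25.11.2020", "11.25.2020")

# Parse under both candidate formats; invalid combinations become NA.
dmy <- as.Date(raw, format = "%d.%m.%Y")
mdy <- as.Date(raw, format = "%m.%d.%Y")

# A string is ambiguous when both parses succeed but give different dates.
check <- data.frame(
  raw = raw,
  dmy = dmy,
  mdy = mdy,
  ambiguous = !is.na(dmy) & !is.na(mdy) & dmy != mdy
)

print(check)
```

Strings with a day value above 12 can only be read one way, so they come back as `NA` under the wrong format instead of shifting to another date.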

timriffe commented 3 years ago

Thanks for reporting, Marius. Your observations have been reported to the respective collectors. Date formatting is a possibility. Will let you know as soon as it's fixed.

mpascariu commented 3 years ago

Thanks Tim! Here's a view over all US states:


library(tidyverse)

# Cases by age group for every US state, one panel per state
p <- read_csv(
  file = "data/Output_10_20201208.zip",
  skip = 3) %>%
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
         Age = as.factor(Age)) %>%
  arrange(Date) %>%
  # Keep both sexes combined, US rows only, and drop zero counts
  filter(Sex == "b",
         Country == "USA",
         # Age %in% c(60, 70, 80),
         Cases > 0) %>%
  ggplot(aes(x = Date, y = Cases, color = Age)) +
  geom_line(size = 1) +
  # Free scales so that small states remain readable
  facet_wrap(~ Region, scales = "free", ncol = 3) +
  scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
  labs(title = "Monotonicity of confirmed cases, USA") +
  theme(legend.position = "top")

ggsave("chart.png", p, width = 8, height = 18)

[image: chart]

timriffe commented 3 years ago

OK, this is a good diagnostic. I'm going through the states one by one and making a checklist.

mpascariu commented 3 years ago

Hi @timriffe, I can see that most of the data for Iowa, California and Washington have disappeared from the 07-01-2021 version of the database. Only a few weeks of data are left for each state. Was that done on purpose?

timriffe commented 3 years ago

Thanks for reporting! Not on purpose. I'm investigating these one at a time.

mpascariu commented 3 years ago

California and Washington look good on 08-01-2021; however, the Iowa data still show major gaps between June and September.

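On December 9 I was able to produce this: [image: C19_Cases_dev_Iowa_20201209] https://user-images.githubusercontent.com/6264977/104008213-6de55100-51a9-11eb-8b22-f51246068547.png

Today I can see this: [image: C19_Cases_dev_Iowa_20210108] https://user-images.githubusercontent.com/6264977/104008262-7f2e5d80-51a9-11eb-9343-318b73b61df0.png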
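Gaps like these can also be listed without a plot. The following is a minimal sketch on invented numbers, assuming columns named `Region`, `Date` and `Cases` as in the output file; the 14-day threshold and the names `toy` and `gaps` are arbitrary choices for illustration.

```r
library(dplyr)

# Invented series with a hole between early June and mid-September.
toy <- tibble(
  Region = "Iowa",
  Date   = as.Date(c("2020-05-20", "2020-05-27", "2020-06-03",
                     "2020-09-16", "2020-09-23")),
  Cases  = c(10, 12, 15, 40, 44)
)

# For each region, measure the distance in days to the previous
# observation and keep the rows that follow a gap longer than 14 days.
gaps <- toy %>%
  group_by(Region) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(PrevDate = lag(Date),
         GapDays  = as.integer(Date - PrevDate)) %>%
  ungroup() %>%
  filter(!is.na(GapDays), GapDays > 14)

print(gaps)
```

Running the same pipeline on two database versions and comparing the results would show which gaps are new.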

timriffe commented 3 years ago

Thanks @mpascariu. I did a manual roll-back yesterday in Drive, as automatic captures had been failing for Iowa. Looks like I chose the wrong date. I've been in contact with the source, who tells me the sheet will be released again soon. This will completely overwrite the Iowa series, FYI. It could be a few days before that makes it through. I'll therefore roll back to the sheet status from the day before Dec 9, and hopefully you'll get that same data back.


mpascariu commented 3 years ago

ok, great!

mpascariu commented 3 years ago

The monotonicity issues are not limited to the US regions. They also appear at the country level across the entire database and have been spotted in various countries.

But maybe a new issue should be opened for this (?)
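For a database-wide check, something along these lines could work. It is only a sketch on invented numbers, assuming `Cases` is a cumulative count (as the monotonicity discussion implies) and that each series is identified by `Country`, `Region`, `Sex` and `Age`; `find_drops()` is a hypothetical helper, not part of the project code.

```r
library(dplyr)

# Toy data standing in for the database output; values are invented.
toy <- tibble(
  Country = "USA",
  Region  = rep(c("Maine", "Iowa"), each = 4),
  Sex     = "b",
  Age     = 60,
  Date    = rep(as.Date(c("2020-11-01", "2020-11-08",
                          "2020-11-15", "2020-11-22")), times = 2),
  Cases   = c(100, 150, 120, 200,  # Maine: falls on 2020-11-15
              80, 90, 95, 110)     # Iowa: never falls
)

# Return every row where the cumulative count is lower than the
# previous observation of the same series.
find_drops <- function(x) {
  x %>%
    group_by(Country, Region, Sex, Age) %>%
    arrange(Date, .by_group = TRUE) %>%
    mutate(Change = Cases - lag(Cases)) %>%
    ungroup() %>%
    filter(!is.na(Change), Change < 0)
}

drops <- find_drops(toy)
print(drops)
```

Counting the flagged rows per country or region would give a quick ranking of where the series need attention first.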