microCOVID / microCOVID

Estimating the COVID risk of ordinary activities
https://www.microCOVID.org
MIT License

Data for Canada health regions in British Columbia are all zeros #1509

Open justinhaaheim opened 2 years ago

justinhaaheim commented 2 years ago

Describe the bug

0 cases reported in the last week for Canada > BC > Interior

Link to microCOVID scenario

https://www.microcovid.org/?distance=sixFt&duration=120&interaction=oneTime&personCount=15&riskProfile=average&scenarioName=bar&setting=indoor&subLocation=Canada_British_Columbia&subSubLocation=Canada_British_Columbia_Interior&theirMask=none&topLocation=Canada&voice=loud&yourMask=none

apiology commented 2 years ago

Hey @jeanpaulrsoucy, any idea what might be going on here? Here's an example where cases_daily returns 0:

$ curl 'https://api.opencovid.ca/summary?loc=BC&ymd=true&before=2022-07-18&after=2022-07-02' | jq . | tail -40
############################################################################ 100.0%
      "vaccine_administration_dose_1": 4524033,
      "vaccine_administration_dose_1_daily": 0,
      "vaccine_administration_dose_2": 4357594,
      "vaccine_administration_dose_2_daily": 0,
      "vaccine_administration_dose_3": 2446320,
      "vaccine_administration_dose_3_daily": 0
    },
    {
      "region": "BC",
      "date": "2022-07-18",
      "cases": 375938,
      "cases_daily": 0,
      "deaths": 3823,
      "deaths_daily": 0,
      "hospitalizations": 426,
      "hospitalizations_daily": 0,
      "icu": 34,
      "icu_daily": 0,
      "tests_completed": 6110205,
      "tests_completed_daily": 0,
      "vaccine_coverage_dose_1": 86.75,
      "vaccine_coverage_dose_1_daily": 0,
      "vaccine_coverage_dose_2": 83.56,
      "vaccine_coverage_dose_2_daily": 0,
      "vaccine_coverage_dose_3": 46.91,
      "vaccine_coverage_dose_3_daily": 0,
      "vaccine_coverage_dose_4": 5.18,
      "vaccine_coverage_dose_4_daily": 0,
      "vaccine_administration_total_doses": 11598028,
      "vaccine_administration_total_doses_daily": 0,
      "vaccine_administration_dose_1": 4524033,
      "vaccine_administration_dose_1_daily": 0,
      "vaccine_administration_dose_2": 4357594,
      "vaccine_administration_dose_2_daily": 0,
      "vaccine_administration_dose_3": 2446320,
      "vaccine_administration_dose_3_daily": 0
    }
  ],
  "version": "2022-07-19 18:08 EDT"
}

jeanpaulrsoucy commented 2 years ago

BC's case data were most recently updated through 2022-07-09 (I believe we'll get the next seven days of data tomorrow). You can see this by using the timeseries route rather than summary:

https://api.opencovid.ca/timeseries?geo=pt&loc=BC&stat=cases

The summary route fills dates up to the max date by default (this can be emulated in the timeseries route using fill=true).

You're probably looking for this: return latest x days of data. I can prioritize adding this if it would be useful.

By the way, the ymd parameter on the summary route is now deprecated (it has no effect in the current version of the API); all dates are now returned in ISO 8601 format by default.
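To illustrate the point about filled dates, here is a minimal Python sketch that takes rows shaped like the summary output earlier in this thread and finds the last date on which the cumulative cases value actually changed. The helper name last_updated_date and the sample numbers are hypothetical. Treating an unchanged cumulative count as a filled row is an assumption, so a genuine zero-case day would look the same.

```python
from datetime import date
from typing import Dict, List, Optional


def last_updated_date(rows: List[Dict], stat: str = "cases") -> Optional[date]:
    """Return the most recent date on which the cumulative `stat` changed.

    Assumption: trailing rows whose cumulative value equals the previous
    row's value were filled forward rather than freshly reported.
    """
    latest = None
    previous = None
    # ISO 8601 date strings sort chronologically as plain strings.
    for row in sorted(rows, key=lambda r: r["date"]):
        value = row[stat]
        if previous is not None and value != previous:
            latest = date.fromisoformat(row["date"])
        previous = value
    return latest


# Made-up numbers, only to show the shape of the problem.
SAMPLE_ROWS = [
    {"region": "BC", "date": "2022-07-08", "cases": 100, "cases_daily": 0},
    {"region": "BC", "date": "2022-07-09", "cases": 120, "cases_daily": 20},
    {"region": "BC", "date": "2022-07-10", "cases": 120, "cases_daily": 0},
    {"region": "BC", "date": "2022-07-11", "cases": 120, "cases_daily": 0},
]

print(last_updated_date(SAMPLE_ROWS))  # 2022-07-09
```

A consumer that sums cases_daily over the trailing week would see zero here, even though the source simply has not published anything after 2022-07-09.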

jeanpaulrsoucy commented 2 years ago

ccodwg/Covid19CanadaAPI#45 has been added by ccodwg/Covid19CanadaAPI@56d40cc380d5b75fd23fb8bcae8c79b07e762110.

apiology commented 2 years ago

Thanks as always for the lightning-fast responses, @jeanpaulrsoucy! That all makes sense.

However, is it possible that fix broke the existing endpoint? I now see this behavior with the same URL:

$ curl 'https://api.opencovid.ca/summary?loc=BC&ymd=true&before=2022-07-18&after=2022-07-02'
{"data":[],"version":"2022-07-20 14:08 EDT"}

jeanpaulrsoucy commented 2 years ago

Oh, let me fix that.

jeanpaulrsoucy commented 2 years ago

Hi @apiology, your query should now return the expected results.

apiology commented 2 years ago

Thanks for the fix, and also for the feature addition!

Sorry for the slow replies - I'm working on understanding how this part of our ingestion script works. I am not the original author, so it's somewhat slow going.

I believe the script shows the same symptom when faced with stale data from other data sources as well. I suppose this is becoming more apparent with the worldwide trend towards laggy data. If our script can be made more flexible about which dates it ingests and joins with other sources, the more-recent-data feature seems like it'd fit right in!

davidrbrake commented 2 years ago

This still seems to be a problem - at least in Newfoundland and Labrador. I suggest this organization https://covid19resources.ca/about-us/ would be happy to work with you to help identify the most current resources and connect them...

apiology commented 2 years ago

My hypothesis is that reporting (from the government side) is often happening less frequently than our analysis window, @davidrbrake, which would imply that we would need a fix on our side to widen that window while still reporting useful numbers for folks. If true, I don't believe that issue is specific to opencovid.ca or to Canada.

Do you have reason to think that the problem is due to data issues specific to opencovid.ca?
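One way to picture the "widen the window" idea is the rough Python sketch below. It is not the actual microCOVID ingestion code: the function name cases_in_latest_window and the window_days and max_lag_days defaults are all assumptions chosen for illustration. It anchors the window on the last date the source actually reported, instead of on today, and gives up if that date is too old.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def cases_in_latest_window(
    daily_cases: Dict[date, int],
    today: date,
    window_days: int = 7,
    max_lag_days: int = 14,
) -> Optional[Tuple[date, int]]:
    """Sum `window_days` of cases ending on the latest reported date.

    `daily_cases` should hold only dates the source really reported
    (no filled rows). Returns (anchor_date, total), or None when there
    is no data or the newest report is older than `max_lag_days`.
    """
    if not daily_cases:
        return None
    anchor = max(daily_cases)
    if (today - anchor).days > max_lag_days:
        return None  # too stale to present as a current estimate
    start = anchor - timedelta(days=window_days - 1)
    total = sum(n for d, n in daily_cases.items() if start <= d <= anchor)
    return anchor, total


# Hypothetical data: 10 cases a day, last reported on 2022-07-09.
reported = {date(2022, 7, 3) + timedelta(days=i): 10 for i in range(7)}
print(cases_in_latest_window(reported, today=date(2022, 7, 18)))
# (datetime.date(2022, 7, 9), 70)
```

Returning the anchor date alongside the total lets the caller tell users how old the numbers are rather than silently showing a stale week as current.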

jeanpaulrsoucy commented 2 years ago

Actually, the issue here is a bit simpler: Newfoundland no longer reports case data at the health region level. The only health region category with newly added case data is “Unknown”. See below.

https://api.opencovid.ca/timeseries?stat=cases&geo=hr&loc=nl&date=1&hr_names=short

Original data source is here: https://experience.arcgis.com/experience/280d17f9bd5d47e9870b6aba8222e5f4

A few other provinces are like this too now.

apiology commented 2 years ago

> Actually, the issue here is a bit simpler: Newfoundland no longer reports case data at the health region level. The only health region category with newly added case data is “Unknown”. See below.

Ah, got it. Sounds like the fix there is to stop listing health regions as available drop-down options under Newfoundland.

> A few other provinces are like this too now.

Do you have a list, by chance?

jeanpaulrsoucy commented 2 years ago

I can certainly make one. It would be useful to add to the README on the main repository. I’ll work on it tomorrow.

jeanpaulrsoucy commented 2 years ago

Hi @apiology, I've rewritten our README to be more useful to end users.

https://github.com/ccodwg/CovidTimelineCanada

More relevant to you, I've also added a list of provinces/territories no longer reporting health region-level data:

Some provinces no longer offer health region-level data for cases and/or deaths. For these provinces/territories, all recent cases and/or deaths will show up under the "Unknown" (code: 9999) health region. The following is a list of provinces/territories that no longer report health region data:

  • Manitoba (death data no longer reported at HR-level)
  • Newfoundland and Labrador (case data no longer reported at HR-level)
  • Northwest Territories (case and death data no longer reported at all)
  • Nova Scotia (case and death data no longer reported at HR-level)
  • Nunavut (case and death data no longer reported at all)
  • Saskatchewan (case and death data no longer reported at HR-level)

So for case data, you should remove the health region option for Newfoundland and Labrador, Nova Scotia and Saskatchewan. You can also remove Northwest Territories and Nunavut, since they no longer report case data at all.
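For anyone picking this up, the recommendation above could be sketched in Python roughly as follows. The two sets are copied from the list above. The function name selectable_locations, the input shape, and the example region names other than "Interior" are hypothetical and not taken from the microCOVID codebase.

```python
from typing import Dict, List

# From the list above: case data no longer reported at health-region level.
NO_HR_CASE_DATA = {"Newfoundland and Labrador", "Nova Scotia", "Saskatchewan"}
# From the list above: case data no longer reported at all.
NO_CASE_DATA = {"Northwest Territories", "Nunavut"}


def selectable_locations(
    regions_by_province: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Drop provinces with no case data and hide stale health regions.

    Provinces in NO_HR_CASE_DATA stay selectable but get an empty
    health-region list, so only province-level numbers are offered.
    """
    result: Dict[str, List[str]] = {}
    for province, regions in regions_by_province.items():
        if province in NO_CASE_DATA:
            continue
        result[province] = [] if province in NO_HR_CASE_DATA else list(regions)
    return result


# Illustrative input only; "Zone 1" is a placeholder name.
EXAMPLE = {
    "British Columbia": ["Interior", "Fraser"],
    "Nova Scotia": ["Zone 1"],
    "Nunavut": [],
}
print(selectable_locations(EXAMPLE))
# {'British Columbia': ['Interior', 'Fraser'], 'Nova Scotia': []}
```

A hard-coded list like this will drift as provinces change their reporting, so it only makes sense as a small, focused first step.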

apiology commented 2 years ago

Fantastic - thanks so much, @jeanpaulrsoucy!

apiology commented 2 years ago

If someone is looking to pick up this fix, there are some ideas on how to handle these more automatically as part of a general strategy around "county-level" data which does not have recent enough data - see the discussion here for a couple of ideas. If it helps get something out the door and build up some experience with the codebase, doing something small and focused on the above regions would be fine by me too.