nytimes / covid-19-data

A repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

Harris County, TX Public Health Department data is significantly different from the data in the state database for the 'same' county #668

Closed MirrorCola closed 2 years ago

MirrorCola commented 2 years ago

I can understand if you have no answer to this question, but if you have come across this issue, I'd appreciate any insight. The data published by the Harris County, TX Public Health Department is significantly different from the data in the state database for the 'same' county (it's clearly not just a subset). Do you know why?

lwaananenjones commented 2 years ago

For Harris County, we monitor data from both the county and the state, and the state source has consistently reported a higher number of cumulative cases and deaths for quite a while now. One difference is that the county lists "confirmed cases" while the state includes both confirmed cases and probable cases. (We have information about the two categories in our readme file or our FAQ page.)

Our data includes probable cases where available, which is most geographies at this point, so we use the state source in our data. Another common difference is how cases are displayed over time: our data is organized by the date cases and deaths are added to the cumulative total, but health departments often backdate cases to the date a person was tested or the date the case was reported to the health department. Texas announced a backlog of older cases in Harris County last week, for instance, so in our data those cases are included on the date they were added, with a note explaining the backlog.
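To make the date-attribution difference concrete, here is a minimal sketch in Python. The records are invented for illustration (they are not Harris County figures), and the `cumulative_by` helper is hypothetical, not part of this repository. It builds two cumulative series from the same cases: one keyed on the date each case occurred (backdated) and one keyed on the date it was added to the total.

```python
from collections import defaultdict
from datetime import date

# Hypothetical case records: (date the case occurred, date it was added to the total).
# The last three rows mimic a backlog of older cases all reported on one day.
records = [
    (date(2022, 3, 1), date(2022, 3, 2)),
    (date(2022, 3, 2), date(2022, 3, 3)),
    (date(2022, 3, 3), date(2022, 3, 3)),
    (date(2022, 1, 10), date(2022, 3, 4)),
    (date(2022, 1, 11), date(2022, 3, 4)),
    (date(2022, 1, 12), date(2022, 3, 4)),
]

def cumulative_by(records, index):
    """Return {date: cumulative count}, keyed on the chosen date field."""
    daily = defaultdict(int)
    for rec in records:
        daily[rec[index]] += 1
    total = 0
    out = {}
    for day in sorted(daily):
        total += daily[day]
        out[day] = total
    return out

by_event_date = cumulative_by(records, 0)   # backdated, as many health departments publish
by_report_date = cumulative_by(records, 1)  # by the date added to the cumulative total

for day, count in by_report_date.items():
    print(day, count)
```

Both series end at the same total, but on any given day before the backlog is reported they can disagree substantially, and the backlog shows up as a one-day jump only in the report-date series.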

Let me know if I'm misunderstanding the differences you're seeing.

MirrorCola commented 2 years ago

Thank you very much for responding and the information about the differences.

1) The death counts are also quite different, in both the cumulative totals and the timing. Does the confirmed/probable distinction also apply to deaths?

2) FYI, the Harris County Health Department splits newly reported daily cases between RECENT (< 14 days) and OLD (14+ days). My attempts to reconcile this data with the state reports have not been successful, and the timing is drastically different. The two sources do not publish data with the same granularity.

3) Is the state getting data directly from hospitals and other facilities rather than through the health departments of respective counties?

Again, thank you in advance for all of your efforts and for taking the time to respond.

lwaananenjones commented 2 years ago

Texas uses vital records (death certificates) for counting deaths. This is generally considered the most accurate way to count deaths, but there is a lag since it takes some time for the paperwork to be processed and checked. The CDC maintains a separate dataset of Covid deaths based on death certificates, which currently shows 10,368 deaths. This tracks with the state's count (10,854) since the CDC updates its counts more slowly.
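As a rough check on how closely the two counts quoted above agree, the gap can be worked out directly. This is only arithmetic on the two figures in this comment; the variable names are illustrative.

```python
state_deaths = 10_854  # state count quoted above
cdc_deaths = 10_368    # CDC death-certificate count quoted above

gap = state_deaths - cdc_deaths
gap_pct = 100 * gap / state_deaths

print(f"gap: {gap} deaths ({gap_pct:.1f}% of the state count)")
# prints: gap: 486 deaths (4.5% of the state count)
```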

Texas has a decentralized health department system with complicated relationships between state and local health officials, which we wrote about in 2020. The flow of information has generally improved since then, but I'm not certain about any current specifics about Harris County.

MirrorCola commented 2 years ago

Thank you again (to all of you) for all of the work you're continuing to do and, in particular, for answering my question. I thought I had a handle on the differences, but they seem to have gotten worse again lately. I'll continue to monitor both data sets.

MirrorCola

tiffehr commented 2 years ago

Thank you, @lwaananenjones! @MirrorCola, we sympathize about the complications within local and state figures!