nychealth / coronavirus-data

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City (NYC), from the NYC Department of Health and Mental Hygiene.
https://www1.nyc.gov/site/doh/covid/covid-19-data.page

Disparity in NYC deaths between NYC DOHMH, NY State, & Johns Hopkins #27

Closed elash20 closed 3 years ago

elash20 commented 4 years ago

I had found that for the most part the three sources all more or less agreed with each other until around a week ago, when NYC total cases reported by NY State and Johns Hopkins began to grow at a faster rate than cases reported by NYC's DOHMH. However, starting yesterday evening I noticed that NY State and Johns Hopkins were suddenly reporting far more deaths than NYC, and the gap had grown by this evening (2,738 vs 3,485). I do not believe ~750 people were lost in a gap of a day or a few hours. Does anyone know whether they are using different standards in confirming the deaths as COVID-19 related, or have any other explanation?

DTPOTO commented 4 years ago

Hello @elash20, I believe this is because NYC started to use the Date-of-Interest about a week ago. In this case that would be the actual Date-of-Death vs the Reporting-Date. My speculation is that the date of death is causing back-dating. I don't actually have a good reason for this switch other than that it's more accurate in the long run. I suspect that you will find TOTAL deaths to be comparable between the systems. That being the case, you will not find out about deaths that occurred today until next week. Look at the conversation that I had with @joansobo on this subject under Issue #1 "counts data differ from yesterday". The backdating has caused a lot of confusion. Comparing the multiple reporting files of Case-Hosp-Deaths should show where 750 deaths were re-assigned.

elash20 commented 4 years ago

Hi DTPOTO, I have actually ONLY been comparing total deaths. What confuses me is that NYC's most recent reporting states that the data reflect events up to April 6 at 5:30 PM, yet at that time the cumulative deaths reported by NYC are far below those of other sources. Is that descriptor not to say that all deaths that occurred and were processed within NYC are in the count of total deaths? There's no narrative piece that indicates this back-dating. If the 'Date-of-Death' occurred on or before April 6 at 5:30 PM (or however many could be confirmed and processed up to that point) wouldn't they be included? Is there any public confirmation by NYC DOHMH that they are using that methodology or any other? Even if cases or deaths are reassigned to prior dates, shouldn't the cumulative for today remain the same?

Also, if they are creating a reporting lag via backdating, would that not say that the most current cumulative total would be the higher numbers reported by Hopkins or NY State? Sorry if this was explained previously, but even with backdating I don't understand the disparity between totals reported that's grown the last 3 days.
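To make the point about totals concrete, here is a minimal sketch with made-up records (not DOHMH data). It assumes each death carries both a date of death and a date it was reported; the field layout and the helper names are hypothetical. Grouping the same records by either date changes the daily series but not the cumulative total, so a lower total can only come from records that have not been received or counted yet.

```python
from collections import Counter

# Hypothetical records: (date_of_death, date_reported). Made-up values.
records = [
    ("2020-04-03", "2020-04-04"),
    ("2020-04-03", "2020-04-06"),
    ("2020-04-04", "2020-04-06"),
    ("2020-04-05", "2020-04-06"),
    ("2020-04-06", "2020-04-06"),
]

def daily_counts(records, key_index):
    """Count records per day, keyed by the chosen date field."""
    counts = Counter(r[key_index] for r in records)
    return dict(sorted(counts.items()))

def known_as_of(records, cutoff):
    """Records whose report had arrived on or before the cutoff date.

    ISO-formatted date strings compare correctly as plain strings.
    """
    return [r for r in records if r[1] <= cutoff]

by_death = daily_counts(records, 0)    # series keyed by date of death
by_report = daily_counts(records, 1)   # series keyed by reporting date

total_by_death = sum(by_death.values())
total_by_report = sum(by_report.values())

# A snapshot taken on April 5 only sees what had been reported by then.
seen_on_april_5 = len(known_as_of(records, "2020-04-05"))

print(by_death)
print(by_report)
print(total_by_death, total_by_report, seen_on_april_5)
```

In this toy example both groupings sum to the same total, while the April 5 snapshot sees fewer records purely because of reporting delay.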

DTPOTO commented 4 years ago

You're right @elash20, the total should not change. The only thing that I can think of is that NYC DOHMH is making a unilateral decision to reassign the deaths to other geographies, either outside the county (e.g. Westchester) or outside the state (e.g. NJ). That would leave the question of whether anyone is acknowledging those deaths. I think if NYC DOHMH is classifying the deaths of non-city residents differently, they should spell that number out. After all, this is going to be an issue for all big cities with premium health centers. @mmontesanonyc @joansobo

elash20 commented 4 years ago

Hi DTPOTO, looking at files from NY State I found that they distinguished between how many deaths occurred at hospitals in each county, versus how many deaths were attributed to RESIDENTS of each county. I'll compare the numbers to see if the difference is resolved by this, or if there seem to be other factors.

kdog0000 commented 4 years ago

There is an unaccounted number of deaths among non-hospitalized COVID-19 patients. Unaccounted COVID-19 deaths occurring at residences, care facilities, and prisons are not added to the Johns Hopkins counts. These deaths are not tested for COVID-19 at the morgue or by EMTs, and are therefore not added to the overall count. But they could be identified in the disparity between the state's overall death numbers/spikes/trends and the logged COVID-19 deaths. Comparing death trends for the same month over a 10-year period against the COVID-19 spike, and against the recorded COVID-19 deaths, may identify the size of the unrecognized COVID-19 death trend.
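A rough sketch of that comparison, with made-up numbers (not real mortality data). It assumes a simple baseline equal to the mean of the same month over prior years; the function names are hypothetical, and a real analysis would need to adjust for population change and seasonality.

```python
from statistics import mean

def excess_deaths(observed, baseline_years):
    """All-cause deaths this month minus the mean of the same month in prior years."""
    return observed - mean(baseline_years)

def unattributed_excess(observed, baseline_years, reported_covid):
    """Excess deaths not explained by the officially logged COVID-19 deaths."""
    return excess_deaths(observed, baseline_years) - reported_covid

# Made-up all-cause death counts for the same month over 10 prior years.
baseline_march = [100, 110, 90, 105, 95, 100, 102, 98, 104, 96]
observed_march = 400        # made-up all-cause deaths this year
reported_covid_march = 120  # made-up officially logged COVID-19 deaths

excess = excess_deaths(observed_march, baseline_march)
gap = unattributed_excess(observed_march, baseline_march, reported_covid_march)
print(excess, gap)
```

With these illustrative inputs the baseline is 100, so the month shows 300 excess deaths, of which 180 are not covered by the logged COVID-19 count.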

briankoral713 commented 4 years ago

JHU and other sources are far behind in reporting - only the NYC DOH site (really only this GitHub) is the closest resource to the actual data.

elash20 commented 4 years ago

@briankoral713, why would JHU and other sources report higher total numbers for NYC than NYC DOH is, then? Either they're ahead, or they're using different standards for NYC Deaths.

elash20 commented 4 years ago

@kdog0000 that's really interesting. So the Johns Hopkins COVID-19 death counts seem to have started following the NY State COVID-19 death counts since the divergence (they are in agreement, with Johns Hopkins normally updating to match the state's numbers within a couple hours of the state releasing their update), with both sources reporting higher total cases and total deaths within NYC than the NYC DOH.

The state is counting COVID-19 specific cases and fatalities here, broken down by county, with deaths broken down by county the death occurred in and by county the deceased resided in. I would assume that any individual that tested positive for COVID-19 and dies is included in the state count (whether they passed in a hospital, residence, prison, etc.). Furthermore, I'm unsure why the state would report higher total cases for NYC than the NYC DOH would, unless this is also a question of testing location vs county the patient is a resident of.

Both Johns Hopkins and NY State have made their data publicly available and update at least once a day, and once I have time I plan on comparing the day by day reports in light of these considerations. If anyone knows the criteria for how positive cases or COVID-19 related deaths are counted by each institution, please let me know.

Edit: NY State only has its positive case dataset available as far as I can tell, but I'm hoping the fatality data will be public soon.

kdog0000 commented 4 years ago

Here is an example of Italy’s recent historical death trend before the recent COVID-19 death spike. What’s shockingly revealed is that overall deaths (the graph shows March) point to an obvious undercount of COVID-19 deaths. Since morgues and EMTs have no testing ability, deceased non-admitted patients would likely show the same trend in the U.S. My point is that COVID-19 deaths anywhere but a hospital are being overlooked and not counted. The same methodology used in the chart could be used now to arrive at a truer COVID-19 death count.

https://www.corriere.it/politica/20_marzo_26/the-real-death-toll-for-covid-19-is-at-least-4-times-the-official-numbers-b5af0edc-6eeb-11ea-925b-a0c3cdbe1130.shtml


DTPOTO commented 4 years ago

Hello @elash20, I have looked at the death discrepancy between NYS and NYC. It does look like NYC is behind (if not under-reporting). You noticed the discrepancy once the City started to use GitHub and, more importantly, started to use "Date-of-Death" instead of "Date-Reported". Allow me to put forth an "Administrative Delay" hypothesis. Now that the City is trying to go by Date of Death, they may be waiting for the Death-Certificate (coming through the State). Before, they may have been taking the data directly from the hospital (or county) as simply a death count for the day. You may want to watch things from that point of view, in that the City is behind the State as opposed to under-reporting. The question would be how much of a lag the City would have before reporting. I believe that you already said it seems to be more than a day behind. If it is an administrative delay, then I would expect some kind of catch-up during the work week. I have to say 800 is a lot to be behind for administrative work.

image

elash20 commented 4 years ago

@DTPOTO, de Blasio in a press release today: "NYC Mayor Bill de Blasio gives a press briefing on the latest coronavirus developments in the city. Earlier this morning, de Blasio said that the city is undercounting the number of people dying from coronavirus by only reporting deaths in hospitals and not those who die at home. He told local reporters ‘that needs to be in the statistics. What I’ve said to our health care experts is we should just acknowledge this is overwhelmingly being driven by the coronavirus… Not every death, but clearly the vast majority are related to the coronavirus, we should count them as part of the overall, very painful, count.’ As of this morning, over 4,000 people have died in New York City."

DTPOTO commented 4 years ago

Good job @elash20. You have it. Roughly 20% of the people diagnosed with coronavirus are dying at home.

This is somewhat in alignment with what @kdog0000 was suggesting. Although, I believe that @kdog0000 is also suggesting that many people will never be diagnosed with coronavirus but will die of the disease. This latter issue calls for a retrospective study that needs to be done sometime in the future (after the crisis is over).

By the way, I was looking at the backdating issue and I was coming to the conclusion that all the backdating could not account for the discrepancy.

mmontesanonyc commented 4 years ago

The reported case counts from all three organizations will not match because of different data sources and cleaning procedures.

DTPOTO commented 4 years ago

Hello @elash20: The City has reported a lot of deaths the past two days. Many of the deaths are back-dated due to Date-of-Death. But the disparity with the state is still around 900. I believe you are correct about the Deaths at home. I don't think the death rate increase in the last two days is including the Death at Home.

image

image This chart is based on comparing multiple versions of the Case-Hosp-Death.csv file on this site.
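For anyone who wants to repeat that kind of comparison, here is a minimal sketch that diffs two snapshots of the file by date. The column names `DATE_OF_INTEREST` and `DEATH_COUNT` are assumptions about the file layout, the helper names are hypothetical, and the rows are made up. In practice you would load two historical versions of the CSV from the repository's commit history instead of the inline strings.

```python
import csv
import io

def deaths_by_date(csv_text, date_col="DATE_OF_INTEREST", death_col="DEATH_COUNT"):
    """Parse one snapshot into {date: death count}. Column names are assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[date_col]: int(row[death_col] or 0) for row in reader}

def backdated_changes(old, new):
    """Per-date change in deaths between an older and a newer snapshot."""
    changes = {}
    # Plain string sort; adequate for this toy data, not for arbitrary dates.
    for date in sorted(set(old) | set(new)):
        delta = new.get(date, 0) - old.get(date, 0)
        if delta:
            changes[date] = delta
    return changes

# Made-up snapshots standing in for two versions of the file.
old_snapshot = """DATE_OF_INTEREST,DEATH_COUNT
3/13/20,0
3/14/20,2
3/15/20,4
"""
new_snapshot = """DATE_OF_INTEREST,DEATH_COUNT
3/13/20,1
3/14/20,2
3/15/20,7
3/16/20,5
"""

old = deaths_by_date(old_snapshot)
new = deaths_by_date(new_snapshot)
changes = backdated_changes(old, new)
print(changes)
print(sum(new.values()) - sum(old.values()))
```

Dates that appear in `changes` with an older date than the newer snapshot's reporting day are the back-dated assignments, and the per-date deltas add up to the change in the cumulative total.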

DTPOTO commented 4 years ago

Hello @elash20 It looks like the system is playing catch up. Last week there was a lot of back-dating going on with regards to Date-of-Death.
image

The piece that concerns me is that the backdating went all the way back to early March; see the assignments that happened on the reporting date of 4/9 (pink). I suspect the missing 700+ deaths will show up over the next week and that they will appear between 3/14 and 3/26. The regular backdating seems to have been rather automated since the beginning of April, going back 1-10 days.
image

I am concerned about all those deaths showing up in the system on 3/13. The spike on 3/13 may be an administrative issue. But with all these deaths occurring in early to mid-March, there is a major disconnect with the new cases identified at that time. Basically, the testing process to identify new cases takes about 5-10 days, which seems to be longer than it takes for deaths to start mounting. In other words, deaths and hospitalizations were outpacing new cases! Look at the early days in the Case-Hosp-Death.csv file.

drebich commented 4 years ago

Is there a chart or data comparing deaths from covid19 to the seasonal flu? I think this would be interesting since all are under social distancing guidelines. It could also include deaths from pneumonia.

rak5381 commented 4 years ago

Some seasonal influenza data in NY can be found at https://nyshc.health.ny.gov/web/nyapd/new-york-state-flu-tracker and https://www.health.ny.gov/diseases/communicable/influenza/surveillance/ and https://a816-health.nyc.gov/hdi/epiquery/

drebich commented 4 years ago

I was really just wondering if anyone is comparing the data (seasonal flu deaths vs covid19) side by side. I think it would be relevant. I believe we are near 60k deaths from seasonal flu since January 1st, with social distancing.