Use of statistics from Zhou et al.

johnurbanik commented 4 years ago

As mentioned in https://github.com/thibautjombart/covid19_bed_occupancy/issues/40, I'm very worried about the validity of the Zhou et al. statistics, both here and across the epidemiological modeling space right now. I'm not sure the study design yields a valid estimate of the length of hospitalization. The study is still useful for understanding co-morbidities, but I think it may be necessary to find a different source for length-of-stay information.

I'd really appreciate it if you all took a look. I'm hoping I'm wrong here, as a lot of modeling seems to be using the medians from this study (and using them as means, not even as distributions like you guys are). Perhaps another study shows similar results.
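To illustrate why treating a median as a mean matters for a right-skewed length of stay, here is a minimal sketch. It assumes a log-normal distribution, and the median of 10 days and the shape parameter are made-up values, not numbers from Zhou et al.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical parameters, NOT taken from Zhou et al.: a log-normal
# length of stay with a 10-day median and a moderately heavy right tail.
MEDIAN_DAYS = 10.0
SIGMA = 0.6

# Draw simulated lengths of stay (in days).
stays = [
    random.lognormvariate(math.log(MEDIAN_DAYS), SIGMA)
    for _ in range(100_000)
]

sample_median = statistics.median(stays)
sample_mean = statistics.mean(stays)

# Closed form for a log-normal: mean = median * exp(sigma^2 / 2).
analytic_mean = MEDIAN_DAYS * math.exp(SIGMA ** 2 / 2)

print(f"median: {sample_median:.2f} days")
print(f"mean:   {sample_mean:.2f} days (analytic: {analytic_mean:.2f})")
```

Under these assumptions the mean sits above the median, so plugging a reported median in as the mean understates expected bed-days, and the gap grows as the tail gets heavier.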

Quoting the study:

Since these two hospitals were the only designated hospitals for transfer of patients with COVID-19 from other hospitals in Wuhan until Feb 1, 2020, our study enrolled all adult inpatients who were hospitalised for COVID-19 and had a definite outcome (dead or discharged) at the early stage of the outbreak.

First, due to the retrospective study design, not all laboratory tests were done in all patients, including lactate dehydrogenase, IL-6, and serum ferritin. Therefore, their role might be underestimated in predicting in-hospital death. Second, patients were sometimes transferred late in their illness to the two included hospitals. Lack of effective antivirals, inadequate adherence to standard supportive therapy, and high-dose corticosteroid use might have also contributed to the poor clinical outcomes in some patients. Third, the estimated duration of viral shedding is limited by the frequency of respiratory specimen collection, lack of quantitative viral RNA detection, and relatively low positive rate of SARS-CoV-2 RNA detection in throat-swabs. Fourth, by excluding patients still in hospital as of Jan 31, 2020, and thus relatively more severe disease at an earlier stage, the case fatality ratio in our study cannot reflect the true mortality of COVID-19.

The two biggest red flags for me are that 'patients were sometimes transferred late in their illness to the two included hospitals' and that the study's 'time from illness onset to death or discharge' is much longer than its 'hospital length of stay'. Combined with the graphics in Figures 1 and 2, this suggests that a large percentage of the patients were hospitalized for other reasons before transfer (how else would they have labs for these patients, and why else would they suspect such a long infectious period before hospitalization?).

I expect that the bar for admission in most hospitals will be higher than just fever and a positive test (e.g. I've heard reports that in some cases in New York, those without dyspnea are sent home, sometimes without testing), but the fact remains that in the dataset the median onset of dyspnea is 4 days earlier than hospital admission (which, inexplicably, is in Table 1 and not copied into Table 2).

Further, because patients who did not have a definite outcome were excluded from the sample, patients who were admitted before Jan 19 but had not yet been discharged are missing; including them would skew the distribution toward longer hospitalization times.

If these two variables interact, it is possible that the actual distributions we'd see have a substantially larger median than suggested (or at least a fatter tail). I hope that I am wrong.
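The effect of keeping only patients with a definite outcome can be sketched with a small simulation. This is a toy model under stated assumptions: admissions uniform over a 30-day window, gamma-distributed stays with a 14-day mean, and a cutoff on day 30. None of these numbers come from Zhou et al., and late transfers are not modelled.

```python
import random
import statistics

random.seed(1)

# All numbers below are illustrative assumptions, not values from Zhou et al.
N_PATIENTS = 50_000
ENROLMENT_DAYS = 30.0  # admissions are spread uniformly over this window
CUTOFF_DAY = 30.0      # only patients with an outcome by this day are kept
MEAN_STAY = 14.0       # hypothetical true mean length of stay, in days
SHAPE = 2.0            # gamma shape; the scale is set so the mean is MEAN_STAY

all_stays = []
observed_stays = []
for _ in range(N_PATIENTS):
    admitted = random.uniform(0.0, ENROLMENT_DAYS)
    stay = random.gammavariate(SHAPE, MEAN_STAY / SHAPE)
    all_stays.append(stay)
    # A patient enters the study only if death or discharge happens
    # before the cutoff; everyone still in hospital is dropped.
    if admitted + stay <= CUTOFF_DAY:
        observed_stays.append(stay)

true_median = statistics.median(all_stays)
observed_median = statistics.median(observed_stays)
excluded_share = 1 - len(observed_stays) / len(all_stays)

print(f"true median stay:     {true_median:.1f} days")
print(f"observed median stay: {observed_median:.1f} days")
print(f"share of patients excluded: {excluded_share:.0%}")
```

In this toy setup the observed median comes out below the true one, because long stays are the ones most likely to still be in progress at the cutoff. How large the gap is depends entirely on the assumed admission pattern and stay distribution.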

johnurbanik commented 4 years ago

I performed some analysis using the empirical dataset from Wuhan covering the later stages of the initial outbreak. It isn't my best analysis (I tried to get it out quickly after work, and I'd welcome contributions from anyone), but the results certainly seem to indicate that the Zhou et al. figures are pretty far off in terms of both the mean and the asymptotic tail behavior of the distribution.

Please feel free to take a look.

https://github.com/understand-covid/proposal/blob/master/parameter%20estimation/hospital_stay_analysis.ipynb

thibautjombart commented 4 years ago

This is pretty useful, thanks! I think we will:

look for other distributions in the literature and add them as options, if possible
implement the user-defined distributions as outlined in https://github.com/thibautjombart/covid19_bed_occupancy/issues/9

How does that sound?

thibautjombart commented 4 years ago

Closing to follow up in https://github.com/thibautjombart/covid19_bed_occupancy/issues/49 and https://github.com/thibautjombart/covid19_bed_occupancy/issues/9

Use of statistics from Zhou et al. #47