Growth factor

lueck commented 4 years ago

I added a function that calculates the growth factor (German: Vervielfältigung pro Tag, "daily multiplication"; bestandsspezifischer Wachstumsfaktor, "growth factor relative to the existing case count") for RKI data that has previously been restructured with group_RKI_timeseries(). It is based on KumAnzahlFall, the cumulative number of cases.

The function works on arbitrary groupings made with group_RKI_timeseries().

How it is implemented:

1) Compute the Cartesian product of the restructured data with itself (i.e. a self cross join).
2) Keep only the rows where Meldedatum.x == Meldedatum.y + 1 day.
3) Calculate the quotient KumAnzahlFall.x / KumAnzahlFall.y.
4) Replace Inf and NaN values with NA.
5) Left-join the result to add the new column to the restructured data.

For performance reasons, an inner join on the grouping columns is used instead of a cross join where possible. The inner join also makes it possible to calculate the growth factor for arbitrary groupings made with group_RKI_timeseries(), via the by argument.
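The five steps can be sketched in base R roughly as follows. This is a minimal sketch, not the function that was added to the package: the helper name growth_factor_join(), the grouping column Bundesland and the toy data are hypothetical, and Meldedatum is assumed to be of class Date (with POSIXct timestamps, "one day later" would mean + 86400 seconds).

```r
# Hypothetical helper illustrating the self-join approach; not the package's function.
# Assumes columns Meldedatum (class Date) and KumAnzahlFall, plus optional grouping columns.
growth_factor_join <- function(data, by = character(0)) {
  # Step 1: join the data with itself. merge() with a zero-length `by` returns the
  # Cartesian product (cross join); with grouping columns it is an inner join per group.
  pairs <- merge(data, data, by = by, suffixes = c(".x", ".y"))

  # Step 2: keep only pairs where the .x row is exactly one day after the .y row
  pairs <- pairs[pairs$Meldedatum.x == pairs$Meldedatum.y + 1, ]

  # Step 3: today's cumulative count divided by yesterday's
  pairs$growth_factor <- pairs$KumAnzahlFall.x / pairs$KumAnzahlFall.y

  # Step 4: division by zero gives Inf or NaN; replace those with NA
  pairs$growth_factor[!is.finite(pairs$growth_factor)] <- NA

  # Step 5: left-join the new column back onto the original rows
  result <- pairs[, c(by, "Meldedatum.x", "growth_factor")]
  names(result)[names(result) == "Meldedatum.x"] <- "Meldedatum"
  merge(data, result, by = c(by, "Meldedatum"), all.x = TRUE)
}

# Hypothetical toy data: two regions, three reporting days each
toy <- data.frame(
  Bundesland    = rep(c("A", "B"), each = 3),
  Meldedatum    = rep(as.Date("2020-03-01") + 0:2, times = 2),
  KumAnzahlFall = c(10, 20, 30, 0, 5, 10)
)
res <- growth_factor_join(toy, by = "Bundesland")
res <- res[order(res$Bundesland, res$Meldedatum), ]
print(res)  # growth_factor: NA, 2, 1.5 for A; NA, NA, 2 for B (5 / 0 = Inf becomes NA)
```

The first day of each group has no predecessor, so the left join leaves it as NA. Passing no by argument falls back to the cross join, which is only meaningful for a single series.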

mlange-42 commented 4 years ago

This can be accomplished more easily, without joining different days and without having to worry about "cuts" between spatial units:

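# Daily growth rate: new cases relative to the previous day's cumulative count
time_series$RateTag <- time_series$AnzahlFall / (time_series$KumAnzahlFall - time_series$AnzahlFall)
# Doubling time in days, assuming the daily rate stays constant
time_series$DopplTage <- log(2) / log(1 + time_series$RateTag)
time_series_clean <- time_series
# Set undefined doubling times and those longer than 100 days to NA
time_series_clean$DopplTage[is.na(time_series_clean$DopplTage) | (time_series_clean$DopplTage > 100)] <- NA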

This works because the previous day's cumulative count can be calculated as the cumulative count minus the new cases.
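Written out with K_t for KumAnzahlFall and N_t for AnzahlFall on reporting day t (symbols introduced here only for illustration), the reasoning behind RateTag and DopplTage is roughly the following; the doubling time assumes the rate r_t stays constant, and R's log() is the natural logarithm:

```latex
\[
\begin{aligned}
K_{t-1} &= K_t - N_t \\
r_t &= \frac{N_t}{K_{t-1}} = \frac{N_t}{K_t - N_t} && \text{(RateTag)} \\
(1 + r_t)^{T_2} &= 2 \;\Longrightarrow\; T_2 = \frac{\ln 2}{\ln(1 + r_t)} && \text{(DopplTage)}
\end{aligned}
\]
```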

EDIT: The code above computes the growth rate. The growth factor would be

time_series$KumAnzahlFall / (time_series$KumAnzahlFall - time_series$AnzahlFall)
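A minimal, self-contained sketch of this row-wise approach, using hypothetical toy data for a single region. The column name Wachstumsfaktor is made up for illustration, and the Inf/NaN-to-NA replacement is borrowed from step 4 of the join-based version rather than from the snippet above:

```r
# Hypothetical toy series for one region
time_series <- data.frame(
  Meldedatum    = as.Date("2020-03-01") + 0:3,
  AnzahlFall    = c(10, 10, 10, 15),
  KumAnzahlFall = c(10, 20, 30, 45)
)

# Previous day's cumulative count, recovered from the same row: cumulative minus new cases
kum_vortag <- time_series$KumAnzahlFall - time_series$AnzahlFall

# Growth rate (as RateTag above) and growth factor (hypothetical column name)
time_series$RateTag <- time_series$AnzahlFall / kum_vortag
time_series$Wachstumsfaktor <- time_series$KumAnzahlFall / kum_vortag

# On the first day there were 0 previous cases (10 / 0 = Inf); mark it as undefined
time_series$Wachstumsfaktor[!is.finite(time_series$Wachstumsfaktor)] <- NA

# Where defined, factor = 1 + rate (up to floating-point rounding): 2, 1.5, 1.5
print(time_series)
```

Since every row carries both counts, the calculation never looks at neighbouring rows, which is why no join is needed and group boundaries do not matter.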

lueck commented 4 years ago

Yes, I see. I must have been blind coming from JHU data.

The solution you describe is also more robust to changes in the data acquisition interval, whereas mine requires reporting dates that are exactly 86400 seconds (one day) apart.

nevrome / covid19germany

Growth factor #14