k-sys / covid-19

A collection of work related to COVID-19

Loading Patient Info no longer works with latest data #62

Open jqnatividad opened 4 years ago

jqnatividad commented 4 years ago

The date validation logic below:

# Convert both to datetimes
patients.Confirmed = pd.to_datetime(
    patients.Confirmed, format='%d.%m.%Y')
patients.Onset = pd.to_datetime(
    patients.Onset, format='%d.%m.%Y')

# Only keep records where confirmed >= onset
patients = patients[patients.Confirmed >= patients.Onset]

fails because the latest version of the data contains some invalid dates.

Only the data up to May 13 works.

In addition, the data file is now gzipped because of GitHub file size limits, so the notebook needs to be updated to handle this.
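For the gzip part, a minimal sketch of how the notebook might read the compressed file, assuming it is a plain CSV that has been gzipped. The file name `patients.csv.gz` and the sample rows are made up for illustration; only the `Confirmed` and `Onset` column names come from the snippet above.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the gzipped data file; the real name and contents may differ.
csv_text = "Confirmed,Onset\n05.05.2020,01.05.2020\n10.05.2020,12.05.2020\n"
path = os.path.join(tempfile.mkdtemp(), "patients.csv.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(csv_text)

# compression="gzip" is spelled out here; pandas would also infer it from the .gz suffix.
patients = pd.read_csv(path, compression="gzip")
print(patients.head())
```

Since `pd.read_csv` infers compression from the file extension by default, pointing the existing call at the `.gz` file may be all that is needed.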

tuchandra commented 4 years ago

There's a date in the Onset column entered as 31.04.2020, which doesn't exist (April has only 30 days). Not super sure what it's supposed to be (perhaps 31.03.2020), but it's just two rows and you could drop them without it mattering much.
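One way to drop the bad rows without hard-coding them is to let pandas turn unparseable dates into `NaT` and then filter those out. This is only a sketch on a made-up three-row frame, not the notebook's actual fix; the sample values are assumptions, and only the column names and date format come from the original snippet.

```python
import pandas as pd

# Toy data: the second row reproduces the invalid 31.04.2020 onset date.
patients = pd.DataFrame({
    "Confirmed": ["05.05.2020", "06.05.2020", "07.05.2020"],
    "Onset": ["01.05.2020", "31.04.2020", "09.05.2020"],
})

# errors="coerce" converts dates that cannot be parsed into NaT instead of raising.
for col in ["Confirmed", "Onset"]:
    patients[col] = pd.to_datetime(
        patients[col], format="%d.%m.%Y", errors="coerce")

# Count, then drop, the rows where either date failed to parse.
n_bad = int(patients[["Confirmed", "Onset"]].isna().any(axis=1).sum())
patients = patients.dropna(subset=["Confirmed", "Onset"])

# Only keep records where confirmed >= onset
patients = patients[patients.Confirmed >= patients.Onset]
print(f"dropped {n_bad} row(s) with invalid dates, {len(patients)} row(s) kept")
```

Comparisons against `NaT` evaluate to `False`, so the final filter would discard those rows anyway; the explicit `dropna` just makes it visible how many records were lost to bad dates.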