Closed by lisphilar 1 year ago
The DataEngineer class was created. The data argument is a dataframe of geographic time series data with a reset index and columns defined by date, layers (location layers, like ["Country", "Province"]) and variables, including the number of cases.
# Data preparation
import covsirphy as cs
loader = cs.DataLoader()
loader.jhu()
raw_df = loader.locked.copy()
# Set-up DataEngineer instance
engineer = cs.DataEngineer(data=raw_df, layers=["ISO3", "Province"], date="Date")
# Data cleaning
engineer.clean(kind=["convert_date", "resample", "fillna"])
# Calculate daily new values (here for the Tests column)
engineer.diff(column="Tests", suffix="_diff", freq="D")
# Data complement of S, F, R, Tests
col_dict = dict(population="Population", confirmed="Confirmed", fatal="Fatal", recovered="Recovered", tests="Tests")
procedures = engineer.complement_assess(address=["Japan", "Tokyo"], col_dict=col_dict)
engineer.complement_force(address=["Japan", "Tokyo"], procedures=procedures)
# Data transformation: calculate Susceptible, Infected, specifying column names
engineer.transform(new_dict={"susceptible": "Susceptible", "infected": "Infected"}, **col_dict)
# Get processed data (pandas.DataFrame)
engineer.all()
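For reference, the input layout described above (reset index, one date column, layer columns, variable columns) might look like the small pandas dataframe below. This is only a sketch: the values are made up for illustration, the column names simply mirror the snippet above, and nothing here calls covsirphy itself.

```python
import pandas as pd

# Made-up illustrative values, not real case counts
raw_df = pd.DataFrame(
    {
        "Date": ["2022-01-01", "2022-01-02", "2022-01-01", "2022-01-02"],
        "ISO3": ["JPN", "JPN", "JPN", "JPN"],
        "Province": ["Tokyo", "Tokyo", "Osaka", "Osaka"],
        "Confirmed": [100, 120, 50, 65],
        "Fatal": [1, 1, 0, 1],
        "Recovered": [80, 90, 40, 45],
    }
)
raw_df["Date"] = pd.to_datetime(raw_df["Date"])

layers = ["ISO3", "Province"]
date = "Date"
# Everything that is neither a layer nor the date is treated as a variable
variables = [col for col in raw_df.columns if col not in [*layers, date]]

# "Reset index": a plain RangeIndex, with date and layers kept as ordinary columns
assert isinstance(raw_df.index, pd.RangeIndex)
# Each (layers, date) combination should identify exactly one row
assert not raw_df.duplicated(subset=[*layers, date]).any()
print(variables)
```

The two assertions are a quick way to check a dataframe against the expected shape before passing it as the data argument.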
The code of .complement_assess() and .complement_force() comes from the JHUDataComplementHandler and PCRData classes. Tests of the new methods should be updated later with data from some countries. The JHUDataComplementHandler class could be deprecated.
With #1090:
- DataDownloader: use the datatable library to speed up data downloading
- DataEngineer: continue to use the legacy complement handler
- DataEngineer: include EDA functionality
- DataEngineer().recovery_period(): calculate the recovery period
- DataEngineer().transform() and DataEngineer().inverse_transform(): mutual conversion of infected, population and confirmed, fatal+recovered
- DataEngineer().register(data)
- DataEngineer().add(), .mul(), .sub(), .div() and .assign(): methods to assign new columns

New documentation will be added to https://github.com/lisphilar/covid19-sir/tree/master/example
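The mutual conversion listed for transform() and inverse_transform() can be sketched in plain Python. This assumes the usual bookkeeping Susceptible = Population - Confirmed and Infected = Confirmed - Fatal - Recovered. That is an assumption about the intended definitions, not a copy of the library code, and the function names to_compartments and to_observed are hypothetical.

```python
def to_compartments(population, confirmed, fatal, recovered):
    """Assumed forward conversion: observed counts -> model compartments."""
    return {
        "Susceptible": population - confirmed,
        "Infected": confirmed - fatal - recovered,
        "Fatal": fatal,
        "Recovered": recovered,
    }


def to_observed(susceptible, infected, fatal, recovered):
    """Assumed inverse conversion: model compartments -> observed counts."""
    confirmed = infected + fatal + recovered
    return {
        "Population": susceptible + confirmed,
        "Confirmed": confirmed,
        "Fatal": fatal,
        "Recovered": recovered,
    }


# Made-up numbers for a single location and date
compartments = to_compartments(population=1000, confirmed=120, fatal=1, recovered=90)
restored = to_observed(
    susceptible=compartments["Susceptible"],
    infected=compartments["Infected"],
    fatal=compartments["Fatal"],
    recovered=compartments["Recovered"],
)
print(compartments)
print(restored)
```

Under these assumed definitions the two functions are exact inverses, so a round trip restores the original population and confirmed counts.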
Summary of this new feature
A new class, DataEngineer, will be created for data cleaning, transforming and complementing. The JHUData class and similar classes handle these steps internally in the current version, but the transparency of these procedures needs to be improved.
CovsirPhy version: 2.24.0