Closed lisphilar closed 3 years ago
With CovsirPhy version 2.3.0, Italy data (as of 18Jun2021), `example/scenario_analysis.py`, and 8 CPUs in my local environment, parameter estimation completed with RMSLE=0.07595 in 2 min 22 sec.
(Please ignore the accuracy of the last phase of the Forecast scenario because this is a forecasted future phase.)
Update: the RMSLE score was corrected. 0.0795 -> 0.07595
I compared performance while changing `constant_liar` and `timeout_iteration`, using Italy data as of 18Jun2021, my local environment, and CovsirPhy version 2.20.3-theta. I used only 1 CPU with `n_jobs=1` to get robust runtime values, measured as the total across all phases. Parameter estimation of each phase was done sequentially. The code is as follows.
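```python
import covsirphy as cs

# Load the JHU-style dataset
loader = cs.DataLoader()
jhu_data = loader.jhu()
# Set up scenario analysis for Italy and detect phases with trend analysis
snl = cs.Scenario(country="Italy")
snl.register(jhu_data)
snl.trend()
# Estimate SIR-F parameters with a single CPU, then report the score
snl.estimate(cs.SIRF, n_jobs=1)
print(f"RMSLE: {snl.score(metric='RMSLE')}")
```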
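For readers unfamiliar with the metric, here is a minimal sketch of RMSLE using its standard definition (root mean squared difference of `log(1 + x)` values). The function name `rmsle` and the sample numbers are assumptions for illustration; how CovsirPhy aggregates variables and phases internally is not shown here.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error of two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical numbers of cases, for illustration only (not CovsirPhy output).
observed = [100, 200, 400]
simulated = [110, 190, 420]
print(f"RMSLE: {rmsle(observed, simulated):.5f}")
```

Because the error is taken on a log scale, RMSLE penalizes relative rather than absolute differences, so lower values mean the simulated curve follows the records more closely.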
Results are here.
RMSLE (runtime) | constant_liar=False | constant_liar=True |
---|---|---|
timeout_iteration=5 | 0.06810 (13 min 22 sec) | 0.06868 (17 min 42 sec) |
timeout_iteration=4 | 0.06812 (14 min 03 sec) | 0.06869 (14 min 07 sec) |
timeout_iteration=3 | 0.06808 (10 min 10 sec) | 0.06871 (10 min 31 sec) |
timeout_iteration=2 | 0.06811 (07 min 55 sec) | 0.06865 (07 min 11 sec) |
timeout_iteration=1 | 0.06806 (03 min 21 sec) | 0.06901 (03 min 53 sec) |
I expected `constant_liar=True` and `timeout_iteration=1` to show the best performance, but these results favored `constant_liar=False` and `timeout_iteration=1`. I will create a pull request for `constant_liar=False` and `timeout_iteration=1`. These default values may be changed later if we get different results with other countries' data.
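As a quick check of that reading, the following sketch copies the values from the table above and picks the combination with the lowest RMSLE and the one with the shortest runtime. The helper `to_seconds` and the dictionary layout are assumptions made for this illustration.

```python
# (timeout_iteration, constant_liar) -> (RMSLE, runtime) copied from the table above.
results = {
    (5, False): (0.06810, "13 min 22 sec"),
    (5, True): (0.06868, "17 min 42 sec"),
    (4, False): (0.06812, "14 min 03 sec"),
    (4, True): (0.06869, "14 min 07 sec"),
    (3, False): (0.06808, "10 min 10 sec"),
    (3, True): (0.06871, "10 min 31 sec"),
    (2, False): (0.06811, "07 min 55 sec"),
    (2, True): (0.06865, "07 min 11 sec"),
    (1, False): (0.06806, "03 min 21 sec"),
    (1, True): (0.06901, "03 min 53 sec"),
}


def to_seconds(runtime):
    """Convert a string such as '13 min 22 sec' to seconds."""
    minutes, _, seconds, _ = runtime.split()
    return int(minutes) * 60 + int(seconds)


# Combination with the lowest RMSLE and combination with the shortest runtime.
best_score = min(results, key=lambda key: results[key][0])
fastest = min(results, key=lambda key: to_seconds(results[key][1]))
# Runtime ratio between timeout_iteration=5 and 1 with constant_liar=False.
speedup = to_seconds(results[(5, False)][1]) / to_seconds(results[(1, False)][1])

print(f"Lowest RMSLE: timeout_iteration={best_score[0]}, constant_liar={best_score[1]}")
print(f"Shortest runtime: timeout_iteration={fastest[0]}, constant_liar={fastest[1]}")
print(f"Runtime ratio (5 vs 1, constant_liar=False): {speedup:.2f}")
```

The RMSLE differences between rows are small, so the runtime column is what mainly separates the settings in this single-country test.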
With #833:
- `Scenario.estimate(<model>, timeout_iteration=1)` is the default.
- `constant_liar=False` is set explicitly.

Later, I will add `constant_liar=False` as an argument of `Scenario.estimate()`, if necessary.
With #835, users can select whether to use constant liar with `Scenario.estimate(<model>, constant_liar=False)` (default).
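I compared the RMSLE scores and runtime of `constant_liar=False` (default at this time) and `constant_liar=True` with some countries' datasets. I used `example/scenario_analysis.py` with 8 CPUs.
Results are here. In the last three columns, TRUE/FALSE is the `constant_liar` value that performed better, and NA means the RMSLE and runtime comparisons disagree.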
iso3 | Country | constant_liar=False | constant_liar=True | Better RMSLE | Better runtime | Winner |
---|---|---|---|---|---|---|
ita | Italy | 0.07642 (27 sec) | 0.07686 (29 sec) | FALSE | FALSE | FALSE |
jpn | Japan | 0.06103 (39 sec) | 0.06200 (44 sec) | FALSE | FALSE | FALSE |
grc | Greece | 0.05472 (37 sec) | 0.05107 (44 sec) | TRUE | FALSE | NA |
nld | Netherlands | 0.03719 (37 sec) | 0.03706 (28 sec) | TRUE | TRUE | TRUE |
usa | USA | 0.23073 (33 sec) | 0.24186 (22 sec) | FALSE | TRUE | NA |
ind | India | 0.21665 (36 sec) | 0.21871 (50 sec) | FALSE | FALSE | FALSE |
bra | Brazil | 0.06754 (53 sec) | 0.06634 (63 sec) | TRUE | FALSE | NA |
rus | Russia | 0.61374 (38 sec) | 0.61293 (28 sec) | TRUE | TRUE | TRUE |
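The last three columns can be re-derived from the scores and runtimes. The sketch below does so with values copied from the table; the function name `compare` and the use of `None` for NA are assumptions for this illustration.

```python
from collections import Counter

# iso3 -> ((RMSLE, runtime in sec) with constant_liar=False, same with constant_liar=True)
results = {
    "ita": ((0.07642, 27), (0.07686, 29)),
    "jpn": ((0.06103, 39), (0.06200, 44)),
    "grc": ((0.05472, 37), (0.05107, 44)),
    "nld": ((0.03719, 37), (0.03706, 28)),
    "usa": ((0.23073, 33), (0.24186, 22)),
    "ind": ((0.21665, 36), (0.21871, 50)),
    "bra": ((0.06754, 53), (0.06634, 63)),
    "rus": ((0.61374, 38), (0.61293, 28)),
}


def compare(without_liar, with_liar):
    """Return (better RMSLE, better runtime, winner) as constant_liar values.

    The winner is None (NA in the table) when the two criteria disagree.
    """
    better_rmsle = with_liar[0] < without_liar[0]
    better_runtime = with_liar[1] < without_liar[1]
    winner = better_rmsle if better_rmsle == better_runtime else None
    return better_rmsle, better_runtime, winner


summary = {iso3: compare(*pair) for iso3, pair in results.items()}
tally = Counter(winner for _, _, winner in summary.values())

for iso3, (better_rmsle, better_runtime, winner) in summary.items():
    print(iso3, better_rmsle, better_runtime, "NA" if winner is None else winner)
print(dict(tally))
```

The tally counts how many countries favor each setting outright and how many are split between score and runtime.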
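Because neither setting was consistently better (3 countries favored `constant_liar=False`, 2 favored `constant_liar=True`, and 3 were split between score and runtime), we continue to use `constant_liar=False` as default. For Netherlands and Russia, it will be better to use `Scenario.estimate(cs.SIRF, constant_liar=True)`.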
Runtime of parameter estimation will be much shorter with `timeout_iteration=1` (default). The version 2.21.0 release was planned for Jul2021, but it should be moved up to Jun2021, tomorrow or within a few days.
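To illustrate why a shorter `timeout_iteration` can shorten runtime, here is a toy model of the stopping rule described in the summary of this issue (the score is checked once per iteration and estimation stops when it has not changed for `tail_n` iterations). This is a sketch under stated assumptions, not CovsirPhy's implementation: the function `iterations_until_stop`, the score history, and the assumption that the history is the same for both iteration lengths are all hypothetical, and other stopping criteria are not modeled.

```python
def iterations_until_stop(scores, tail_n=4):
    """Return how many iterations run before the last `tail_n` scores are all equal.

    `scores` holds the best score observed at the end of each iteration.
    If the scores never plateau, every iteration is used.
    """
    for done in range(tail_n, len(scores) + 1):
        tail = scores[done - tail_n:done]
        if len(set(tail)) == 1:
            return done
    return len(scores)


# Hypothetical best-score history of one phase (illustrative values only).
history = [0.30, 0.12, 0.08, 0.07, 0.07, 0.07, 0.07, 0.07]

for timeout_iteration in (5, 1):
    n_iterations = iterations_until_stop(history)
    elapsed = n_iterations * timeout_iteration
    print(f"timeout_iteration={timeout_iteration}: {n_iterations} iterations, about {elapsed} sec")
```

In this toy model the number of iterations is fixed by when the score plateaus, so the elapsed time scales directly with the length of one iteration.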
Summary of this new feature
Improve performance (estimation score and runtime) of parameter estimation with the following solutions.
- `optuna` provides a new option, `constant_liar`, for `TPESampler` at version 2.8.0. The Constant Liar heuristic reduces search effort by avoiding trials that try similar parameter sets. Please refer to the detailed explanations and discussions in the Optuna version 2.8.0 release note. It will be great for CovsirPhy users to use `constant_liar=True` if Optuna version 2.8.0 is available in our environments.
- `timeout_iteration`: At version 2.20.3, `Scenario.estimate(timeout_iteration=5)` is the default. The estimation score (RMSLE by default) is calculated every five seconds, and if the score has not changed for `tail_n=4` iterations, estimation is stopped and the best parameter set is returned. However, in my tests, `timeout_iteration` appears to be a bottleneck. Many phases run for 5 seconds (i.e. when `timeout_iteration` is shorter, runtime may be shorter).

Note regarding constant liar:
The `constant_liar` argument cannot be applied with Optuna version 2.7.0 or older. https://gist.github.com/lisphilar/6440b5d69c4984bb0b34ede8c8ebcca3
`TypeError` means we use Optuna version 2.7.0 or older. When `covsirphy` gets a `TypeError` with the `constant_liar` argument, it should remove the argument and retry creating `TPESampler`.
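The retry idea in the note can be sketched as follows. To keep the example runnable without Optuna installed, `OldSampler` and `NewSampler` are hypothetical stand-ins for a `TPESampler` without and with the `constant_liar` argument, and `create_sampler` is an illustrative helper name, not CovsirPhy's actual code.

```python
def create_sampler(sampler_class, **kwargs):
    """Create a sampler, dropping `constant_liar` if the class rejects it."""
    try:
        return sampler_class(**kwargs)
    except TypeError:
        # The class does not accept the argument: remove it and retry.
        kwargs.pop("constant_liar", None)
        return sampler_class(**kwargs)


class OldSampler:
    """Hypothetical stand-in for a TPESampler that predates `constant_liar`."""

    def __init__(self, seed=None):
        self.seed = seed
        self.constant_liar = None


class NewSampler:
    """Hypothetical stand-in for a TPESampler that accepts `constant_liar`."""

    def __init__(self, seed=None, constant_liar=False):
        self.seed = seed
        self.constant_liar = constant_liar


old = create_sampler(OldSampler, seed=0, constant_liar=True)
new = create_sampler(NewSampler, seed=0, constant_liar=True)
print(old.constant_liar, new.constant_liar)
```

With a real installation, `optuna.samplers.TPESampler` would be passed as `sampler_class`. If the `TypeError` has a different cause, the retry raises it again, so unrelated errors are not hidden.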