Open kvas7andy opened 2 years ago
@kvas7andy Thanks for filing this issue with a detailed explanation. Could we split this into three separate issues to facilitate the discussion?
Hi @blumu, sure, let's split it into three. The only thing is I will get back to the discussion tomorrow.
@kvas7andy Is your commit above addressing all three problems mentioned in this issue or just some of them? (By the way, if you could split them as separate bugs that would be helpful.) Many thanks!
I moved Issue 1 to a separate issue: #115
Hi everyone,
I found several bugs while checking the code of the ipynb notebooks with benchmark results for the 3 environments TinyToy, ToyCTF, and Chain.
I think my findings might be useful for the community that uses this nice implementation of cyberattack simulation.
`own_atleast_percent: float = 1.0` is included as an AND condition for raising the `done = True` flag. For TinyToy and ToyCTF (but not Chain) this leads to long training durations, a wrong RL signal for evaluating the Q function, and low sample-efficiency.
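To make the issue concrete, here is a minimal sketch of the termination check described above. The names `AttackerGoal`, `goal_reached`, `owned_nodes`, and `total_nodes` are illustrative assumptions, not the exact CyberBattleSim identifiers:

```python
from dataclasses import dataclass

@dataclass
class AttackerGoal:
    """Hypothetical stand-in for the attacker goal configuration."""
    own_atleast: int = 0
    own_atleast_percent: float = 1.0  # default effectively means "own every node"

def goal_reached(goal: AttackerGoal, owned_nodes: int, total_nodes: int) -> bool:
    # Both criteria are combined with AND, so a high own_atleast_percent
    # keeps `done` False even after `own_atleast` nodes are already owned.
    return (owned_nodes >= goal.own_atleast
            and owned_nodes / total_nodes >= goal.own_atleast_percent)
```

With `own_atleast_percent=1.0`, owning the 6 nodes of the CTF solution in a larger topology still returns `False`, so the episode drags on past the point where the flag has effectively been captured.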
This suggests keeping `own_atleast_percent: 1.0` in the initialization of the `"v0"` versions of the `toyctf` and `tinytoy` environments and creating new envs `'CyberBattleTiny-v1'` and `'CyberBattleToyCTF-v1'` with defaults `own_atleast_percent=0` and `own_atleast=6`. This is reasonable because the CTF solution requires owning only 6 nodes, and with correct reward engineering training stops at the attack that owns those 6 nodes with the highest reward.
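The effect of the proposed `"v1"` defaults on episode length can be sketched with an idealized attacker that owns one new node per step; `steps_until_done` is a hypothetical helper, not CyberBattleSim code:

```python
def steps_until_done(total_nodes: int, own_atleast: int,
                     own_atleast_percent: float) -> int:
    """Count steps until the AND-combined goal is met, assuming (idealized)
    that the attacker captures exactly one new node per step."""
    owned = 0
    for step in range(1, total_nodes + 1):
        owned += 1
        if owned >= own_atleast and owned / total_nodes >= own_atleast_percent:
            return step
    return total_nodes  # goal only reached once every node is owned

# "v0"-style goal (own_atleast_percent=1.0): must own 100% of nodes.
# "v1"-style goal (own_atleast=6, own_atleast_percent=0): stops once the
# 6 CTF-solution nodes are owned, ending the episode 4 steps earlier
# in a 10-node topology.
```

In this sketch the `"v1"` goal terminates as soon as the 6 solution nodes are owned, which is exactly the shorter, better-shaped RL signal argued for above.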