guardicore / monkey

Infection Monkey - An open-source adversary emulation platform
https://www.guardicore.com/infectionmonkey/
GNU General Public License v3.0

ETE tests fail #3117

Closed mssalvatore closed 1 year ago

mssalvatore commented 1 year ago

Describe the bug

Docker ETE tests fail (authentication issue), and the AppImage fails to exploit some hosts.

mssalvatore commented 1 year ago

The problem here is the refactoring of HTTPClient. When we moved from tunneling to relays, we decided that we shouldn't retry the initial connection to the Island API server. If you look at the HTTPClient.connect() method in v2.0.0, you'll see it uses requests.get(). In the new code we removed connect(), so the initial connection now goes through session.get(), which has retries. This means the connection to each Island server is retried 4 times, and our keep_tunnel_open times are too low.
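To make the difference concrete, here is a minimal sketch of a `requests` session with and without retries. It is not the actual HTTPClient code: the helper names, the example URL, and the retry count in `ASSUMED_RETRIES` are assumptions for illustration only. A bare `requests.get()` call creates a throwaway session whose default adapter does not retry, which is why the v2.0.0 `connect()` failed fast.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical value for illustration; not taken from the Infection Monkey code.
ASSUMED_RETRIES = 3


def build_retrying_session() -> requests.Session:
    """Build a session whose requests are retried on connection failure."""
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=Retry(total=ASSUMED_RETRIES, backoff_factor=0.5))
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def build_no_retry_session() -> requests.Session:
    """Build a session that fails fast, like a bare requests.get() call."""
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=0)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def configured_retries(session: requests.Session, url: str) -> int:
    """Return the total retry count of the adapter that would serve `url`."""
    return session.get_adapter(url).max_retries.total
```

One possible design under these assumptions is to keep two sessions: a fail-fast one used only to probe Island servers at startup, and a retrying one used for all later API calls.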

Increasing keep_tunnel_open times is not a good solution. We should find a way to skip retries on the initial connection. Connections are tried in parallel. For example, if we try 5 servers and 4 fail but 1 succeeds, we return the 1 successful connection. If retries are enabled, the 4 that failed will each retry a few times, even though we already know we have a successful connection to the Island. This delays the agent's startup because the worker pool won't return until all workers have completed.
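The startup delay described above can be reproduced with a small simulation. This is only a sketch using the standard library: `try_connect`, `find_reachable`, the server names, and the simulated attempt cost are all hypothetical, and real connection attempts are replaced with `time.sleep()`. The point is that the pool returns only after the slowest worker finishes, so retries on unreachable servers set the total time even when one server answers immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated cost of one failed connection attempt (hypothetical value).
ATTEMPT_SECONDS = 0.05


def try_connect(server: str, reachable: set, retries: int) -> bool:
    """Simulate a connection; an unreachable server costs 1 + `retries` attempts."""
    if server in reachable:
        return True
    for _ in range(1 + retries):
        time.sleep(ATTEMPT_SECONDS)
    return False


def find_reachable(servers: list, reachable: set, retries: int):
    """Try all servers in parallel; return (reachable servers, elapsed seconds)."""
    start = time.monotonic()
    # Leaving the `with` block waits for every worker, including failing ones.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        results = list(pool.map(lambda s: try_connect(s, reachable, retries), servers))
    elapsed = time.monotonic() - start
    found = [server for server, ok in zip(servers, results) if ok]
    return found, elapsed


if __name__ == "__main__":
    servers = [f"island-{i}" for i in range(5)]
    for retries in (0, 4):
        found, elapsed = find_reachable(servers, {"island-3"}, retries)
        print(f"retries={retries}: found {found} in {elapsed:.2f}s")
```

Both runs find the same server; only the elapsed time differs, which is the cost that skipping retries on the initial connection would avoid.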