Open timbrigham-oc opened 1 week ago
Hmm, I wonder if this is hitting the limits of the simple in-memory "database" Caldera uses. Do you have any profiling stats on the memory usage as well? I'm wondering if it's constantly page-swapping RAM.
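For a quick sanity check on memory (short of a full profiler), the process can log its own peak resident set size with just the standard library. A minimal sketch, Unix-only since it uses the `resource` module; this is generic CPython, not Caldera code:

```python
import resource
import sys

def peak_rss_mb():
    """Return the peak resident set size of this process in MiB."""
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS
    divisor = 1024 if sys.platform != "darwin" else 1024 * 1024
    return ru_maxrss / divisor

print(f"peak RSS: {peak_rss_mb():.1f} MiB")
```

If the number stays well under available RAM, swapping is unlikely to be the culprit.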
*I'll admit, I don't think we have ever run 50+ operations at 100+ steps.
Yeah, I could see that being a limiting factor. It's only (unusably) sluggish when there is an active operation plus a bunch of historical data. I'm pretty sure my memory utilization was under 20% when I viewed it in top, but no screenshot for proof. :)
It's also definitely something single-threaded in Python that's getting caught up. Only one of the multiple cores in my test instance gets pegged to 100%. That didn't make sense at first, since the two-core machine was only reporting ~55% total in the Azure console.
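One way to confirm which thread is spinning, without attaching an external profiler, is to dump the stack of every live thread from inside the process. A generic CPython sketch (not Caldera-specific code):

```python
import sys
import threading
import traceback

def dump_thread_stacks():
    """Print the current stack of every live thread in this process.

    In a single-thread-bound workload, repeated dumps show the same
    hot frames for one thread while the others sit idle in waits.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, frame in sys._current_frames().items():
        print(f"--- thread {names.get(ident, ident)} ---")
        traceback.print_stack(frame)

dump_thread_stacks()
```

In practice, a sampling profiler such as py-spy (`py-spy top --pid <pid>`) gives the same answer without any code changes.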
I'll include more details when I end up back in the same situation. Gotta love iterative development on a process that uses lateral movement... it blows up these counts in a hurry.
Description
My Ubuntu instance is seeing high CPU utilization from the Python process running Caldera. This becomes much more noticeable when a substantial number of previously run operations are present (~50 in my testing), to the point where I get agent communication timeouts.
To Reproduce
Testing
Restarting the Caldera server process does not help; it will (fairly quickly) return to the same CPU utilization pattern. I have a small API script that lets me bulk-remove previous operations. Removing old runs decreases CPU utilization.
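A sketch of that kind of cleanup script, assuming Caldera's v2 REST API (`/api/v2/operations`, authenticated via the `KEY` header) and a `state` field of `finished` on completed operations; the server address, API key, and field values here are placeholders to verify against your own instance:

```python
import json
import urllib.request

SERVER = "http://localhost:8888"   # assumed default Caldera address
API_KEY = "ADMIN123"               # placeholder; use your own API key

def build_request(method, path):
    """Build an authenticated request for the Caldera v2 REST API."""
    req = urllib.request.Request(f"{SERVER}{path}", method=method)
    req.add_header("KEY", API_KEY)
    return req

def bulk_delete_operations():
    """Delete every finished operation on the server."""
    with urllib.request.urlopen(build_request("GET", "/api/v2/operations")) as resp:
        operations = json.load(resp)
    for op in operations:
        if op.get("state") == "finished":
            urllib.request.urlopen(
                build_request("DELETE", f"/api/v2/operations/{op['id']}")
            )

if __name__ == "__main__":
    bulk_delete_operations()
```

Deleting only `finished` operations avoids pulling links out from under an active run.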
Expected behavior
Previously executed operations should not impact CPU utilization for ongoing ones. My guess is that while an operation is executing, links from former operations are still being evaluated and consuming CPU cycles, or something similar.
Environment
My test instance is based on the 5.0.0 tagged release and includes a few customizations:
https://github.com/mitre/magma/pull/55
https://github.com/mitre/magma/pull/53
https://github.com/mitre/magma/pull/60