open-duelyst / duelyst

Duelyst is a digital collectible card game and turn-based strategy hybrid, developed by Counterplay Games.
Creative Commons Zero v1.0 Universal

[P0] Memory allocation failures in API #173

Closed by willroberts 1 year ago

willroberts commented 1 year ago

Summary

Sometime between 1.97.2 and 1.97.3, a change was introduced that appears to cause memory pressure in the API service. The Worker service also appears to be impacted, while the Game/SP services do not.

Increasing the available memory from 350 MB to 500 MB doesn't appear to have helped:

<--- Last few GCs --->
[27:0xffff9436c3c0] 13975962 ms: Mark-sweep 246.6 (258.9) -> 245.5 (258.9) MB, 222.2 / 0.0 ms  (average mu = 0.984, current mu = 0.129) allocation failure scavenge might not succeed
[27:0xffff9436c3c0] 13976247 ms: Mark-sweep 246.0 (258.9) -> 245.5 (259.1) MB, 258.3 / 0.0 ms  (average mu = 0.965, current mu = 0.093) allocation failure GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
error Command failed with signal "SIGABRT".
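To read the GC lines above more easily, here is a minimal sketch that pulls the heap sizes out of one "Last few GCs" line. The regular expression and the function name `parseGcLine` are assumptions fitted to the two lines quoted in this issue, not a general V8 log parser.

```javascript
// Matches e.g. "13975962 ms: Mark-sweep 246.6 (258.9) -> 245.5 (258.9) MB".
// Assumption: the pattern is fitted to the two log lines quoted in this issue.
const GC_LINE = /(\d+) ms: ([\w-]+) ([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB/;

// Returns the parsed sizes (MB) for one GC line, or null if it does not match.
function parseGcLine(line) {
  const m = GC_LINE.exec(line);
  if (!m) return null;
  const [, uptimeMs, kind, usedBefore, totalBefore, usedAfter, totalAfter] = m;
  return {
    uptimeMs: Number(uptimeMs),
    kind,
    usedBeforeMb: Number(usedBefore),
    totalBeforeMb: Number(totalBefore),
    usedAfterMb: Number(usedAfter),
    totalAfterMb: Number(totalAfter),
    // Memory freed by this collection, rounded to one decimal place.
    reclaimedMb: Number((Number(usedBefore) - Number(usedAfter)).toFixed(1)),
  };
}

const sample = '[27:0xffff9436c3c0] 13975962 ms: Mark-sweep 246.6 (258.9) -> 245.5 (258.9) MB, 222.2 / 0.0 ms  (average mu = 0.984, current mu = 0.129) allocation failure scavenge might not succeed';
console.log(parseGcLine(sample));
```

For the first quoted line, the full mark-sweep freed about 1.1 MB out of 246.6 MB, which suggests that nearly all of the heap was still reachable when the process aborted.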

Running git diff -w --stat 1.97.2..1.97.3 server shows relatively few changes (about 100 lines added and removed). However, since backend services also pull in app/sdk (and others), the culprit change may be elsewhere.

Memory usage for the last three days looks like this:

[Screenshot: memory usage graph for the last three days, 2022-10-19]

The reduction towards the end was caused by increasing the available memory (thereby reducing the utilization).

This lines up with the work on replays here: https://github.com/open-duelyst/duelyst/pull/163 https://github.com/open-duelyst/duelyst/pull/164

Before this, the last deployment was on 10/16 at ~10AM UTC, so these PRs could also be involved: https://github.com/open-duelyst/duelyst/pull/157 https://github.com/open-duelyst/duelyst/pull/158 https://github.com/open-duelyst/duelyst/pull/160 https://github.com/open-duelyst/duelyst/pull/161 https://github.com/open-duelyst/duelyst/pull/162

We can revert these one at a time locally, rebuilding a hotfix container and testing after each revert, to see which change is responsible.

willroberts commented 1 year ago

This may be helpful: https://github.com/airbnb/node-memwatch
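As a dependency-free alternative to a leak-detection library, a rough sketch using only Node's built-in `process.memoryUsage()` and `v8.getHeapStatistics()` could log heap usage against the heap limit. This does not use node-memwatch; the function name `memorySnapshot` and the 60-second interval are hypothetical choices for illustration.

```javascript
const v8 = require('v8');

// Returns current process memory figures in MB, rounded to one decimal place.
// Field values come from process.memoryUsage() and v8.getHeapStatistics().
function memorySnapshot() {
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  const usage = process.memoryUsage();
  const heap = v8.getHeapStatistics();
  return {
    rssMb: toMb(usage.rss),               // resident set size of the process
    heapUsedMb: toMb(usage.heapUsed),     // live JS heap
    heapTotalMb: toMb(usage.heapTotal),   // JS heap currently reserved by V8
    heapLimitMb: toMb(heap.heap_size_limit), // limit at which V8 aborts with OOM
  };
}

// Hypothetical usage inside a service: log a snapshot every 60 seconds.
// unref() keeps the timer from holding the process open on shutdown.
// const timer = setInterval(() => console.log(memorySnapshot()), 60000);
// timer.unref();

console.log(memorySnapshot());
```

Comparing `heapUsedMb` with `heapLimitMb` over time would show whether the heap is steadily approaching the limit or starts out close to it right after boot.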

willroberts commented 1 year ago

This appears to be the result of upgrading CoffeeScript in #162.

willroberts commented 1 year ago

API memory usage by CoffeeScript version:

1.12.7: 446 MB
1.12.6: 410 MB
1.12.5: 433 MB
1.12.4: 448 MB
1.12.3: 459 MB
1.12.2: 284 MB
1.12.1: 292 MB
1.12.0: 295 MB
1.11.1: 293 MB
1.11.0: 288 MB
1.10.0: 246 MB
1.9.3: 240 MB (requires coffeeify upgrade)
1.9.0: 239 MB
1.8.0: 237 MB

Measured just after the 'REDIS client onReady' log event with 'docker stats'.
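To make the step changes in these measurements explicit, the following sketch ranks the differences between consecutive CoffeeScript versions. The numbers are copied from the list above; the function name `jumps` is only illustrative.

```javascript
// API memory usage in MB by CoffeeScript version, oldest release first.
// Values are copied from the measurements in this comment.
const measurements = [
  ['1.8.0', 237], ['1.9.0', 239], ['1.9.3', 240], ['1.10.0', 246],
  ['1.11.0', 288], ['1.11.1', 293], ['1.12.0', 295], ['1.12.1', 292],
  ['1.12.2', 284], ['1.12.3', 459], ['1.12.4', 448], ['1.12.5', 433],
  ['1.12.6', 410], ['1.12.7', 446],
];

// Computes the change between each pair of adjacent versions,
// sorted with the largest increase first.
function jumps(rows) {
  return rows
    .slice(1)
    .map(([version, mb], i) => ({
      from: rows[i][0],
      to: version,
      deltaMb: mb - rows[i][1],
    }))
    .sort((a, b) => b.deltaMb - a.deltaMb);
}

const ranked = jumps(measurements);
console.log(ranked.slice(0, 2));
```

The largest increase in this data is 175 MB between 1.12.2 and 1.12.3, followed by 42 MB between 1.10.0 and 1.11.0.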