zombocom / puma_worker_killer

Automatically restart Puma cluster workers based on max RAM available
748 stars 77 forks source link

Add support for Heroku log-runtime-metrics #54

Closed brian-kephart closed 4 years ago

brian-kephart commented 6 years ago

This PR uses code from the whacamole gem, modified to fit PWK. If you provide a Heroku API token in your config, then instead of attempting to measure total memory use directly PWK will get it from your Heroku logs with an HTTP request. This PR doesn't change existing operation until a Heroku API token is added to PumaWorkerKiller.config.

If the HTTP request does not return any memory info, the get_total method will return 0.0 instead of raising an exception, so nothing blows up. Generally, the next request will return valid memory information.

Note that this PR requires log-runtime-metrics and the DYNO environment variable, both of which are Heroku beta features. Again, though, nothing blows up if they stop working. Worst-case scenario is if Heroku removes these features and your app still has a Heroku API token in PumaWorkerKiller.config, the app will continue to run but PWK will not work.

There are no new tests, because I don't know how to test these features. Existing tests pass locally.

brian-kephart commented 6 years ago

Update: added a test and fixed a bug.

brian-kephart commented 5 years ago

Update: I've been running this in production for the last couple years, and have continued to update it as needed. It's still in good shape to merge if anyone else thinks it's a good idea.

schneems commented 4 years ago

Keeping the dream alive. I'm looking at this right now. Thanks for all of your work so far!

schneems commented 4 years ago

Okay, having looked at this indepth. I don't think I want this in PWK as-is. That's the bad news.

I have some good news, so stay tuned.

I don't like that we're having to stream and process logs to get the information that we want. It seems extremely CPU heavy and more likely to break the larger your application is. That's why I'm not super thrilled about mergging this in.

The good news though is there is a private API you can tap into to get this information directly. Here's what I think would be a good course of action.

You: Make a gem that is a really loose wrapper around this API endpoint, call it something like heroku_memory_metrics or something. Please test this with webmock.

Me: Tell you about the API right now:

You can hit it like this:

curl -v -H "Authorization: bearer $(heroku auth:token)" "https://api.metrics.heroku.com/apps/<app-name>/formation/web/metrics/memory?step=1m&start_time=2019-11-05T16%3A51%3A00.000Z&end_time=2019-11-05T18%3A31%3A00.000Z&process_type=web"

Replace your <app-name> with the actual name of the app. The start time and end time are formatted like this %Y-%m-%dT%H:%M:00.000Z`. Then the step duration is provided in minutes (i.e. if you say you want to get the last 7 days of data, then you will lose resolution).

The output is json

# ....
"memory.used.bytes.mean":[779493921.3333334,778211163.6666666,782405467.6666666,781353396.3333334,781283491.5,776302755.3333334,773401695,773122075,776054592.3333334,774373375.6666666,771154247,771935436,773551990.6666666,776956367.3333334,776145469,778655060.6666666,776257317,778253106.6666666,776449556,777658913.6666666,779958790.3333334,780018209.6666666,779832961.3333334,782926260.3333334,779556836.3333334,778745937,782020990,781440778,780434144.6666666,785519738.3333334,786341123,781605055,784198532.5,782440420.3333334,774523671.6666666,775911287,774764844,771168228.3333334,770539082.6666666,773115084,773132560.3333334,771989612.6666666,775282141.3333334,777658913.6666666,780500554.6666666,776652280.6666666,782950727,783436567.3333334,783869978.6666666,784453686.3333334,787637862,790493484,791842651.6666666,790778347,786288694.3333334,780371230.3333334,780434145,789252669,786183836.3333334,783800074,783674244.3333334,782146819,784226494.6666666,786634724.3333334,785079336.3333334,786687153.3333334,786376075.6666666,494720777.75,544647850,548060964.8333334,665335452.3333334,703804210.75,640382839,661647960.3333334,670968053.5,682776766.6666666,691147898.3333334,699931470,703542067,710315867.6666666,715132326.6666666,717372784.3333334,719774023.5,719064487,722189243.3333334,726862397,731741770.6666666,738386247.3333334,743045419.6666666,743104839,744261768,742594532,741832567,743723499,745761231.6666666,747134866.3333334,747613716,752562994.6666666,749710868.3333334,750102336.6666666,750429142.5]}}

That can then be parsed.

It's not documented (hence the private) but I don't think it's likely to go away anytime soon.

You: After you make the gem, refactor this PR to use the gem instead of streaming and processing the logs.

Also FWIW I added a giant warning to the README. In general I don't think people should be using this gem for any length of time.

brian-kephart commented 4 years ago

Thanks for the review, and for the info. It will probably be awhile before I can get to these changes, but I'll submit when I can.

schneems commented 4 years ago

If you can't get to it, that's fine too. I just didn't want to leave you hanging. Thanks for the PR, this was a lot of code. Sorry for no comment for so long.