This could also be because of memory use when running the recipe files?
In addition, I'd suggest increasing the dyno size too on heroku!
> This could also be because of memory use when running the recipe files?

I hadn't thought about this. I presume by "recipe runs" you mean when we execute the recipe modules via `pangeo-forge-runner expand-meta ...` for registration, right?
> In addition, I'd suggest increasing the dyno size too on heroku!

Do you have any recommendation? We are currently using the hobby dyno, and it seems the next dyno type, standard-1x, isn't that different (memory-wise).
> In addition, I'd suggest increasing the dyno size too on heroku!

As a first pass, I'm going to enable log-runtime-metrics to track load and memory usage for our current dyno: https://devcenter.heroku.com/articles/log-runtime-metrics.
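For reference, the labs flag can be switched on from the Heroku CLI and takes effect after a restart. A minimal sketch, with `<app-name>` as a placeholder for the orchestrator's Heroku app:

```bash
# Turn on per-dyno load/memory samples in the log stream (Heroku labs feature),
# then restart so the running dynos pick it up. <app-name> is a placeholder.
heroku labs:enable log-runtime-metrics -a <app-name>
heroku restart -a <app-name>
```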
@andersy005 measuring seems like the right next step!
@yuvipanda, something is going on during the `pangeo-forge-runner expand-meta ...` call.
Here's the memory profile after a reboot:

```
2022-11-02T18:30:10.845831+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#memory_total=149.32MB sample#memory_rss=148.88MB sample#memory_cache=0.45MB sample#memory_swap=0.00MB sample#memory_pgpgin=69641pages sample#memory_pgpgout=31414pages sample#memory_quota=512.00MB
```
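These `sample#` lines are emitted by the Heroku platform into the app's log stream, so they can be followed live. A minimal sketch, again with `<app-name>` as a placeholder:

```bash
# Follow the dyno's runtime metrics and keep only the memory samples.
heroku logs --tail --source heroku -a <app-name> | grep "sample#memory"
```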
I then launched a test run for this recipe: https://github.com/pangeo-forge/staged-recipes/pull/215
After calling `pangeo-forge-runner expand-meta ...`, I started noticing memory spikes:
```
2022-11-02T18:32:02.363030+00:00 app[web.1]: 2022-11-02 18:32:02,362 DEBUG - orchestrator - Running command: ['pangeo-forge-runner', 'bake', '--repo=https://github.com/norlandrhagen/staged-recipes', '--ref=8308f82cbdede7d8039a72e4137e5d16c800eb89', '--json', '--prune', '--Bake.recipe_id=NWM', '-f=/tmp/tmp985ps8od.json', '--feedstock-subdir=recipes/NWM']
2022-11-02T18:32:14.054996+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#load_avg_1m=0.63
2022-11-02T18:32:14.188714+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#memory_total=329.25MB sample#memory_rss=326.84MB sample#memory_cache=2.41MB sample#memory_swap=0.00MB sample#memory_pgpgin=122482pages sample#memory_pgpgout=38195pages sample#memory_quota=512.00MB
```
Notice how the memory increased from 149MB to 326MB. The memory eventually blew up, and Heroku restarted the workers:
```
2022-11-02T18:34:53.563144+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.54df4cd5-f10c-4baa-b412-32d8fa56c24d sample#memory_total=826.02MB sample#memory_rss=511.88MB sample#memory_cache=0.00MB sample#memory_swap=314.14MB sample#memory_pgpgin=255319pages sample#memory_pgpgout=124278pages sample#memory_quota=512.00MB
2022-11-02T18:34:53.720844+00:00 heroku[web.1]: Process running mem=826M(161.3%)
2022-11-02T18:34:53.926451+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
2022-11-02T18:34:54.931260+00:00 app[web.1]: [2022-11-02 18:34:54 +0000] [57] [CRITICAL] WORKER TIMEOUT (pid:58)
2022-11-02T18:34:54.964405+00:00 app[web.1]: [2022-11-02 18:34:54 +0000] [57] [WARNING] Worker with pid 58 was terminated due to signal 6
2022-11-02T18:34:55.311602+00:00 app[web.1]: [2022-11-02 18:34:55 +0000] [122] [INFO] Booting worker with pid: 122
2022-11-02T18:34:57.219544+00:00 app[web.1]: [2022-11-02 18:34:57 +0000] [122] [INFO] Started server process [122]
2022-11-02T18:34:57.219620+00:00 app[web.1]: [2022-11-02 18:34:57 +0000] [122] [INFO] Waiting for application startup.
2022-11-02T18:34:57.220136+00:00 app[web.1]: [2022-11-02 18:34:57 +0000] [122] [INFO] Application startup complete.
```
My suspicion is that pangeo-forge-runner's expansion of the meta information is the cause of this spike. I'm not sure whether the S3 crawling in https://github.com/pangeo-forge/staged-recipes/pull/215 could also be a reason this recipe in particular is running into these memory issues.
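One way to test that suspicion is to reproduce the run outside Heroku and record the peak memory of the runner process. A minimal sketch, assuming GNU time is available at `/usr/bin/time` and using `local-config.json` as a stand-in for the temporary config file the orchestrator writes (`/tmp/tmp985ps8od.json` in the logs above); wrapping the `expand-meta` invocation the same way would isolate its contribution:

```bash
# Re-run the exact command from the orchestrator logs and report its peak RSS.
# GNU time's -v output includes "Maximum resident set size (kbytes)".
/usr/bin/time -v pangeo-forge-runner bake \
  --repo=https://github.com/norlandrhagen/staged-recipes \
  --ref=8308f82cbdede7d8039a72e4137e5d16c800eb89 \
  --json --prune \
  --Bake.recipe_id=NWM \
  -f=local-config.json \
  --feedstock-subdir=recipes/NWM \
  2>&1 | grep "Maximum resident set size"
```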
This is an attempt at addressing the recent memory issues.