Closed · olavurdj closed this issue 1 year ago
Thanks for reporting - does this happen after calling a specific endpoint or does it reach the OOM simply after running for a while?
This happens after running `medusa deploy`, after an unspecified amount of time (3–5 h). My case is on a DigitalOcean droplet with 2 GB of RAM.
```
<--- Last few GCs --->
[58606:0x4e5d040] 16056726 ms: Mark-sweep 1964.2 (2000.5) -> 1963.5 (2000.2) MB, 4683.1 / 0.2 ms (average mu = 0.780, current mu = 0.391) allocation failure scavenge might not succeed
[58606:0x4e5d040] 16064831 ms: Mark-sweep 1964.6 (2000.2) -> 1963.8 (2000.7) MB, 5098.1 / 0.1 ms (average mu = 0.660, current mu = 0.371) allocation failure scavenge might not succeed

<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb00e10 node::Abort() [node]
 2: 0xa1823b node::FatalError(char const*, char const*) [node]
 3: 0xcee09e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcee417 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xea65d5 [node]
 6: 0xeb5cad v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 7: 0xeb89ae v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 8: 0xe79d57 v8::internal::Factory::New(v8::internal::Handle<v8::internal::Map>, v8::internal::AllocationType) [node]
 9: 0xe89c91 v8::internal::Factory::NewProperSubString(v8::internal::Handle<v8::internal::String>, int, int) [node]
10: 0x10ead06 v8::internal::SharedFunctionInfo::GetSourceCodeHarmony(v8::internal::Handle<v8::internal::SharedFunctionInfo>) [node]
11: 0x1063880 v8::internal::JSFunction::ToString(v8::internal::Handle<v8::internal::JSFunction>) [node]
12: 0xd682aa v8::internal::Builtin_FunctionPrototypeToString(int, unsigned long*, v8::internal::Isolate*) [node]
13: 0x15e7dd9 [node]
```
@srindom I haven't observed any specific API calls that trigger this behaviour; it seems to be mostly work-over-time related. I'll try to see if I can narrow it down. Do you have any overview of Node versions used in production setups?
I can second this. I've noticed the same behaviour locally when there are any requests over an hour or more.
Most of the production setups we have been involved in run on Node 14 or 16.
Do you know what the max-old-space-size is set to? I believe that Node defaults to 512MB in some circumstances which might not be enough for Medusa to run in a production setting (although I don't have any hard stats to back this up).
As per the Node docs it is recommended to set this to ~1.5GB for a machine with 2GB RAM: https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes
How would we do this for Medusa?
@srindom I'm running it with a max-old-space-size of ~4GB.
@rhijjawi You can set the `NODE_OPTIONS` environment variable, either globally or when running the server. A simple approach is to just edit the script in your `package.json`, e.g.

```json
"scripts": {
  "serve": "cross-env NODE_OPTIONS=--max-old-space-size=4096 medusa start",
  ...
}
```

Note: Make sure to add `cross-env` to your dev-dependencies.
@olavurdj I have 100% the exact same error as you. When I start Medusa, memory starts at around 380 MB and I can see it increase bit by bit, even if I do nothing, until it crashes.
My Medusa API core is at 3.29 GB just before crashing. When I launch it, it's just 380 MB. I'm on localhost, doing nothing, chilling, not even launching the admin or storefront.
After a lot of testing, I found out it's because as soon as you have a lot of data (not sure how much) in your Medusa database (customers, products, prices, etc.), you absolutely MUST install and use Redis; it's not optional. I didn't, because I was on localhost and had just started to play with Medusa, and while I had little or no data I didn't get any errors.
Maybe I should raise a ticket, but my problem is repeatable: I tested it with two friends and they hit the same memory-leak error.
@AlexDigital974 this is super insightful - my immediate thought is that it has something to do with `fakeredis` not being configured correctly.
Will try to play around with this - thank you!
Quick thing to add as well: I believe Redis gets pinged most of the time, as I ran out of my 10k requests from Upstash in less than 30–45 minutes, and that's on my development machine.
@AlexDigital974 Thank you! I am not running redis, so this is very likely the same issue I'm experiencing. I'll see if installing redis on my setup helps.
@olavurdj great! It is likely the same issue as mine: if you do not have Redis and your DB is not empty, this is it.
If it resolves your issue, let us know. I think as more people use Medusa, having a DB with some data and no Redis installed is a scenario a lot of people will run into.
I can confirm that fakeredis is enabled on my instance of medusa.
I just recently disconnected a Redis instance from our Medusa server in order to save hosting expenses, as we currently do not depend on any of the features Redis enables. This caused OOM crashes for us as well.
Would be nice to have this resolved, as it can be quite cost saving, especially in a staging/preview environment.
Is there a solution for this yet, other than "install redis"?
fake-redis is RAM-based, right? Why not install Redis and just not use it?
@haveamission just install Redis. Wherever your server is (localhost, on-premise, or cloud), just install Redis. It's FREE, and I kid you not, you can have it up in about a minute, rather than waiting for this to be fixed. And even once it's fixed, fakeredis is not supposed to replace a proper Redis.
Just use Docker; you don't even need to pay or do any setup.
Put the below into a Compose file, run it via Docker, problem solved. And you can reuse that file for your staging, testing, and developer teams. Easy.
```yaml
version: "3.8"
services:
  redis-server:
    image: "redis:alpine"
    command: redis-server
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
volumes:
  redis-data:
    driver: local
```
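Once the container is running, Medusa still needs to be pointed at it. A minimal sketch of the relevant part of `medusa-config.js`, assuming the default port mapping from the Compose file above (`REDIS_URL` is an assumed env var; adapt names to your setup):

```javascript
// medusa-config.js (excerpt): point Medusa at the Redis container.
// Falls back to the locally mapped container port if REDIS_URL is unset.
module.exports = {
  projectConfig: {
    redis_url: process.env.REDIS_URL || "redis://localhost:6379",
    // ...database_url and other settings stay as before
  },
};
```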
@AlexDigital974 running Redis locally is no problem.
However, if you want to deploy PR builds in your review process, a Redis database on DigitalOcean would add $15 to your bill. Being able to run a fully functional Medusa server on low resources makes automated testing and reviewing easier.
I'm not saying it's a deal breaker, but if this could be prevented, it would be much appreciated 🙌
@magnusdr you make a fair point; having a fix for `fakeredis` would be optimal.
Does anyone know if there's a way to restart the `fakeredis` instance via shell? If so, could we cron it every x hours?
Also, a Redis instance on the same machine as the Medusa instance is negligible in size, right? We don't NEED a whole other VM.
We will soon deprecate the fake Redis module and default to an event bus with no effect when events are emitted and subscribed to. It will just let the events pass through, such that you can deploy without Redis (even though that's not recommended) or boot up preview environments as mentioned by @magnusdr.
You can find the WIP implementation in this PR.
Right now, we use `ioredis-mock` (link) to power the fake Redis, so you might be able to find a command in their documentation to restart or flush the instance.
Hope this helps.
I will close this issue, as we have identified the culprit and will introduce a fix soon.
I'm encountering the opposite issue. I have 102 products, 7960 variants, 816 product options, and 63680 product option values, and money amounts to support all the possible destinations and currencies (~135 per variant).
If I have Redis enabled, about 2 minutes after running `npm run dev`, with no requests, memory starts to go up until it hits the OOM error. If I disable Redis, it stays consistently at 296 MB.
Edit: it eventually fails with Redis disabled too.
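A slow leak like this is easier to pin down if the process logs its own memory use over time. A small illustrative sketch (not part of Medusa; names are mine) that can be run standalone or dropped into a startup file:

```javascript
// log-memory.js: print resident set size and heap usage so a slow
// leak shows up in the logs while the server sits idle (illustrative)
function logMemory() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1);
  console.log(
    `rss=${mb(rss)} MB heapUsed=${mb(heapUsed)} MB heapTotal=${mb(heapTotal)} MB`
  );
}

logMemory();
// Uncomment to sample every 30 seconds while reproducing the leak:
// setInterval(logMemory, 30_000);
```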
@chrisp-code What version of `@medusajs/medusa` do you have installed?
If you are using v1.7.10 or above, you might face an issue with Bull (our message queue, powered by Redis) that was uncovered today.
If this is the case, give this version of the core a spin: `yarn add @medusajs/medusa@1.7.12-snapshot-20230306155104`
It should resolve your issues. Let me know how it goes :)
> @chrisp-code What version of `@medusajs/medusa` do you have installed?

Was on 1.7.8 and then upgraded to the latest (1.7.11), and both presented the behavior.
Truncated my DB so I could start over. Trying to make some modifications so I can reduce the number of variants and options.
Got it, would love it if you could keep me posted :)
Was able to reduce the number of variants by 60% (this is still only 2% of my current number of products, so this is not ideal) but still get the OOM error after restarting the service (it was stable before restarting, but had 7,810 `bull:EventBusService:queue:job_{id}` keys in Redis with 1.7.11; will try the snapshot next).
Both 1.7.11 and 1.7.12-snapshot-20230306155104 have the same behavior of OOM after ~1–2 minutes of idle. No immediate issue with Redis disabled, but it still eventually OOMs.
I ran `redis-cli flushall` and was looking at the output of `redis-cli keys *`. I don't know if this means anything to you, but I did notice this difference in output.
Before OOM (after server start and at idle):

```
1) "bull:EventBusService:queue:stalled-check"
2) "bull:EventBusService:queue:1:lock"
3) "bull:EventBusService:queue:id"
4) "bull:EventBusService:queue:1"
5) "bull:scheduled-jobs:queue:stalled-check"
6) "bull:EventBusService:queue:active"
```
After (line 1 is sometimes present, sometimes not):

```
1) "bull:EventBusService:queue:1:lock"
2) "bull:EventBusService:queue:id"
3) "bull:EventBusService:queue:1"
4) "bull:EventBusService:queue:active"
```
Truncated my DB again and re-ran the product loading with `1.7.12-snapshot-20230306155104`; no change in behavior. Still get OOM after ~1 min at idle after restart.
Didn't see the 7K `bull:EventBusService:queue:job_{id}` keys with this version, though.
`redis-cli keys *` before restarting:

```
1) "bull:EventBusService:queue:id"
2) "bull:EventBusService:queue:3322"
3) "sess:f-ioRYmWH62ECcPoZBROw39315jgCWGw"
4) "bull:EventBusService:queue:active"
```
During memory increase:

```
1) "sess:f-ioRYmWH62ECcPoZBROw39315jgCWGw"
2) "bull:scheduled-jobs:queue:stalled-check"
3) "bull:EventBusService:queue:stalled"
4) "bull:EventBusService:queue:active"
5) "bull:EventBusService:queue:stalled-check"
6) "bull:EventBusService:queue:id"
7) "bull:EventBusService:queue:3322"
8) "bull:EventBusService:queue:3323:lock"
9) "bull:EventBusService:queue:3323"
```
After crash:

```
1) "bull:EventBusService:queue:wait"
2) "sess:f-ioRYmWH62ECcPoZBROw39315jgCWGw"
3) "bull:EventBusService:queue:active"
4) "bull:EventBusService:queue:id"
5) "bull:EventBusService:queue:3322"
6) "bull:EventBusService:queue:3323"
```
Cleared the DB, removed `medusa-plugin-meilisearch`, and reloaded products again; the issue did not happen again.
It seems like loading a bunch of products and variants makes Meilisearch fall behind on processing, or old subscriber events are not being cleared?
For anyone experiencing the issue I listed above: I found that reducing the `batch_size` from the default (1000) to something lower fixed the issue with Meilisearch (you can set it on the `options` object when configuring the plugin).
This only appears to happen if you have products with a lot of variants, options, and prices. I ended up setting `batch_size` to 10, and it still hits a peak of 1.2 GB of memory while processing.
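For reference, here is a sketch of where that option would go in the plugins array of `medusa-config.js`. The `batch_size` key and its placement follow the comment above; the host/apiKey env vars are placeholders, and the exact shape may vary by plugin version:

```javascript
// medusa-config.js plugins array (excerpt): hypothetical sketch
{
  resolve: "medusa-plugin-meilisearch",
  options: {
    batch_size: 10, // default is 1000; lower it for catalogs with many variants
    config: {
      host: process.env.MEILISEARCH_HOST, // assumed env vars
      apiKey: process.env.MEILISEARCH_API_KEY,
    },
    settings: {
      products: {
        // index settings for the products index go here
      },
    },
  },
},
```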
Hi,
Context: I encountered this error while trying to deploy the backend along with the admin plugin. First I deployed with DigitalOcean App Platform; then, assuming DigitalOcean was the problem, I tried deploying to Railway, but faced exactly the same error code as the OP.
Solution:
```js
{
  resolve: "@medusajs/admin",
  /** @type {import('@medusajs/admin').PluginOptions} */
  options: {
    autoRebuild: false, // Here is the fix
    serve: true, // I think this is the default, so it might be redundant
  },
},
```
The way I understand it is that with `autoRebuild` set to true, the cloud providers were trying to launch and rebuild the admin at the same time, saturating memory. But this is only a humble guess.
It seems to me that this piece of information should be in all of MedusaJS's "deploy" documentation sections, as it would easily spare frustration (and even quitting) for users who, like me, are less experienced.
Thank you for your project, I like MedusaJS.
The Medusa server instance crashes (SIGABRT) due to a memory allocation failure after a period of time (1–4 hours observed). This happens both when running with `medusa start` and `medusa develop`.
Medusa version 1.3.1 with Postgres. Node v16.15.1, macOS.