Closed: hsanjuan closed this issue 3 years ago
Same here:
ipfs version 0.4.3, Ubuntu 16.04 (4.4.0-47-generic), Go 1.7
After about 10 days, memory grows to about 15 GB despite only a few hundred files being pinned. The issue is reproduced across 10 servers. Restarting the daemon frees the memory, but usage continues to grow and the daemon needs to be restarted again.
UPDATE: Ah, ha! I found the enable garbage collection flag in the documentation, so trying:
ipfs daemon --enable-gc
@jonnycrunch the --enable-gc flag refers to disk GC, not memory GC.
The memory leakage is coming from somewhere else... Next time the memory gets out of hand, can you get me the debug info described here: https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md#beginning
In particular: the heap profile, the goroutine dump, and the ipfs binary.
Hi! We are running ipfs 0.4.4 on Linux 4.4.35-33.55.amzn1.x86_64 #1 SMP Tue Dec 6 20:30:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Currently it eats 65-76% of memory on a 2 GB instance. The OOM killer sometimes kills it; it restarts, and usage grows back to that level over several hours. That level seems to be just low enough for the daemon not to be killed; maybe it uses some smart way to determine how not to be killed :-) While experimenting with memory limits, I saw that usage grows to fill all available memory but no more (no swap is used for IPFS, but other applications may have problems finding available memory).
ipfs/QmaB2FJr1Z6yGRy9G37aXsBirR43Lc9ya3Q29R4gMYDVDv - dumps.
Shall I recreate my node after sharing these files? Do they have a chance of containing any private keys or other data? The node is disposable and contains no private files yet, but it may in the future.
Also I noted that after running disk GC (ipfs repo gc), memory decreased from 70% to 65%, but after adding this debug directory it's again at 75% of total host memory.
I have no idea how Go works, so if you need more debug info or this is unhelpful, please feel free to ask for more details.
Also, I have an ipfs node running on a 512 MB DigitalOcean instance, managed by supervisord. The OOM killer kills it there pretty fast (within several hours), supervisord restarts it, and it dies again and again, but it generally works okay.
Carla Sella, from the Ubuntu community, reports that with ipfs v0.4.4 her VirtualBox VM starts to slow down after it connects to over 70 peers. Here are her debugging files: ipfs.tar.gz
Maybe it is time for Garbage Collection to be enabled by default? @whyrusleeping @RichardLitt @diasdavid
@jonnycrunch as @whyrusleeping said, the --enable-gc flag is datastore garbage collection, not the program's garbage collection.
The core problem is what we call "connection closing": IPFS currently connects to almost everyone, which, combined with the muxer implementation we are using, takes a lot of memory. We are working on reducing this, but it might take a while. Connection closing is a much harder problem than we initially expected.
The --enable-gc flag shouldn't matter; it might reduce memory usage a bit, but it isn't the core problem as far as I know.
This is the debugging information I have collected from 1 node that was still running (2 have died):
https://ipfs.io/ipfs/QmXnYzZT1EAq9pzi6snd6KHD8kNrBSDuyJqLPe7QHzUE23
It was also using 150% CPU when I checked it, and >80% MEM. They are still on 0.4.5-pre1, though.
Stack dump from #61. This is a VPS running CentOS 7 64-bit with 1 GB of memory; the ipfs daemon crashed 5 days after starting: ipfs-crash-May-07-grep-ipfs-var-log-messages.zip
ipfs package: go-ipfs_v0.4.8_linux-amd64.tar.gz
Hey everyone, ipfs 0.4.11 should have some significant improvements here. The issue is not entirely resolved, but the leak should be mitigated.
Still leaking memory in 0.4.13 — killed after ~12 hours.
At the moment, the largest issue is the peerstore. We had a rather nasty bug that will be fixed in the next release (we, uh, kind of never forgot any address of any peer to which we had ever connected and, worse, advertised these (sometimes ephemeral) addresses to the network...).
@Stebalien
that will be fixed in the next release
Does that mean that the fix is already in master
or is work in progress?
Fixed in a dep. PR pending: #4610
I profiled it, and it seems like a lot of the CPU waste is, surprisingly, in AddAddrs in the AddrManager. Reading that code, it seems very hasty and not performance-minded. I'll PR something to go-libp2p-peerstore to optimize it with concurrent maps, which should help.
I'll PR something to go-libp2p-peerstore to optimize that with concurrent maps, which should help.
Unfortunately, the issue is https://github.com/libp2p/go-libp2p-peerstore/issues/26 and the fact that the number of multiaddrs assigned to a peer can grow unchecked*. The peerstore actually works fine with a sane number of addresses.
*The previous version of go-ipfs failed to forget observed multiaddrs for peers and, worse, would gossip these observed multiaddrs. That, combined with NATs and ephemeral ports, led to a build-up of addresses for some peers.
The solution to this is really to sign peer address records (should be doing this anyways), enforce a maximum number of addresses, and require that there only be one valid peer address record per peer.
Yeah, but that code is still unoptimized and in general really rough, even for a small number of addresses. Agreed that there is a bigger reason though as you describe.
Still leaking memory in 0.4.18, at between 0 and 100 kB/sec (averaging somewhere around 10 kB/sec).
@maznu are you sure it's leaking memory? Go is a garbage-collected language, which means memory usage will appear to increase until a GC event. After a GC event, memory doesn't necessarily get released back to the OS, but internally the previously allocated memory will get reused.
How are you measuring this?
Still leaking memory in 0.4.18, between 0-100kB/sec (averaging at a rate of somewhere around 10kB/sec).
https://golangcode.com/print-the-current-memory-usage/
Using this periodically, you can gather memory usage over several days. With a graphing tool like Microsoft Excel, you can then check the trend in memory usage.
Several days? It's eating up all the RAM on a 1 GB VPS (and then being killed by the kernel OOM killer) within eight hours.
You can see there that there is garbage collection and freeing back to the OS — plenty of green spikes within that orange lump of usage — but fundamentally it just continues to grow.
Can someone with bad memory usage please grab a memory trace?
Can someone with bad memory usage please grab a memory trace?
I am experiencing this issue using go-ipfs 0.4.19:
https://ipfs.io/ipfs/QmSkYDJV1BJeLm2uEBqnshcmBRb1LMPPPxdBsUrGDNGv8J
For me it takes ~2 days for the daemon to exhaust 1GB of memory and get OOM killed.
@alexkursell I'm only seeing ~30 MiB of memory usage on the heap. Unfortunately, I can't seem to download the goroutine stack traces.
When you grabbed that memory dump, how much memory was go-ipfs using at that point in time?
The biggest problem I'm seeing with memory usage lately isn't that ipfs always uses a lot of memory; it's that it randomly spikes to a lot of memory, and Go will pretty much never release that memory back to the OS.
To debug this further, I would put a memory limit on the ipfs process (say, 1 GB) so that it panics when the memory spikes, and we can then figure out what the problem is.
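One external way to impose such a limit, assuming the daemon runs under systemd (the unit name and drop-in path here are hypothetical), is a cgroup memory cap:

```ini
# /etc/systemd/system/ipfs.service.d/limit.conf  (hypothetical drop-in)
[Service]
MemoryMax=1G
```

Note that a cgroup cap OOM-kills the process rather than producing a Go panic; to get the stack trace being asked for, an address-space limit such as `ulimit -v` set before starting the daemon is closer in spirit, though Go's large virtual-memory reservations may trip such a limit early.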
@Stebalien I've grabbed a new set of diagnostics, along with the output of top: https://ipfs.io/ipfs/QmVB4s9Eu1XYxbikuzQix6SGUoDtqS46oyPJFanWLRMwV5 At the time this was taken, it looks like the daemon was using around 750 MB.
I was able to run an ipfs node just fine for a while, but it has started taxing my server so much that it's impossible to keep using. It would be fine even if it used a gigabyte, but it keeps eating more and more memory until the server simply crashes.
@alexkursell
Go is "only" using about 300MiB of heap memory so it looks like memory usage spiked at some point and go never returned the memory.
The largest actual memory users appear to be:
+1. I just set up a node on an Ubuntu 19.04 VPS, and it died after about a day. I'll try the latest master and see if that fixes it.
@kaysond (and others) when your nodes die due to running out of memory, can you please send us the stack traces? It will help us track down what's causing the memory spikes.
I built from the latest source, and it seems to have grown steadily, then leveled off at around 600 MB overnight.
@whyrusleeping after a few days it looks like it settled at a solid 1 GB of RAM. I've attached all the dumps per the debug guide: memdebug.tar.gz
@kaysond
It looks like that memory is:
ipfs config --json "Swarm.DisableBandwidthMetrics true"
@Stebalien thanks. I'll add that to my config and see how much it helps. Is there a plan to implement said "forgetting"?
@kaysond not yet, but it looks like we'll have to do that at some point. I've never seen that show up in a heap trace before. You must have connected to ~0.5M (estimated) unique peers over the course of a few days.
I've filed an issue (https://github.com/libp2p/go-libp2p-metrics/issues/17) but it's unlikely to be a priority given that most systems connecting to that many peers have quite a bit of memory (unless that was entirely DHT traffic...).
That brings up a good point: if you're memory constrained, try running the daemon with --routing=dhtclient.
I set up a node mainly to serve a single website from ipfs, so the less memory it uses the cheaper my VPS can be.
I'm skeptical that the site draws that much traffic... so I guess it's just the nature of being connected to the swarm? The node isn't exactly a public gateway, so I'm not sure what caused all of the connections.
I'll try it with that option and see what happens.
The node isn't exactly a public gateway, so I'm not sure what caused all of the connections.
Probably the DHT.
Any updates on this?
Btw, the command to disable bandwidth metrics doesn't work anymore; the new one is ipfs config --bool Swarm.DisableBandwidthMetrics true
Is it even needed, anymore?
With the command /usr/local/bin/ipfs daemon --enable-gc --routing=dhtclient, after several weeks my node has settled at around 500 MB of RAM.
@kaysond Used that command. This + Swarm.DisableBandwidthMetrics works, thx
The remaining issue is https://github.com/ipfs/go-ipfs/issues/2848. Closing this one as it's quite old.
Version information:
go-ipfs version: 0.4.5-dev
Repo version: 4
System version: arm/linux
Golang version: go1.7
Type: Problem
Priority: P4
Description:
I have some Raspberry Pi 3s running the go-ipfs daemon. Right now they don't do anything: they don't handle any IPFS requests; they are just there running the daemons. After about 10 days, ipfs is getting killed on all of them because it is taking too much memory.
The daemons are killed at around RSS=783192.
My longest-running daemon (11 days) has RSS=605868.
A newly started daemon has RSS=92020.
A daemon running for one day has RSS=542408.
Questions:
Related: #3318 and the question about running IPFS on platforms with limited resources.