Why do you have a CleanupPeriod of 3 minutes? That's a high frequency. With those settings, you have to contact your 500 nodes every 180 seconds just to keep them alive in your routing table. That's probably too much.
Cleanup is that low because I was playing with the variables to find out what they do. I was hoping a shorter cleanup period would kick out old nodes faster and, with them, their pastQueries.
Do we need all the data in node.pastQueries[r.T] = query, or can we limit it to a maximum count somehow?
Setting a RateLimit means you reply less often to incoming packets, which reduces output rate and thus consumes less memory. You probably don't have an infinite network connection anyway :), so if you send too many packets you're probably causing congestion and reducing your overall "useful" throughput.
Isn't RateLimit also dropping the reply packets to my get_peers? The anacrolix/dht has a "Passive bool" to disable responding to incoming queries completely... If RateLimit only applies to incoming queries, setting it to 1 or 0 probably has the same effect?
I'm running another profiling session right now with RateLimit 10000 and CleanupPeriod back to the 15-minute default.
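For reference, here is a minimal sketch of how those knobs might be set; the NewConfig/New constructors and the exact Config field names are assumptions based on this thread, so double-check them against the current dht sources:

package main

import (
	"log"
	"time"

	"github.com/nictuku/dht"
)

func main() {
	// Assumed field names; adjust to whatever the Config struct actually exposes.
	cfg := dht.NewConfig()
	cfg.MaxNodes = 500                   // cap on routing-table size
	cfg.CleanupPeriod = 15 * time.Minute // back to the default period
	cfg.RateLimit = 10000                // replies per second to incoming queries
	d, err := dht.New(cfg)
	if err != nil {
		log.Fatal(err)
	}
	_ = d // start the node from here as usual (omitted)
}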
Like we discussed, some of the memory usage you're seeing is normal given that GOGC=100, meaning that by default Go lets the heap grow to roughly twice the live data it truly needs before collecting.
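If that GC headroom is the dominant cost, one option outside this library is to lower the collector's target; a minimal sketch using the standard runtime/debug package (not something from this issue):

package main

import "runtime/debug"

func main() {
	// Trade CPU for memory: trigger GC when the heap has grown 50% over the
	// live set instead of the default 100% (equivalent to running with GOGC=50).
	debug.SetGCPercent(50)
}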
It looks like pastQueries does not have a limit, so it's growing unbounded. It should probably be limited. Nice find!
A few alternatives: 1) use a ring buffer or an LRU; 2) add a timer in the main loop (or a step in the existing cleanup routine) that removes transactions older than a certain age. This would require tracking the age of each transaction.
It's been a while since I looked at this code, so I could be wrong, but I think the best approach here is to use a simple ring buffer that recycles the oldest entry. See https://golang.org/pkg/container/ring/
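A minimal sketch of that idea, assuming a hypothetical stand-in for whatever pastQueries currently stores (the type and method names below are illustrative, not the package's own):

package main

import (
	"container/ring"
	"fmt"
)

// pastQuery is a stand-in for whatever is currently stored in pastQueries.
type pastQuery struct {
	tid string // transaction ID ("r.T" in dht.go)
	ih  string // infohash being queried
}

// queryRing keeps only the last N queries, overwriting the oldest one.
type queryRing struct{ r *ring.Ring }

func newQueryRing(n int) *queryRing { return &queryRing{r: ring.New(n)} }

func (q *queryRing) add(p pastQuery) {
	q.r.Value = p    // overwrite whatever was in the oldest slot
	q.r = q.r.Next() // advance so the next add reuses the next-oldest slot
}

// find scans the ring for a transaction ID; O(n), but n stays small (e.g. 8).
func (q *queryRing) find(tid string) (pastQuery, bool) {
	var out pastQuery
	found := false
	q.r.Do(func(v interface{}) {
		if p, ok := v.(pastQuery); ok && p.tid == tid {
			out, found = p, true
		}
	})
	return out, found
}

func main() {
	qr := newQueryRing(8)
	qr.add(pastQuery{tid: "aa", ih: "deadbeef"})
	if p, ok := qr.find("aa"); ok {
		fmt.Println("found past query for infohash", p.ih)
	}
}

With a fixed-size ring, per-node memory is bounded by construction and the oldest transaction is simply overwritten, at the cost of a short linear scan on lookup.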
It is possible that once we fix pastQueries, the allocations we see inside readResponse might get fixed, too, because pastQueries is causing the "r" to be retained in memory.
Want to submit a PR for doing the ring buffer for pastQueries? 😄
I had already replaced pastQueries with the LRU from peerStore for a test at the beginning of last week, but discarded it for some reason. Running it again with profiling enabled right now.
Had it running with max 1000 pastQueries and pendingQueries, but it went up to 20GB of used memory in the profiler... I had to run it again because it killed my profiler, so the screenshot/data below is from a run of about 20 minutes:

[image: heap3] https://user-images.githubusercontent.com/9356285/43695684-59d131c0-9964-11e8-99eb-ecf048339bf1.jpg
If I subtract totalKilledNodes from totalNodes there are still 1,434,393 active nodes, which is quite a lot... maybe limit the number of nodes, too?
"totalDroppedPackets": 0, "totalFindNodeDupes": 491, "totalGetPeersDupes": 59304708, "totalKilledNodes": 291915, "totalNodes": 3941162, "totalNodesReached": 2587206, "totalPacketsFromBlockedHosts": 4341, "totalPeers": 13924652, "totalReadBytes": 2739891552, "totalRecv": 10306303, "totalRecvFindNode": 106502, "totalRecvFindNodeReply": 658, "totalRecvGetPeers": 119602, "totalRecvGetPeersReply": 8042017, "totalRecvPingReply": 1903051, "totalSelfPromotions": 1943, "totalSent": 16217619, "totalSentFindNode": 1031, "totalSentGetPeers": 13092528, "totalSentPing": 2834225, "totalWrittenBytes": 1458361886
totalNodes - totalKilledNodes isn't a reliable measure of your active nodes. In theory at least, routingTable.numNodes is a better measure. Note that, supposedly, the number of nodes in the routing table is already limited by MaxNodes.
In the code you shared with me, the LRU has 1000 elements for pastQueries. That's 1000 for each node, and you have 500 MaxNodes. Assuming each pastQueries entry consumes around 100 bytes, that's only on the order of 50MB, which still wouldn't explain 10GB of memory. I think you should make those LRUs significantly smaller (like 10 elements), or make it a single global LRU with a merged key of infohash+node ID.

But there's something else missing. Are you running multiple DHT nodes in a single process, maybe?
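A minimal sketch of the single global LRU idea, using hashicorp/golang-lru as one possible off-the-shelf LRU (the capacity and key layout are assumptions for illustration, not what the package does today):

package main

import (
	"fmt"

	lru "github.com/hashicorp/golang-lru"
)

func main() {
	// One process-wide cache instead of one LRU per node: the total size stays
	// bounded no matter how many remote nodes get created.
	pastQueries, err := lru.New(50000)
	if err != nil {
		panic(err)
	}

	// Composite key: infohash + node ID, so entries from different nodes
	// can't collide and entries from dead nodes age out naturally.
	key := fmt.Sprintf("%x|%x", []byte("infohash............"), []byte("nodeid.............."))
	pastQueries.Add(key, "query metadata goes here")

	if v, ok := pastQueries.Get(key); ok {
		fmt.Println("found:", v)
	}
}

Because the cache is shared across all remote nodes, its memory footprint no longer scales with the number of nodes the crawler learns.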
For testing output, I've just implemented:
func (d *DHT) GetNumNodes() int {
return d.routingTable.numNodes()
}
and the number grows way higher than MaxNodes. It's at 500,000 after 5 minutes. MaxNodes seems only to take effect at bootstrap and when we receive a ping from an external node, to decide whether we "learn" the new node. But when sending out get_peers, it always stores each learned node via getOrCreateNode, which we can see in the profiler, too. But we have to learn all of them to be able to process the get_peers responses, so there can't be any hard limit.
But if I search 100,000 different hashes within one hour and each search learns 1000 new nodes (because get_peers always returns new nearest nodes), that's already 100 million nodes, plus their pastQueries.
I've now added a createTime to each new node, and I expire them after CleanupPeriod*2 if they are unused:
// kill old and currently unused nodes
if time.Since(n.createTime) > cleanupPeriod*2 && n.pendingQueries.Len() == 0 {
log.V(4).Infof("DHT: Old node with 0 pendingQueries. Deleting")
r.kill(n, p)
continue
}
and numNodes now stays around 1 million and the heap around 4 gigabytes.
I will try another version next where I keep your original code without all the LRU modifications and only the code above.
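For completeness, the companion piece, stamping createTime when a node is created (e.g. in getOrCreateNode), might look roughly like the sketch below; the struct and constructor are stand-ins for the package's internal node type, not the real definitions:

package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-in for the package's remote-node struct.
type remoteNodeSketch struct {
	// ...existing fields (address, id, pendingQueries, pastQueries, ...)
	createTime time.Time
}

// Mirrors what getOrCreateNode would need to do: stamp the creation time so
// the cleanup check above has something to compare against.
func newRemoteNodeSketch() *remoteNodeSketch {
	return &remoteNodeSketch{createTime: time.Now()}
}

func main() {
	n := newRemoteNodeSketch()
	fmt.Println("node created at", n.createTime)
}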
It was running for about 5 hours with those settings... filling in 1000 searches/hashes in parallel and immediately refilling them...
I've used a slightly modified dht.go because I had disabled the clientThrottle feature for debugging reasons, so use that one for the line numbers below.
In more detail: it seems 2.5GB of the 8GB gets eaten by dht.go:610, which is
node.pastQueries[r.T] = query
Also, at krpc.go:189 the jackpal bencode unmarshal eats another 1.123GB.
CPU looks fine to me, just the unmarshal eats plenty. After I got (and still see) many of

dht.go:568] DHT: readResponse Error: unexpected EOF

I was trying to replace it with https://github.com/IncSW/go-bencode (which is much less CPU-intensive according to the benchmarks there) to see if it solves the issue, but due to my very limited Golang skills I was unable to find the right syntax parameters and value type conversions to make it work.
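For what it's worth, a rough sketch of what that go-bencode replacement might look like; it assumes Unmarshal returns an untyped interface{} that has to be type-asserted field by field (which is exactly where the conversions get fiddly), so verify the calls and returned shapes against that package's README before relying on it:

package main

import (
	"fmt"

	bencode "github.com/IncSW/go-bencode"
)

func main() {
	raw := []byte("d1:t2:aa1:y1:re") // a tiny KRPC-style reply: {"t": "aa", "y": "r"}

	// Assumed API: Unmarshal returns an untyped value (map/list/int64/[]byte).
	v, err := bencode.Unmarshal(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}

	// The fiddly part: everything comes back as interface{}, so each field
	// needs a type assertion before it can be used like the jackpal structs.
	msg, ok := v.(map[string]interface{})
	if !ok {
		fmt.Println("unexpected top-level type")
		return
	}
	if t, ok := msg["t"].([]byte); ok {
		fmt.Printf("transaction id: %q\n", t)
	}
}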