RubenKelevra opened 2 years ago
This is a surprisingly painful thing to implement, because right now this is an option to the DHT factory. We can't just call `dht.StartServer`. It's nothing an API change in `go-libp2p-kad-dht` can't solve, but that's more work than expected.
48 hours is too much; 3 hours is more realistic, since 80% of IPFS nodes in my scan have an uptime lower than 2h. They are probably Brave-run nodes.
> 48 hours is too much; 3 hours is more realistic, since 80% of IPFS nodes in my scan have an uptime lower than 2h. They are probably Brave-run nodes.
Those are exactly the nodes we want to NOT be part of the DHT as servers. :)
> This is a surprisingly painful thing to implement, because right now this is an option to the DHT factory. We can't just call `dht.StartServer`. It's nothing an API change in `go-libp2p-kad-dht` can't solve, but that's more work than expected.
Yeah, I kinda expected this. However, that would be pretty beneficial. :)
IMO this seems worth investigating, but I wouldn't pick a number out of a hat here. The balance here is between having more server nodes and having higher-uptime server nodes. More server nodes means more nodes to share the load; better nodes means less churn, and likely some network parameters could change to require less load from DHT clients.
cc @yiannisbot for any thoughts here
Hey @aschmahmann,
the idea with 48 hours was simply to avoid having office/desktop computers that only run for a night start serving the DHT. A computer would have to run for at least two nights.
This would also prevent computers with a semi-stable connection from starting to serve. In Germany, for example, Internet connections are terminated after 24 hours; you dial in again and get a different IP. If the connection quality is poor overall, it sometimes just fails for a while. That's why routers usually schedule this reconnect at night.
So if the node recovers quickly by updating its IP and reconnecting, that's fine IMHO, but if there's a longer outage it's just not worth having these computers take part in the DHT as servers.
While, yes, the load would go up slightly with fewer nodes serving, as they need to cover a larger range of the address space, the query time goes down, because there are far fewer failed dials.
Currently the assumption is that at least one of the DHT nodes provides the record for at least 12 hours (the reprovide interval). If nodes only start serving once they consider themselves healthy for 48 hours, it would be safe to increase the reprovide interval to something much longer and thus reduce the overall traffic in the network.
Btw: I'm currently seeing very high numbers of failed dials on my nodes (TCP resets), just for pushing records to the DHT: 10-200 failed TCP connection attempts per second, even though I already filter out all private IP spaces in my config file.
A changed IP is not a problem; the DHT is organized by peer ID and will quickly switch to the node's new IP.
@hsn10 never stated that this is a problem 🤔
2022-07-01 conversation: the goal with this is to reduce the number of dead peers in the network.
Data sources for how many dead peers are in the network:
- https://stats.ipfs.network/nebula-22-07-01/

That helps establish how big of a problem this is.
We would then need data on what the ideal value to set this to would be. Until we have that data and analysis, maintainers aren't planning to do any work here, because there are higher-leverage things that can be done for DHT performance.
I don't have a clear view of what this would entail in terms of implementation, but it's certainly not a bad idea to consider.
Our results show that we could go on and increase the reprovide interval (or reduce the provider record replication factor) even now. Provider records seem to stay alive in ~15 peers over a 24h period.
Quick clarification: why would Brave nodes show as servers? Do you mean the Brave nodes that run on peers with publicly accessible IPs?
As mentioned above by @aschmahmann, the tradeoff here is to balance the churn rate vs the network size. It is important to have a large number of nodes participating in the DHT for load balancing and decentralization.
I see two main reasons for which a high churn rate has a negative impact on IPFS:
1. Provider Records can disappear from the DHT when the peers holding them go offline.
2. Routing tables accumulate stale entries pointing to peers that have left the network.
The following graph by @cortze in the context of RFM17 shows that the first point is not a concern. On average, 15 out of the 20 original Provider Records are still provided after 24 hours (without even being republished). IPFS republishes the Provider Records every 12 hours, so the probability that a Provider Record disappears from the DHT given the current churn rate is negligible.
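As a quick sanity check on the "negligible" claim (treating record holders as independent, which is a simplification): with roughly 15 of the 20 original holders still alive after 24h, the per-holder 24h survival rate is about 0.75, so the chance that all 20 original holders are gone within 24h, even without any republish, is about

$$P(\text{all } 20 \text{ gone}) \approx (1 - 0.75)^{20} = 0.25^{20} \approx 9 \times 10^{-13}.$$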
In RFM19, we discovered that the number of stale entries in the routing table is low (roughly 4% for buckets 0 to 8, and 15% for buckets with higher IDs). So it is not a big pain at the moment, but it can certainly be improved!
`kubo`, the most widely used IPFS implementation, makes use of `libp2p/go-libp2p-kad-dht`, whose Kademlia k-bucket replacement policy roughly corresponds to refreshing the routing table every 10 minutes: the node will try to contact all the peers in its routing table and will evict the peers that fail to respond in time. Thus, stale entries cannot stay longer than 10 minutes in a peer's routing table. A stale entry will live 5 minutes on average (assuming a random uniform distribution between the time the node goes offline and the routing table refresh).
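For clarity, the 5 minute figure follows from the uniform assumption: if a node goes offline at a time uniformly distributed within the 10 minute refresh window, the residual time until the next refresh is uniform on $[0, T]$ with $T = 10$ min, so

$$\mathbb{E}[\text{stale lifetime}] = \int_0^{T} t \cdot \frac{1}{T}\,dt = \frac{T}{2} = 5\ \text{min}.$$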
The table below shows the ratio of the stale entry liveliness vs the total uptime of the node ($\frac{5\ \text{min}}{\text{uptime} + 5\ \text{min}}$) for all nodes scanned by the Nebula Crawler from February to May 2022. It shows us that on average peers with an uptime <1 hour will spend ~$\frac{1}{3}$ of their time being a stale entry. Nodes with an uptime between 1 and 2 hours could have stale entries living for 5.93% of their uptime on average (average uptime of this category: 1 hour and 19 minutes = 79.2 minutes $\rightarrow \frac{5\ \text{min}}{79.2\ \text{min} + 5\ \text{min}} = 5.93\%$).
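For anyone who wants to play with that ratio, here is a tiny standalone Go snippet reproducing the $\frac{5\ \text{min}}{\text{uptime} + 5\ \text{min}}$ calculation above. The uptime values below are only examples (the 79.2 min figure is the one quoted above; the others are not measurements from the crawl):

```go
package main

import "fmt"

// staleRatio returns the expected share of a node's presence in other peers'
// routing tables that is spent as a stale entry, for a given uptime in
// minutes. 5 min is the average stale-entry lifetime implied by the 10 min
// routing table refresh interval.
func staleRatio(uptimeMin float64) float64 {
	const avgStaleLifetimeMin = 5.0
	return avgStaleLifetimeMin / (uptimeMin + avgStaleLifetimeMin)
}

func main() {
	// 79.2 min is the average uptime of the 1-2h category quoted above.
	for _, uptimeMin := range []float64{10, 79.2, 24 * 60} {
		fmt.Printf("uptime %7.1f min -> %5.2f%% spent as a stale entry\n",
			uptimeMin, 100*staleRatio(uptimeMin))
	}
}
```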
Note that starting the DHT server mode after X hours will significantly change the data on the graph. For instance, if the server mode is turned on after 1h, the nodes with 1-2h uptime will have a much higher stale entry liveliness vs total uptime ratio (roughly 4x), as the average uptime of this category is 79.2 minutes, so these nodes will serve the DHT in server mode for 19.2 minutes on average.
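Spelled out, the "roughly 4x" comes from:

$$\frac{5\ \text{min}}{19.2\ \text{min} + 5\ \text{min}} \approx 20.7\% \approx 3.5 \times 5.93\%.$$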
The following graphs show the ratio of nodes having an uptime greater than X hours/days.
Turning on the DHT server mode after 48 hours would reduce the number of DHT server peers by 35x and increase the individual load by 35x, which is probably not desirable. Excluding only the nodes with an uptime <1 hour would double the load for the rest of the network, but remove a large proportion of the potential routing table stale entries. It would also cause one additional empty k-bucket on average (non-full buckets will be shifted 1 ID lower $\rightarrow$ 1 full bucket would become non-full, and the last bucket would become empty).
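A back-of-the-envelope way to see both claims (assuming DHT load is spread roughly evenly over the keyspace and buckets hold $k = 20$ peers):

$$\text{load per server} \propto \frac{1}{N_{\text{servers}}}, \quad N \to \frac{N}{35} \Rightarrow \text{load} \times 35; \qquad \#\text{full buckets} \approx \log_2\frac{N}{k}, \quad N \to \frac{N}{2} \Rightarrow \text{one fewer full bucket}.$$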
It would make sense to start the DHT server mode after 1 to 3 hours; the individual load would be approximately 2x to 3x. The plot below shows how the stale entry liveliness vs total uptime would look if the DHT server mode is started respectively immediately, 1 hour, 2 hours, and 3 hours after a node comes online. The graph is shifted and time starts when nodes join the DHT (respectively after 0, 1, 2 and 3 hours), so the node categories are also shifted.
We can see that the ratio roughly stays the same if we change the time the DHT server mode is turned on. The overall number of stale entries will be reduced, since there are more peers with uptime <1h than peers with uptime between 1h and 2h, but that is simply because there will be fewer peers in the network, so the performance would be left unchanged. The ratio is expected to stay approximately constant, as the conditional uptime distribution, displayed below, is almost linear.
I am a bit surprised by these results, but it looks like starting the DHT server mode after some time would hardly improve the routing efficiency, while it would significantly reduce the number of peers participating in the DHT and increase the individual load. I think we should either not change anything, or we could try starting the DHT server mode after 2-3 hours, which should give a small improvement in terms of routing table stale entries, but the individual load would increase by 3x.
What do you think?
I see people setting the DHT to client mode manually to reduce CPU and bandwidth load on their servers.
I have several servers and only one is set to dhtserver in the config, and I will turn it off soon because the CPU load, probably related to constant crypto handshakes with incoming requests, is too high.
For me, the extra network bandwidth used for protocol management itself while in DHT server mode is not a problem. It's 10x more than running in DHT client mode, but it's still very low for our 450 Mbit network.
If you plan a change that will increase the already high load on DHT servers, you can expect many more people to turn DHT server mode off.
If the high CPU usage on DHT servers is related to encryption, then a lightweight unencrypted UDP DHT query protocol, similar to DNS, could help and make the backend more efficient: store all routing info in RAM and dump it to disk at a regular interval for persistence (the classic in-memory DB design).
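Not the actual go-libp2p-kad-dht backend, just a toy sketch of the "keep it in RAM, snapshot to disk periodically" idea described above:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
	"sync"
	"time"
)

// memStore is a toy in-memory routing store with periodic persistence,
// sketching the "classic in-memory DB" design mentioned above. It is not
// related to the real go-libp2p-kad-dht datastore.
type memStore struct {
	mu      sync.RWMutex
	entries map[string]string // peer ID -> address (simplified)
}

func (s *memStore) Put(id, addr string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[id] = addr
}

// snapshot dumps the current state to disk; a real implementation would write
// atomically (temp file + rename) and compact or age out entries.
func (s *memStore) snapshot(path string) error {
	s.mu.RLock()
	data, err := json.Marshal(s.entries)
	s.mu.RUnlock()
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// snapshotLoop persists the store at a fixed interval until stop is closed.
func (s *memStore) snapshotLoop(path string, interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			if err := s.snapshot(path); err != nil {
				log.Println("snapshot failed:", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	store := &memStore{entries: make(map[string]string)}
	store.Put("QmPeerA", "/ip4/203.0.113.7/tcp/4001")

	stop := make(chan struct{})
	go store.snapshotLoop("routing-snapshot.json", 10*time.Minute, stop)

	// ... serve lightweight queries from RAM here ...
	_ = store.snapshot("routing-snapshot.json") // one immediate dump for the demo
	close(stop)
}
```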
I would be interested to hear other opinions after @guillaumemichel's study further up. I believe we can make a decision based on the data we have and either close the issue, or proceed with a change.
Given the summary from @guillaumemichel (https://github.com/ipfs/kubo/issues/9045#issuecomment-1195577412):
> I think we should either not change anything or we could try starting the DHT server mode after 2-3 hours, which should give a small improvement in terms of routing table stale entries, but the individual load would increase by 3x.
Looking at the numbers alone, changing the default behavior of how Kubo uses go-libp2p-kad-dht does not sound like a big improvement, given the cost (3x increased load on DHT servers).
My vote would be to keep the default settings as they are: we can always revisit changing the default when public DHT characteristics change, perhaps as part of a bigger effort to add a "reputation system" (see the "Flags" of Tor nodes).
An open question is: should we allow Kubo users (and downstream projects) to set a custom `MinDHTServerUptime` (opt-in)?
Some things at play: Kubo's routing configuration is moving from a single `Routing.Type` to entries in `Routing.Routers`.
Wiring up `MinDHTServerUptime` is most likely blocked until that work is done, and may itself require a bigger refactor of `go-libp2p-kad-dht`, as noted by @Jorropo. I don't think `MinDHTServerUptime` brings much value, but I would like to hear from others before closing this.
@lidel
> wiring up MinDHTServerUptime is most likely blocked until that work is done, and itself may require a bigger refactor of go-libp2p-kad-dht, as noted in https://github.com/ipfs/kubo/issues/9045#issuecomment-1158829676.
In reality this is far easier than expected. There is a private function already doing this: https://github.com/libp2p/go-libp2p-kad-dht/blob/0b7ac010657443bc0675b3bd61133fe04d61d25b/dht.go#L736
This is because of the `AutoServer` logic (it will wait until you are reachable to enable server mode).
It would be trivial to add a check before that.
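To make the idea concrete, here is a self-contained toy sketch of what an extra uptime check before the auto-mode switch could look like. All names here are hypothetical; this is not the actual go-libp2p-kad-dht code linked above:

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative stand-in for the real DHT mode-switching logic.
type mode int

const (
	modeClient mode = iota
	modeServer
)

type dhtNode struct {
	startedAt       time.Time
	minServerUptime time.Duration
	mode            mode
}

// maybeEnableServerMode mirrors the idea in this thread: the existing
// AutoServer logic already waits for public reachability; a minimum-uptime
// check could be added right before the switch to server mode.
func (d *dhtNode) maybeEnableServerMode(publiclyReachable bool) {
	if !publiclyReachable {
		return
	}
	if time.Since(d.startedAt) < d.minServerUptime {
		return // not up long enough yet, stay in client mode
	}
	d.mode = modeServer
}

func main() {
	node := &dhtNode{startedAt: time.Now(), minServerUptime: 3 * time.Hour}
	node.maybeEnableServerMode(true)
	fmt.Println("server mode:", node.mode == modeServer) // false until 3h of uptime
}
```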
Can the measured data be used to answer whether a DHT entry lifetime greater than the default of 1 day would help?
> Can the measured data be used to answer whether a DHT entry lifetime greater than the default of 1 day would help?
@hsn10 what do you mean?
> - https://stats.ipfs.network/nebula-22-07-01/ That helps establish how big of a problem this is.
13% online would mean a 1-in-8 chance that the peer I'm asking will ever respond? That sounds like a big improvement opportunity 🤔
> Turning on the DHT server mode after 48 hours would reduce the number of DHT server peers by 35x and increase the individual load by 35x, which is probably not desirable.
I agree with the first one. But on the other hand this does not account for the strain offline nodes put on the network:
1) When nodes go offline, they take our precious records with them, so this load has to be carried by the other nodes in the network.
2) The more likely it is that nodes go online and offline inside the range of a CID, the more likely it is that we haven't pushed the information to all the nodes currently online in that range. This means only a smaller number of nodes can actually respond with the information the querying node needs, so more hops, more connects, more latency, more load.
3) A larger DHT in general means the number of nodes that need to be known to be able to provide/resolve is bigger. If they are behind unreliable Internet connections or go on/offline all the time, this is an effort without any benefit: memory consumption is higher, with more retries, redials, etc.
4) Having a large amount of unreliable nodes means we have to choose low reprovide intervals and push higher numbers of copies into the DHT.
This is increasingly an issue the more data we'd like to index, so we would likely decrease the cost of running IPFS in the long run if we can reduce those multipliers for the network load.
Storing a large amount of data (say half a TB) currently creates quite a lot of background network activity, just to keep it "in the index".
As quoted earlier, I see an average of 200 TCP resets per second just for this task alone.
If we could lower this and instead do something more useful with our servers' capacity, while eliminating the useless background load from nodes which go offline after an hour or two, I would be happy with the tradeoff.
@guillaumemichel A DHT entry times out after 1 day and the client has to submit it again. This causes problems where a client can't announce all its DHT keys within 24 hours.
Do we have sufficient data to decide whether changing the 1-day TTL to 2 days, or something else, would improve the network?
> 13% online would mean a 1-in-8 chance that the peer I'm asking will ever respond? That sounds like a big improvement opportunity 🤔
Not exactly. The plot you mentioned indicates that 13% of the distinct peerIDs observed in all peers' routing tables are always online. Peers that are marked as offline in this plot (37% of the peerIDs) make up only a very small fraction of the global DHT routing table. We showed in RFM19 that only 5.7% of DHT routing table entries are unreachable. Hence, there is a ~94% chance that the peer you are asking will respond.
> I agree with the first one. But on the other hand this does not account for the strain offline nodes put on the network:
The RFM17 draft report by @cortze studies, among other things, the reachability of Provider Records over time, the number of additional hops, different values for the Provider Records replication parameter K, and republish intervals. It shows that there is no risk of content becoming unavailable because of the churn; we could even lower the replication parameter from 20 to 15 and use longer republish intervals.
> As quoted earlier, I see an average of 200 TCP resets per second just for this task alone.
If you have some measurements or statistics, I would be happy to take a look at them! This would be helpful to improve the network.
> @guillaumemichel A DHT entry times out after 1 day and the client has to submit it again. This causes problems where a client can't announce all its DHT keys within 24 hours.
The republish frequency is currently 12h, and the Provider Record stays for 24h on the DHT server node. If a client isn't able to republish its content within 24h, then it is probably not able to provide it to peers requesting it either. The challenge here is not whether the data is available, but rather decreasing the network load. But I agree that the republish frequency could be adapted.
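For reference, the relationship between the two intervals discussed here, with the values quoted above hard-coded as illustrative constants (they are not read from the actual libraries):

```go
package main

import (
	"fmt"
	"time"
)

// Values quoted in this thread, hard-coded for illustration only.
const (
	providerRecordTTL = 24 * time.Hour // how long a DHT server keeps a provider record
	republishInterval = 12 * time.Hour // how often the provider republishes it
)

func main() {
	// With TTL = 2 * republish interval, each record is republished roughly
	// twice within its validity window, so a single missed round leaves at
	// most a brief gap before the record is re-added.
	refreshesPerTTL := int(providerRecordTTL / republishInterval)
	fmt.Printf("republishes within one record lifetime: %d\n", refreshesPerTTL)
}
```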
> Do we have sufficient data to decide whether changing the 1-day TTL to 2 days, or something else, would improve the network?
Yes, we have data in RFM17; I would recommend having a read :)
There are a few thoughts I would like to leave in this thread, just to make sure that we are all on the same page:
As already explained in RFM17, the current DHT replication value for the Provider Records (k=20) was set to counteract that churn rate. From the results that I got and summarized in that report, k=20 is more than sufficient to overcome the churn: ~70% of the peers that keep the Provider Records stay active over the 24+ hours after we first contact them to store the records. The performance of the current configuration (in terms of Provider Record liveness) is good and stable, which opens several windows to reduce the network's overhead while maintaining the current performance.
Adding a delay before upgrading DHT Clients to DHT Servers is a measure to reduce node churn in the network, with a significant side effect: it can significantly increase the overhead in the network by reducing the number of DHT Servers, as @guillaumemichel explained before. It does make sense to add that policy to ipfs/kubo nodes, but it doesn't make sense to do it while keeping the rest of the DHT configuration unchanged (e.g. the K replication value). I would say that if we apply this feature to reduce the churn rate of the DHT Servers, we must reduce the K value accordingly so that the overhead doesn't increase. Otherwise, K=20 would remain an overkill value and one of the factors generating the overhead. TL;DR: this policy reduces the number of DHT Servers, but they will be more stable, which means we can replicate the PR for each CID less while achieving the same retrievability performance.
From my point of view, the current level of overhead in the network is a higher-priority problem than the current node churn. Some of the suggestions I propose in the RFM would reduce this overhead.
Apologies for joining the discussion a bit late.
Description
Currently, all newly started nodes advertise themselves as DHT servers if they can handle incoming connections.
A lot of computers are used for just a couple of hours and then restarted, put into standby, or have longer outages of their Internet connection.
I think go-ipfs should start a timer on startup; if there's a connection outage of more than 10 minutes, a standby event, etc., the timer will be reset.
Once 48 hours are reached, the node switches to the current 'dht' setting, which will decide whether the node can handle incoming connections etc., and then it may switch to the server profile.
This will eliminate the effort of pushing data to nodes which are not capable of keeping data available on their network for longer periods of time, but recover them after two days if there's, for example, a connection outage.
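A rough sketch of the proposed timer (illustrative only; none of this is existing Kubo/go-ipfs code, and the event sources for outages and standby are left out):

```go
package main

import (
	"fmt"
	"time"
)

// uptimeTracker tracks continuous "healthy" uptime and resets it on
// connectivity outages longer than a threshold or on standby/resume events.
type uptimeTracker struct {
	healthySince time.Time
	lastSeenUp   time.Time
}

func newUptimeTracker() *uptimeTracker {
	now := time.Now()
	return &uptimeTracker{healthySince: now, lastSeenUp: now}
}

// markOnline should be called periodically while the node has connectivity.
// If more than maxOutage passed since the last call (long outage, or the
// machine was suspended), the healthy-uptime timer starts over.
func (t *uptimeTracker) markOnline(maxOutage time.Duration) {
	now := time.Now()
	if now.Sub(t.lastSeenUp) > maxOutage {
		t.healthySince = now
	}
	t.lastSeenUp = now
}

// eligibleForServerMode reports whether the node has been continuously
// healthy long enough (48h in the original proposal) to advertise as a
// DHT server, subject to the usual reachability checks.
func (t *uptimeTracker) eligibleForServerMode(minUptime time.Duration) bool {
	return time.Since(t.healthySince) >= minUptime
}

func main() {
	tracker := newUptimeTracker()
	tracker.markOnline(10 * time.Minute) // would be called from a connectivity watcher
	fmt.Println("eligible for DHT server mode:", tracker.eligibleForServerMode(48*time.Hour))
}
```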