description:
We have graduated from the "school of hard knocks" of the last 6 months. Production operators avoid known failure modes and are more self-sufficient in identifying issues that affect their own users and potentially every user of the network. This includes resilient routing tables kept free of non-responsive nodes, guardrails that keep providers from unknowingly falling behind on providing, and improved bitswap with backpressure, timeouts, and metrics.
Notes:
This is a Starmap "child" issue.
These are all related to operational pain from the last 6 months. They can be delivered independently.
eta: 2023-06-30
Resilient routing tables: Related event: https://github.com/protocol/ipfs-vulnerabilities/issues/25 https://github.com/libp2p/go-libp2p-kad-dht/issues/811
Providing guardrails: https://github.com/ipfs/kubo/issues/9703 https://github.com/ipfs/kubo/issues/9702 https://github.com/ipfs/kubo/issues/9704
Improved bitswap: TODO: create a better issue or repurpose https://github.com/ipfs/go-bitswap/issues/560