opentensor / subtensor

Bittensor Blockchain Layer
The Unlicense

Solve the DDoS problem, once and for all #504

Open ppolewicz opened 5 months ago

ppolewicz commented 5 months ago

Is your feature request related to a problem? Please describe.

Ideally the IP of the miner would only be known by the honest validators and the IP of the validators would only be known by the honest miners: if a dishonest party knows the IP of a miner or a validator, they will be able to DDoS it.

DDoS protection alternatives are WAF, Cloudflare, etc., but those can get extremely expensive for some subnets (think subnet 21 filetao, which handles massive amounts of data). This is just not financially viable at high volumes of data.

As DDoS attacks are rampant in many subnets today, this issue proposes a decentralized solution to the problem that also reduces the load on the chain.

Describe the solution you'd like

We need a distributed, decentralized network of nodes (which cannot be DDoSed) that will securely transfer between miners and validators the IP and port on which they will accept connections.

The idea is to:

  1. Remove the IP, IP version, port from the metagraph and the chain. The chain will not store any IP addresses and ports after the change.
  2. Create a new message that the client will be able to submit to the subtensor, which will include:
    • miner hotkey
    • subnet id
    • for every validator uid: ip, ip version and port, all encrypted with the public key of the validator (think hotkey pubkey part, unless you are aware of the encryption support effort, in which case think axon 256bit encryption key)
    • current block (as a nonce)
    • signature of the entire message with the private key of the miner (again, think hotkey, unless you know about encryption effort in which case think 256bit encryption key of the miner). Seed = block id?
  3. Make the subtensor read the message and make sure it's fine
    • hotkey is a miner
    • every validator uid has vpermit
    • there is no locally stored version with a block newer than current_block - rate limiting threshold (50 blocks? 100?)
    • the signature matches the hotkey
  4. Save this message in its raw form in a map[subnet_id][uid_of_miner] sorted by last update time so that it can be shared with others if they ask for it (a new subtensor would boot and ask any peer to share it after it has the current block) (remember about rate limits)
  5. Make it so that subtensors connected between each other pass those messages to their peers
    • watch out for traffic storms, we need some sort of a spanning tree there or something, but I assume it's been solved already as the block information is being transmitted between the peers in a similar manner
  6. Add the capability for validator client applications to ask their subtensor for the map for every miner, validate that the entire payload is correct, extract the portion for their own validator uid, decrypt it with their private key, and save the result in a local map so that the validator knows where to connect when it wants a miner to do something. The validator will run this query against its subtensor every few minutes, using an If-Modified-Since-style header so that it only receives the uid maps updated since the time/block indicated by the header.
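
To make steps 3, 4 and 6 more concrete, here is a minimal Python sketch of the acceptance checks, the per-subnet map, and the "updated since" query. It is an illustration under assumptions, not subtensor code: `Announcement`, `AnnouncementStore`, `submit`, `updated_since` and `RATE_LIMIT_BLOCKS` are hypothetical names, the signature check is an injected callable standing in for real hotkey verification, and the miner and vpermit lookups are plain dictionaries standing in for chain state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

RATE_LIMIT_BLOCKS = 50  # assumption: the issue floats 50 or 100 blocks


@dataclass
class Announcement:
    """Hypothetical shape of the message described in step 2."""
    miner_hotkey: str
    subnet_id: int
    encrypted_endpoints: Dict[int, bytes]  # validator uid -> encrypted ip/version/port
    block: int        # block at signing time, used as a nonce
    signature: bytes  # miner's signature over the fields above


@dataclass
class AnnouncementStore:
    verify: Callable[[Announcement], bool]  # stand-in for the real signature check
    miner_uids: Dict[Tuple[int, str], int]  # (subnet_id, hotkey) -> miner uid
    vpermits: Dict[int, Set[int]]           # subnet_id -> uids holding a vpermit
    messages: Dict[int, Dict[int, Announcement]] = field(default_factory=dict)

    def submit(self, msg: Announcement, current_block: int) -> bool:
        """Apply the step 3 checks; store the message (step 4) if all pass."""
        uid = self.miner_uids.get((msg.subnet_id, msg.miner_hotkey))
        if uid is None:
            return False  # hotkey is not a miner on this subnet
        permitted = self.vpermits.get(msg.subnet_id, set())
        if not set(msg.encrypted_endpoints) <= permitted:
            return False  # an addressed validator uid has no vpermit
        stored = self.messages.get(msg.subnet_id, {}).get(uid)
        if stored is not None and stored.block > current_block - RATE_LIMIT_BLOCKS:
            return False  # a recent version is already stored: rate limited
        if not self.verify(msg):
            return False  # signature does not match the hotkey
        self.messages.setdefault(msg.subnet_id, {})[uid] = msg
        return True

    def updated_since(self, subnet_id: int, block: int) -> List[Tuple[int, Announcement]]:
        """Step 6 query: entries newer than `block`, oldest update first."""
        entries = self.messages.get(subnet_id, {}).items()
        fresh = [(uid, m) for uid, m in entries if m.block > block]
        return sorted(fresh, key=lambda item: item[1].block)
```

A validator client would call something like `updated_since` with the block of its last successful query, then decrypt only the entry stored under its own uid in each returned message.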

Describe alternatives you've considered

The alternatives mentioned above (WAF, Cloudflare, etc.) are all inferior to a design where the attacker does not have the IP of the target. If one of the IPs gets DDoSed, we know which validator is responsible because nobody else knew that address, and in that case they would have no reason to DDoS: they can just stop listening to that miner if that's what they want to achieve.

Additional context

Smart subnet code will reduce the number of IPs used when not under attack to limit costs, and will only split traffic off to different IPs when it is being attacked, but this will only take a few minutes. If this is done well, it should make DDoS an infeasible strategy for getting an edge over another miner.
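
One possible reading of the splitting idea, as a rough Python sketch: validators share an address until it is attacked, then each validator that knew the attacked address gets its own fresh one, so a follow-up attack points at a single uid. The function names, the pool of spare addresses, and the one-address-per-suspect policy are all assumptions made for illustration. The addresses used come from the reserved documentation ranges.

```python
from typing import Dict, List


def suspects(assignment: Dict[int, str], attacked_ip: str) -> List[int]:
    """Validator uids that were told about the attacked address."""
    return sorted(uid for uid, ip in assignment.items() if ip == attacked_ip)


def split_on_attack(
    assignment: Dict[int, str], attacked_ip: str, spare_ips: List[str]
) -> Dict[int, str]:
    """Return a new uid -> ip map where every suspect gets its own spare address."""
    affected = suspects(assignment, attacked_ip)
    if len(spare_ips) < len(affected):
        raise ValueError("not enough spare addresses to isolate every suspect")
    updated = dict(assignment)  # leave the caller's map untouched
    for uid, ip in zip(affected, spare_ips):
        updated[uid] = ip
    return updated
```

For example, if uids 0, 1 and 2 all share `198.51.100.1` and that address is attacked, `split_on_attack` moves each of them to a distinct spare address. If one of those new addresses is attacked next, `suspects` returns a single uid.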

sam0x17 commented 4 months ago

I definitely agree we should move away from storing unencrypted IP addresses on-chain as this is an obvious attack surface.

I do think obfuscating validator and miner IPs is a good idea, and your proposal would be a good way of doing this. This would be a long journey though, and it's not an issue opentensor is going to have bandwidth to pick up internally for some time. Is this an issue you would be willing to work on with some guidance from the team? Is there a minimal / MVP version of this that you think would be attainable? What does the transition to this look like?

As a side note, I think it is a bit silly that we only allow IP addresses and not hostnames, as hostnames are much easier to secure, albeit with conventional means like WAF/Cloudflare/etc. as you mentioned (obviously the potentially very long length of hostnames also comes into play here, but assume we could cap that at 36 characters or something). That said, it's arguably almost as easy to stand up a floating IP in services like DigitalOcean and stick load balancers, throttling, etc., between the floating IP and your actual node. Very much not decentralized, but I guess my point is that people would probably have an easier time securing their nodes in the short term if they could specify a hostname rather than just an IP.
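
For illustration only, accepting either an IP address or a length-capped hostname could look like the Python sketch below. `classify_endpoint` and `MAX_HOSTNAME_LEN` are hypothetical names, the 36-character cap is just the figure floated above, and the hostname check is a simple letters-digits-hyphen label rule, not a full DNS validator.

```python
import ipaddress
import re

MAX_HOSTNAME_LEN = 36  # the cap floated above; not a decided value

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_endpoint(value: str) -> str:
    """Return 'ipv4', 'ipv6' or 'hostname'; raise ValueError for anything else."""
    try:
        return f"ipv{ipaddress.ip_address(value).version}"
    except ValueError:
        pass  # not an IP literal, fall through to the hostname checks
    if len(value) > MAX_HOSTNAME_LEN:
        raise ValueError("hostname exceeds the length cap")
    labels = value.rstrip(".").split(".")
    if not all(_LABEL.fullmatch(label) for label in labels):
        raise ValueError("invalid hostname")
    return "hostname"
```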

But yeah, I think your proposal is reasonable as long as we can figure out how to make the transition as safe and easy as possible and minimize any risk.

ppolewicz commented 4 months ago

An MVP would be to have a centralized v1 solution (probably way faster to prototype) and, if it indeed does work as well as we wish, this PR would be a decentralized v2 follow-up.

While decentralization is a noble pursuit, I'm not entirely sure how much we should sacrifice right now to attain its purest form.

If you agree with the prototype plan, then I might have an idea on how to get one that is somewhat decentralized already.

sam0x17 commented 4 months ago

I think this makes a lot of sense as a way to proceed. The main thing is we want to pull this off without making performance worse, and the centralized way will probably be much less dicey in that regard.