ipfs / pinning-services-api-spec

Standalone, vendor-agnostic Pinning Service API for IPFS ecosystem
https://ipfs.github.io/pinning-services-api-spec/
Creative Commons Zero v1.0 Universal

Networking difficulties while pinning data #18

Closed aschmahmann closed 4 years ago

aschmahmann commented 4 years ago

Problem:

The current API has the client inform the pinning service of the CID of the data to pin. While this may be convenient if the data is already in the network, it has downsides if the client is the only one with the data, including:

  1. If the client node is unreachable (e.g. behind a symmetric NAT) and they're hoping to use a pinning service to make their content publicly accessible, then they're not going to be able to get their data to the pinning service, since the pinning service will not be able to reach them
  2. Even if the client is reachable, the pinning service still needs to wait for a DHT provide to complete before it can start retrieving the data. This may not be a huge problem, but it is definitely annoying.
    • As a bonus problem, if the client node dies in the middle of uploading a large amount of data then the pinning service will have to wait for a large number of CIDs to be provided, not just the CID of the pin object root
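
To make the two cases above concrete, here is a toy model of when a pinning service can start pulling blocks from a client. It is only a sketch of the reasoning, not go-ipfs behaviour: the `Client` fields and the `service_can_fetch` helper are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Toy description of a client node (hypothetical fields, illustration only)."""
    dialable: bool              # False e.g. behind a symmetric NAT
    provide_done: bool          # has the DHT provide for the root CID completed?
    connected_to_service: bool  # did the client itself open a connection to the service?


def service_can_fetch(client: Client) -> bool:
    """Return True if the pinning service has some way to reach the client's blocks."""
    # A connection opened by the client sidesteps both problems.
    if client.connected_to_service:
        return True
    # Otherwise the service must find the client via the DHT and then dial it.
    return client.dialable and client.provide_done
```

In this model an undialable client only ever succeeds through the `connected_to_service` branch, which is what the proposals below try to guarantee.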

Recommended Solution

Take our existing HTTP API and expose it over libp2p instead of over TCP. This will ensure that we are connected to the peer that is supposed to be fetching data from us.

Comparing with Other Solutions

  1. Instead of just sending the pin object CID send the entire pin object
    • Pros:
      • Pretty easy to implement
    • Cons:
      • Does not allow for reduced bandwidth usage in the event that some part of the DAG is already stored by the pinning service
      • Does not allow for resuming cancelled uploads (very possible during the upload of large data)
  2. Have the client nodes peer with some upload nodes from the pinning service before they send the query so that they will get pinged by Bitswap and not be dependent on a DHT lookup
    • Pros:
      • Requires minimal additional code in go-ipfs (js-ipfs doesn't have peering implemented yet)
    • Cons:
      • Requires adding both an HTTP endpoint and a libp2p upload endpoint
      • libp2p upload endpoints cannot AFAIK make use of CA certificates which means needing to have a consistent set of peerIDs that are used by the upload endpoints, or relying on DNSLink which isn't signed
      • AFAIK we can't really load balance (inbound) pinning requests (aside from having multiple target nodes and just choosing one of them)
      • Some brittleness/complexity related to when connections break (as they sometimes do)
        • what happens if the peering connection is temporarily broken when the HTTP request goes out?
        • what happens if the peering connection breaks during the upload?
        • for these cases when the connection is re-established will they still be in the session, when/how will they be re-added?
  3. Use the proposed HTTP API, but do so over libp2p
    • i.e. instead of sending a standard HTTPS request to pinning.service, form a libp2p connection to /pinning/service and send the HTTP requests over that connection
    • Pros:
      • We seem to have libraries for doing this already in go that are actually pretty small (https://github.com/libp2p/go-libp2p-http which relies on https://github.com/libp2p/go-libp2p-gostream)
      • Makes it simpler for us to switch to a custom libp2p protocol in the future since we can just figure out which protocols it speaks (e.g. custom, or just http)
      • Only needs a libp2p endpoint, not also a standard HTTP endpoint
      • Gives us client side auth for free, if we want to use it, since we can just check the peerID on the client side of the connection
    • Cons:
      • Adds another library dependency to the protocol (may not be available in all languages)
      • Similar brittleness/complexity related to when connections break
        • A little less since it's guaranteed that the libp2p connection exists at the time the HTTP request is issued
      • libp2p endpoint CA issues as in 2
      • No load balancing on Puts (as in 2) or Gets
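
The bandwidth con listed under option 1 can be pictured as a set difference: if the service already holds part of the DAG, only the missing blocks need to travel. A minimal illustration, using placeholder CID strings and a hypothetical `blocks_to_send` helper (neither comes from the spec):

```python
def blocks_to_send(dag_blocks: set[str], service_has: set[str]) -> set[str]:
    """Blocks the client still needs to transfer (hypothetical helper)."""
    return dag_blocks - service_has


dag = {"cid-root", "cid-a", "cid-b", "cid-c"}  # placeholder CIDs, not real ones
already_stored = {"cid-a", "cid-b"}            # e.g. left over from an interrupted upload

missing = blocks_to_send(dag, already_stored)
```

Sending the entire pin object always moves all four blocks. With a block exchange such as Bitswap (options 2 and 3) the service only asks for what it lacks, here the two blocks in `missing`, which is also what makes resuming a cancelled upload cheap.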

Any of these solutions seems viable, and I'm interested if there are any other proposals out there that I've missed. However, I'm pretty sure we need to do at least one of these things or we're going to have really serious problems with users failing to upload data to pinning services.

It seems like people are not fans of option 1, which leaves us with 2 and 3. I'm not sure if they're really that different from each other, although I'm currently leaning towards option 3 as it's much less hacky and gives us some other nice benefits.

Thoughts?

lidel commented 4 years ago

While I agree "remote pinning over libp2p" is the most elegant thing, and we will most likely have something like that in the future, I don't believe "http over libp2p" is feasible for the mvp at hand:

Vanilla HTTP API is a hard requirement for the time being. Without it, we won't see community/partner adoption.

Q: Can we solve the problem with HTTP alone?

I'd like us to look into ways we can improve content routing while keeping HTTP API. I believe we implemented (1) in #14 already (entire Pin object is now sent to pinning service).

Q: would simple peerid/multiaddr hints be enough?

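What if:

  1. client sends own peerid/multiaddrs in Pin.meta[provider] so Pinning Service can try connecting to a known provider in parallel to asking DHT
  2. pinning service returns peerid/multiaddrs in PinStatus.meta[receiver] so client can try connecting to a designated receiver node?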

Sending and acting on these hints would be optional, but pinning from apps with known peerid such as go-ipfs could leverage those hints to ensure peering is in place and data transfer starts immediately.

@aschmahmann Would this be good enough for support in go-ipfs?

aschmahmann commented 4 years ago

> I believe we implemented (1) in #14 already

Not quite since we don't send the entire DAG, just the top level pointers.

> client sends own peerid/multiaddrs in Pin.meta[provider] so Pinning Service can try connecting to a known provider in parallel to asking DHT

This doesn't really help if the client is undialable, which is the case I'm concerned about.

> pinning service returns peerid/multiaddrs in PinStatus.meta[receiver] so client can try connecting to a designated receiver node?

This will work and is basically an easier to deal with version of option 2 👍 (e.g. the pinning services don't have to keep their PeerIDs long term). Given that the vanilla HTTP API is mandatory for the time being, this seems like our only real option and should work reasonably well.
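
A rough sketch of what the client side of this exchange could look like, assuming JSON bodies with a `meta` object. The `provider` and `receiver` keys follow the names floated in this thread, but the exact encoding is an assumption; `build_pin_request`, `receiver_addrs`, and all addresses and peer IDs below are placeholders made up for illustration.

```python
import json


def build_pin_request(cid: str, own_addrs: list[str]) -> str:
    """Serialize a Pin object carrying the client's addresses as a provider hint."""
    # Assumption: the hint is a JSON array of multiaddr strings under meta["provider"].
    return json.dumps({"cid": cid, "meta": {"provider": own_addrs}})


def receiver_addrs(pin_status_body: str) -> list[str]:
    """Extract receiver hints from a PinStatus response; empty list if absent."""
    status = json.loads(pin_status_body)
    # Hints are optional, so tolerate a missing or null `meta` / `receiver`.
    return list((status.get("meta") or {}).get("receiver") or [])


# Placeholder values throughout (not a real CID, address, or peer ID).
body = build_pin_request(
    "bafy-example-cid",
    ["/ip4/203.0.113.5/tcp/4001/p2p/QmClientPeerId"],
)
response = '{"meta": {"receiver": ["/dns4/upload.example/tcp/4001/p2p/QmServicePeerId"]}}'
to_dial = receiver_addrs(response)
```

The client would then try to connect to each address in `to_dial` before waiting on the pin, so that the transfer does not depend on the service being able to dial back.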

lidel commented 4 years ago

Great. We need a mini spec for those hints. Would an array of string multiaddrs be enough?

aschmahmann commented 4 years ago

Yep, that should be fine
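
For what it's worth, a check for "an array of string multiaddrs" could look roughly like the sketch below. It uses naive string splitting rather than a real multiaddr parser, and requiring a `/p2p/` component in every entry is an assumption on top of what was agreed here; `peer_id_of` and `valid_hints` are hypothetical helper names.

```python
from typing import Optional


def peer_id_of(multiaddr: str) -> Optional[str]:
    """Return the value of the last /p2p/ component, or None if there is none."""
    parts = multiaddr.split("/")
    # A textual multiaddr starts with "/", so parts[0] is the empty string.
    for i in range(len(parts) - 2, 0, -1):
        if parts[i] == "p2p":
            return parts[i + 1] or None
    return None


def valid_hints(value: object) -> bool:
    """Check the shape discussed above: a list of multiaddr strings, each naming a peer."""
    return isinstance(value, list) and all(
        isinstance(addr, str) and addr.startswith("/") and peer_id_of(addr) is not None
        for addr in value
    )
```

Since the hints are optional, an empty list passes; anything that is not a list of strings is rejected.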

lidel commented 4 years ago

See also #22