ipfs / pinning-services-api-spec

Standalone, vendor-agnostic Pinning Service API for IPFS ecosystem
https://ipfs.github.io/pinning-services-api-spec/
Creative Commons Zero v1.0 Universal

Mandatory provider hints #22

Closed lidel closed 4 years ago

lidel commented 4 years ago

In https://github.com/ipfs/pinning-services-api-spec/pull/19#discussion_r453687149 @aschmahmann wrote

I'd really like to emphasize to current pinning services that this is really useful and they should implement it if it's not a huge ask. If many of them are unable to implement it in a reasonable time frame then we should be aware of that when dealing with user issues.

I agree this turns out to be a pretty important feature. If we leave it in meta it may be harder for services to implement it.

@jacobheun @aschmahmann @achingbrain

jacobheun commented 4 years ago

Should we promote this to a required (could be an empty array) top level field(s)? If so, would it be ok to rename it to providers in both places for simplicity? (Pin.providers and PinStatus.providers)

Yes to both. This would be a huge boon for nodes behind NATs, being able to proactively connect to the service would be great.
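To make the proposal concrete, here is a sketch (in Python dicts) of what the promoted fields might look like. The field name providers follows this thread's proposal and is not yet finalized in the spec; the CID and multiaddrs are placeholders.

```python
# Hypothetical shapes for the proposed top-level fields (names taken from
# this thread, not from the finalized spec). Both are plain arrays of
# multiaddr strings and may be empty.

# Pin object sent by the client: providers hints where the content
# can already be fetched from.
pin = {
    "cid": "QmExampleCid",                        # placeholder CID
    "name": "my-file",
    "providers": [                                # required, may be empty
        "/ip4/203.0.113.1/tcp/4001/p2p/QmPeerA",  # example multiaddr
    ],
}

# PinStatus returned by the service: providers lists the service nodes the
# client should proactively dial (important for clients behind NATs).
pin_status = {
    "requestid": "abc123",
    "status": "queued",
    "pin": pin,
    "providers": [
        "/dns4/node1.example.com/tcp/4001/p2p/QmServiceNode",
    ],
}

assert isinstance(pin["providers"], list)
assert isinstance(pin_status["providers"], list)
```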

aschmahmann commented 4 years ago

Should we promote this to a required (could be an empty array) top level field(s)?

A big 👍 for PinStatus.providers since it'll be necessary for dealing with nodes behind NATs which is an unfortunate part of our lives these days.

A less certain 👍/🤷‍♀️ for Pin.providers. At some level we might as well, because it could be useful and increase performance. However, it shouldn't normally be needed to make the system work, since the providers should generally be findable via the DHT.

If so, would it be ok to rename it to providers in both places for simplicity?

I don't feel strongly, so why not 😄

obo20 commented 4 years ago

Some thoughts here on scalability:

Currently we accept an array of "host_nodes" that users can provide when pinning content to us via the DHT. https://pinata.cloud/documentation#PinByHash

If we receive an array of node multiaddresses, the designated "pinning node" will attempt to connect to those nodes before asking for the item to be pinned.

We currently run multiple nodes in each region where we pin content. While we could publish the full list of those nodes, I don't see that as a scalable long-term solution: every user would need to connect to all of our nodes before the pinning process began, when in reality only one node would be doing the network search. Scalability breaks down as both the number of potential "destination nodes" and the number of users connecting to them grow.
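The host_nodes flow described above can be sketched roughly as follows. This is an illustration only: connect_to_peer and fetch_and_pin are hypothetical stand-ins for real libp2p/IPFS calls, not Pinata's actual implementation.

```python
# Sketch of the host_nodes flow: the designated pinning node dials each
# user-supplied multiaddr before starting the pin, so content can arrive
# over a direct connection rather than a DHT search.

def pin_with_host_nodes(cid, host_nodes, connect_to_peer, fetch_and_pin):
    """Dial each user-supplied multiaddr, then ask the node to pin.

    host_nodes: list of multiaddr strings the user sent with the request.
    Dial failures are tolerated; the DHT remains the fallback.
    """
    connected = []
    for addr in host_nodes:
        try:
            connect_to_peer(addr)          # best-effort dial
            connected.append(addr)
        except ConnectionError:
            pass                           # fall back to DHT discovery
    fetch_and_pin(cid)                     # node searches/pins as usual
    return connected
```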

aschmahmann commented 4 years ago

We currently run multiple nodes in each region that we're pinning content in

How do you decide which region to use to download the data? If it's based off of where the user request came from then couldn't you just return a single random node in the region?

obo20 commented 4 years ago

How do you decide which region to use to download the data? If it's based off of where the user request came from then couldn't you just return a single random node in the region?

This is determined by what's called a "pinPolicy". This is an account-level policy that determines where the user's data will be stored. (You can also pass in a custom pin policy on a per-pin basis if you wish.)

Essentially our system reads the pin policy, sees which region(s) the user wishes to pin in, finds the node with the least amount of usage in that region, and then adds a "pin job" (this tells the node to start searching for a piece of content) to that node's task queue.
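The scheduling steps above can be sketched like this. The data structures are made up for illustration (the real pinPolicy format is Pinata-internal), and "least amount of usage" is approximated by shortest task queue.

```python
# Sketch of the scheduling logic: for each region in the pin policy,
# pick the least-loaded node and enqueue a pin job on it.
from collections import defaultdict

def schedule_pin(cid, pin_policy, nodes):
    """Enqueue a pin job on the least-loaded node in each requested region.

    pin_policy: e.g. {"regions": ["FRA1", "NYC1"]} -- regions to pin in.
    nodes: list of dicts like {"id": ..., "region": ..., "queue": [...]}.
    Returns the ids of the nodes that received a job.
    """
    by_region = defaultdict(list)
    for node in nodes:
        by_region[node["region"]].append(node)

    assigned = []
    for region in pin_policy["regions"]:
        # "least amount of usage" approximated by shortest task queue
        target = min(by_region[region], key=lambda n: len(n["queue"]))
        target["queue"].append({"job": "pin", "cid": cid})
        assigned.append(target["id"])
    return assigned
```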

obo20 commented 4 years ago

After rereading this thread I think I may have misinterpreted things here. We can absolutely provide back the multiaddress of the node that is responsible for pinning a user's content.

I had initially read this as we had to provide a list of our nodes to the users before they started the request. If the users are receiving back a multiaddress to connect to in the response from a pin request, this is definitely doable.
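That per-pin response might be built roughly like this. The providers field name follows this thread's proposal (not the finalized spec), and the function is a hypothetical sketch, not service code.

```python
# Sketch: after assigning a pinning node, the service returns only that
# node's multiaddr in the pin-request response -- not the whole fleet --
# so the client dials exactly the node doing the network search.

def make_pin_status(requestid, pin, assigned_node_multiaddr):
    return {
        "requestid": requestid,
        "status": "queued",
        "pin": pin,
        "providers": [assigned_node_multiaddr],  # one node per pin
    }
```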

lidel commented 4 years ago

@obo20 exactly! No worries, I understand your confusion (we were initially looking into "static peering agreements"). This per-pin approach scales much better. I opened #24 to finalize the spec of providers fields.

aschmahmann commented 4 years ago

@obo20 a brief note: you may find some performance improvement by waiting a bit before having the storing node request the data via Bitswap, since the DHT request will likely be unnecessary. In practice it probably won't matter: if your storing nodes have queues, the client nodes will have plenty of time to dial you.
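The suggested optimization can be sketched as below. fetch and already_connected are hypothetical stand-ins for real Bitswap and swarm APIs; the grace period value is arbitrary.

```python
# Sketch: the storing node waits briefly before starting its search,
# giving the (possibly NATed) client time to dial in so content arrives
# over the direct connection instead of via a DHT walk.
import time

def delayed_fetch(cid, fetch, already_connected, grace_seconds=5.0):
    """Wait up to grace_seconds for an inbound dial, then fetch anyway."""
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline and not already_connected(cid):
        time.sleep(0.1)                      # poll for an inbound dial
    return fetch(cid)                        # Bitswap request either way
```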