Closed lidel closed 4 years ago
> Should we promote this to a required (could be an empty array) top level field(s)? If so, would it be ok to rename it to `providers` in both places for simplicity? (`Pin.providers` and `PinStatus.providers`)
Yes to both. This would be a huge boon for nodes behind NATs, being able to proactively connect to the service would be great.
> Should we promote this to a required (could be an empty array) top level field(s)?

A big 👍 for `PinStatus.providers`, since it'll be necessary for dealing with nodes behind NATs, which is an unfortunate part of our lives these days.

A less certain 👍/🤷‍♀️ for `Pin.providers`. At some level we might as well, because it could be useful and increase performance. However, it shouldn't normally be needed to make the system work, since the providers should generally be findable via the DHT.

> If so, would it be ok to rename it to providers in both places for simplicity?

I don't feel strongly, so why not 😄
Some thoughts here on scalability:
Currently we accept an array of "host_nodes" that users can provide when pinning content to us via the DHT. https://pinata.cloud/documentation#PinByHash
If we receive an array of node multiaddresses, the designated "pinning node" will attempt to connect to those nodes before asking for the item to be pinned.
We currently run multiple nodes in each region that we're pinning content in. While we could publish an entire list of all of these nodes, I don't see this as a super scalable solution long term, as all users would need to connect to all of our nodes before the pinning process began, when in reality only one node would be doing the network search. Scalability breaks here both as the number of potential "destination nodes" increases and as the number of users connecting to these nodes increases.
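As a rough illustration of the `host_nodes` mechanism described above, a pin-by-hash request body might look something like this (a hypothetical sketch; the real Pinata API's exact field names and nesting may differ):

```python
# Hypothetical pin-by-hash request body. `host_nodes` is the array of
# multiaddrs the designated pinning node will try to dial before it
# starts searching for the content. Field layout here is illustrative only.
import json

request_body = {
    "hashToPin": "QmExampleCid",  # CID of the content the user wants pinned
    "host_nodes": [
        "/ip4/203.0.113.5/tcp/4001/p2p/QmClientPeer",  # peer already holding the data
    ],
}

print(json.dumps(request_body, indent=2))
```

The scalability concern is visible here: the more service nodes a user would have to list (or dial), the heavier this step gets, even though only one node ends up doing the work.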
> We currently run multiple nodes in each region that we're pinning content in
How do you decide which region to use to download the data? If it's based off of where the user request came from then couldn't you just return a single random node in the region?
> How do you decide which region to use to download the data? If it's based off of where the user request came from then couldn't you just return a single random node in the region?
This is determined by what's called a "pinPolicy". This is an account-level policy that determines where the user's data will be stored. (You can also pass in a custom pin policy on a per-pin basis if you wish.)
Essentially our system reads the pin policy, sees which region(s) the user wishes to pin in, finds the node with the least amount of usage in that region, and then adds a "pin job" (this tells the node to start searching for a piece of content) to that node's task queue.
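The flow just described can be sketched roughly as follows (data shapes and names are made up for illustration; this is not Pinata's actual implementation):

```python
# Minimal sketch: read the pin policy, pick the least-used node in each
# requested region, and enqueue a "pin job" on that node's task queue.
from collections import defaultdict

nodes = [
    {"id": "node-a", "region": "us-east", "usage": 0.72},
    {"id": "node-b", "region": "us-east", "usage": 0.31},
    {"id": "node-c", "region": "eu-west", "usage": 0.55},
]
task_queues = defaultdict(list)  # node id -> list of pending jobs

def enqueue_pin_jobs(cid, pin_policy):
    """For each region in the policy, assign the least-used node there."""
    for region in pin_policy["regions"]:
        candidates = [n for n in nodes if n["region"] == region]
        target = min(candidates, key=lambda n: n["usage"])
        task_queues[target["id"]].append({"type": "pin", "cid": cid})
        yield target["id"]

assigned = list(enqueue_pin_jobs("QmExampleCid", {"regions": ["us-east", "eu-west"]}))
# node-b has the lowest usage in us-east; node-c is the only eu-west node
```

Returning only the assigned node(s) per pin is what makes the per-pin approach discussed below scale better than publishing the full node list.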
After rereading this thread I think I may have misinterpreted things here. We can absolutely provide back the multiaddress of the node that is responsible for pinning a user's content.
I had initially read this as we had to provide a list of our nodes to the users before they started the request. If the users are receiving back a multiaddress to connect to in the response from a pin request, this is definitely doable.
@obo20 exactly! No worries, I understand your confusion (we were initially looking into "static peering agreements"). This per-pin approach scales much better. I opened #24 to finalize the spec of the `providers` fields.
@obo20 a brief note: you may find some performance improvements by waiting a bit before having the storing node request the data via Bitswap, since the DHT request will likely be unnecessary. It probably won't matter, though, since if your storing nodes have queues the client nodes will probably have plenty of time to dial you.
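The suggested ordering can be illustrated with a toy sketch (names and timings are invented; real nodes would use libp2p connection state, not a list):

```python
# Toy illustration: give the client a short grace period to dial the
# storing node directly, and only fall back to a DHT provider search
# if no direct connection shows up in time.
import time

def fetch_pinned_data(cid, connected_peers, grace_period=0.1):
    """Wait briefly for an inbound connection before searching the DHT."""
    deadline = time.monotonic() + grace_period
    while time.monotonic() < deadline:
        if connected_peers:            # the client dialed us directly
            return f"bitswap:{cid}"    # fetch over the existing connection
        time.sleep(0.01)
    return f"dht-lookup:{cid}"         # fall back to a DHT provider search

# With a peer already connected, no DHT lookup is needed:
result = fetch_pinned_data("QmExampleCid", connected_peers=["QmClientPeer"])
```

As the comment above notes, a task queue on the storing node often provides this grace period for free.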
The spec currently has:

- `PinStatus.meta[receivers] = ['multiaddr1','multiaddr2']` — list of peers to connect to to speed up transfer of pinned data
- `Pin.meta[providers] = ['multiaddr1','multiaddr2']` — list of peers that are known to have pinned data (aka "original seeds")

In https://github.com/ipfs/pinning-services-api-spec/pull/19#discussion_r453687149 @aschmahmann wrote:

> I agree this turns out to be a pretty important feature. If we leave it in `meta` it may be harder for services to implement it.

@jacobheun @aschmahmann @achingbrain
Should we promote this to a required (could be an empty array) top level field(s)? If so, would it be ok to rename it to `providers` in both places for simplicity? (`Pin.providers` and `PinStatus.providers`)
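If the promotion were adopted, the objects might look like the sketch below (a hedged illustration of the proposal under discussion; `providers` as a required top-level field is the proposed rename, not necessarily the final spec):

```python
# Sketch of the proposed shape: `providers` promoted from `meta` to a
# required top-level array on both Pin and PinStatus. An empty array is
# still valid, since the field would be required rather than optional.

pin = {
    "cid": "QmExampleCid",
    "providers": [  # peers known to already have the data ("original seeds")
        "/ip4/203.0.113.1/tcp/4001/p2p/QmPeerA",
    ],
}

pin_status = {
    "requestid": "example-request-id",
    "status": "queued",
    "providers": [  # peers the client should connect to for faster transfer
        "/ip4/198.51.100.7/tcp/4001/p2p/QmServiceNode",
    ],
    "pin": pin,
}

empty_pin = {"cid": "QmExampleCid", "providers": []}  # required but empty: OK
```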