Open rotemdan opened 6 years ago
So I don't think there is an easy way to tell if a block is pinned without loading the whole pinned tree into memory (which seems to be what GC currently does).
For a simpler solution - we could add an option which makes the gateway not look for blocks in the network and instead use only what is already in the blockstore (so only pinned data and cached content, which can be removed with ipfs repo gc), essentially making the gateway offline. Would that work for you?
For the most part, given that the kind of node I'm describing would be dedicated to only storing content, and not retrieving it from the larger network (possibly aside from replicating other cluster nodes, which I'll describe next), almost all local content would be pinned anyway, so I guess simply checking for local availability, regardless of pin status, could work (if the performance were significantly better, I would probably choose this less restrictive option anyway).
For a cluster, I guess it would mean that the gateway would be effectively "local" in relation to the cluster. I'm not very familiar with IPFS internals but I could imagine that the DHT would be queried in such a way as to constrain the results to "whitelist" only sources originating from within the cluster. In any case, in the vast majority of requests, the hashes would be resolved very quickly, since the nodes would have very good network connectivity to each other, even if geographically disparate. In the rare cases when clients try to "abuse" the gateway by using it as a proxy to the larger IPFS network, the request would simply stall and time out (I'm not sure if there's a DHT timeout setting for this type of query but it could possibly be set reasonably low to mitigate this scenario).
There are some interesting prospects to having something like this. It seems like a relatively simple/cheap way to run a highly-available CDN, where, since each node also acts as gateway, popular content is automatically fetched and cached by other cluster nodes (in addition to client nodes from outside the cluster) (of course all this would only be truly feasible given that the datastore is scalable and performant enough, and the software stable/mature enough etc.)
For the cluster case - this would probably have to be implemented at bitswap level, where we'd filter from which peers we want to fetch content. We could do that at lower level, but:
Thanks for looking at this. I've found an alternative approach to filter URLs using cryptography instead (for the cluster case mainly, since the local-only case is trivial to implement efficiently), though it requires an additional intermediary filtering server (unless IPFS would support it as a part of the CID/URI spec) and has various other limitations:
Instead of links being, say:
https://my-cdn.com/ipfs/<IPFS-CID>
Have them as
https://my-cdn.com/signed-ipfs/<IPFS-CID>-<HMAC(KEY, IPFS-CID)>
So every request would be required to include a signature that would be verified by an intermediary HTTP server (possibly running on each node), or, as I mentioned, the gateway itself.
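The signing and verification steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: HMAC-SHA256 with a hex-encoded tag, a shared secret held by whoever issues links and by the filtering server, and the helper names sign_cid, signed_path and verify_signed_path are all hypothetical choices, not part of IPFS or of the proposal itself.

```python
import hashlib
import hmac
from typing import Optional

PREFIX = "/signed-ipfs/"  # path prefix taken from the example URL above


def sign_cid(key: bytes, cid: str) -> str:
    """Return a hex HMAC-SHA256 tag over the CID (assumed MAC and encoding)."""
    return hmac.new(key, cid.encode("utf-8"), hashlib.sha256).hexdigest()


def signed_path(key: bytes, cid: str) -> str:
    """Build the path part of a signed link: /signed-ipfs/<CID>-<tag>."""
    return f"{PREFIX}{cid}-{sign_cid(key, cid)}"


def verify_signed_path(key: bytes, path: str) -> Optional[str]:
    """Return the CID if the path carries a valid tag, otherwise None."""
    if not path.startswith(PREFIX):
        return None
    token = path[len(PREFIX):]
    # The hex tag contains no "-", so split on the last "-" in the token.
    cid, sep, tag = token.rpartition("-")
    if not sep or not cid:
        return None
    expected = sign_cid(key, cid)
    # Constant-time comparison on bytes to avoid leaking the tag via timing.
    if hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        return cid
    return None
```

An intermediary server would call verify_signed_path on each incoming request and forward only the returned CID to the real gateway path; anything that returns None would be rejected.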
Limitations:
<IPFS-CID>-<EXP-TIME>-<HMAC(KEY, IPFS-CID, EXP-TIME)>, in which case it needs to be periodically renewed - this might not be feasible in some cases.
A restricted gateway would be quite a useful feature for mirroring large datasets receiving frequent updates as well.
Is there any endpoint in the API which returns information about whether an object path is pinned (e.g. /api/v0/pin/get)? If so, a reverse proxy in front of the gateway could be configured to accept requests only if the requested object is pinned. It would be a lot better if this were implemented in IPFS itself, but this might be a simple workaround until then.
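A minimal sketch of the check such a reverse proxy might perform is shown below. It assumes the node's HTTP API exposes /api/v0/pin/ls with an arg parameter (rather than the /api/v0/pin/get named in the question), that a pinned object yields a 200 response whose JSON body has a non-empty "Keys" object, and that an unpinned one yields an error status. The function names and the POST method are assumptions for illustration; the exact behaviour may differ between releases.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def cid_from_gateway_path(path: str) -> Optional[str]:
    """Extract <cid> from a gateway path like /ipfs/<cid>/sub/path."""
    parts = path.split("/")  # "/ipfs/<cid>/x" -> ["", "ipfs", "<cid>", "x"]
    if len(parts) >= 3 and parts[1] == "ipfs" and parts[2]:
        return parts[2]
    return None


def pin_ls_url(api_base: str, cid: str) -> str:
    """Build the pin-listing URL for one CID (endpoint name is an assumption)."""
    query = urllib.parse.urlencode({"arg": cid, "type": "all"})
    return f"{api_base.rstrip('/')}/api/v0/pin/ls?{query}"


def response_means_pinned(status: int, body: str) -> bool:
    """Interpret the API response: pinned only on 200 with a non-empty Keys map."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:
        return False
    keys = data.get("Keys") if isinstance(data, dict) else None
    return isinstance(keys, dict) and len(keys) > 0


def is_pinned(api_base: str, cid: str, timeout: float = 5.0) -> bool:
    """Ask the local node whether cid is pinned; fail closed on any error."""
    req = urllib.request.Request(pin_ls_url(api_base, cid), method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return response_means_pinned(resp.status, resp.read().decode("utf-8"))
    except (OSError, UnicodeDecodeError):
        # HTTP errors, refused connections and timeouts all count as "not pinned".
        return False
```

Note that checking with type=all may require the node to walk its pinned trees to detect indirect pins, which is the performance concern raised earlier in the thread; restricting the check to recursive pins of root objects would likely be cheaper.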
The next release (which I need to get out the door ASAP...) will have a Gateway.NoFetch option. However, that may not be sufficient for the cluster use-case.
This is a very good feature to have and I'm glad it's being released soon.
There are going to be a lot of use cases where people want to have an HTTP convenience gateway for their own pinned sites/content, but are unwilling to allow all content from everyone to be served from their HTTP servers as a side consequence.
Note that a few read-only /api endpoints aren't yet covered by this option - see https://github.com/ipfs/go-ipfs/pull/5649#issuecomment-451337849 for the list
@magik6k speaking of which, can you file an issue for that so we don't forget?
Gateway.NoFetch has already been a great move for the gateway!
For the cluster use case, how about extending Gateway.NoFetch to something like Gateway.FetchOnlyFromSpecificPeers? My current practice is to use a load balancer in front of the cluster nodes' gateways.
@ywk248248 this can be done by setting Routing:none, removing the bootstrap peers and peering with your specific peers.
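As a rough illustration of that suggestion, the sketch below applies the three changes to a node config loaded as a Python dict. It rests on several assumptions: that "Routing:none" corresponds to a Routing.Type value of "none" in the JSON config, that bootstrap peers live in a top-level Bootstrap list, and that "peering" can be expressed through a Peering.Peers list of {"ID", "Addrs"} entries (which may not exist in every release; connecting to peers manually would be the alternative). The function name restrict_to_peers is hypothetical.

```python
import copy
from typing import Dict, List


def restrict_to_peers(config: dict, peers: Dict[str, List[str]]) -> dict:
    """Return a copy of config that only talks to the given peers.

    peers maps a peer ID to its multiaddrs. The key names used below are
    assumptions about the config layout, not confirmed by the thread.
    """
    new = copy.deepcopy(config)  # leave the caller's config untouched
    # Disable content routing so the node does not search the public DHT.
    new.setdefault("Routing", {})["Type"] = "none"
    # Drop the default bootstrap peers.
    new["Bootstrap"] = []
    # Keep persistent connections to the cluster's own nodes.
    new.setdefault("Peering", {})["Peers"] = [
        {"ID": peer_id, "Addrs": list(addrs)} for peer_id, addrs in peers.items()
    ]
    return new
```

The result would be written back to the node's config file before restarting the daemon. Gateway.NoFetch is deliberately left alone here, since enabling it would also stop the gateway from fetching from the cluster peers.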
I'm investigating the suitability of IPFS as a server side file storage and distribution medium.
I can manage, upload and pin files through the HTTP REST API (:5001). I would like to have the stored files available both through the IPFS network and through HTTP. The gateway seems like an easy, simple solution to provide the files directly through HTTP with good latency to the user (and would possibly be reverse proxied through NGINX and/or a third-party CDN as well).
The only issue is, I couldn't find a way to limit it to only provide locally pinned content. Making a custom intermediate server to filter out requests seems unnecessary and would require maintaining a duplicate (and probably inefficient) index into the IPFS datastore. Since it might serve millions of files, maintaining a gigantic ipns-based ipld document to index the files also seems wasteful and inefficient (and a possible privacy issue if directory content is exposed).
I'm not interested (at this time) in creating a private network, using --offline or using custom bootstrap nodes, since I want the data to be available through the public IPFS network as well.
I believe this "dual-stack" approach might be reasonably classified as "plausible" (given that IPFS matures to the point it provides sufficient value to be used in mainstream projects), so I decided to publish it here as a feature request (in case it is not already available! in that case I'd be really happy to know how to achieve this!)
(Edit: as a natural extension, it would probably also be useful to have an option to only serve content pinned by a cluster of servers -- thus any node in the cluster could act as a restricted IPFS gateway - that only serves content hosted within the cluster itself)