Open Jorropo opened 8 months ago
We should do blocking on the
blockstore.Blockstore
andexchange.Exchange
API arguments before they are passed to the blockservice.
What if it returns a blocking Exchange and blocking Blockstore when those methods are called ?
This would be incompatible with this new boxo patch:
The problem is that in the wrapping code I use the incoming blockservice as context key: https://github.com/ipfs/boxo/commit/22fa8b16a3e530b86542286bebba77c08e17487e#diff-c3167a605aeb6a2f9430f5730ee6508f88e15022a746998ff6ec8ba013d0d6dcR494-R497
So github.com/ipfs-shipyard/nopfs/ipfs.(*Blockservice)
is used when wrapping the context.
But unwrapping is implemented in github.com/ipfs/boxo/blockservice.(*blockservice)
which use it's receiver as key into the context. So the two don't match and it does not work.
github.com/ipfs/boxo/blockservice.(*blockservice).GetBlock
has this check:
block, err := blockstore.Get(ctx, c)
switch {
case err == nil:
return block, nil
case ipld.IsNotFound(err):
break
default:
return nil, err
}
Which would allows us to do filtering only in the blockstore and have it also filter the exchange thx to:
default:
return nil, err
I'll fix that in GetBlocks
and then nopfs can implement a wrapping filtering blockstore instead of blockservice while having the same effects.
Other solutions exists but I think it's the least bad one.
That sounds like relying on an implementation detail that causes exchange-blocking by chance (only because blockstore is checked first?) so I'm not very sold... better to wrap both.
I think conceptually BlockService is the right layer... it feels weird that we have to work around the fact that implementation for a BlockService session bypasses the layer and uses underlying exchange/blockstore directly. Can't we wrap sessions with blocking code directly by making NewSession
part of the BlockService interface, and making sure we pass the underlying blockservice to the Blockservice.NewSession so that grabSessionFromContext
works?
That sounds like relying on an implementation detail that causes exchange-blocking by chance (only because blockstore is checked first?) so I'm not very sold... better to wrap both.
Fair.
Can't we wrap sessions with blocking code directly by making
NewSession
part of theBlockService
interface, and making sure we pass the underlying blockservice to theBlockservice.NewSession
so thatgrabSessionFromContext
works?
That would work but that require breaking the interface and I don't know if anyone else has custom blockservice implementation.
Else we could have something like:
func WithContentFilter(b func(cid.Cid) error) Option {/* ... */}
In boxo/blockservice
this is already what we are doing with verifcid
.
This also means nops only needs to provide 1 small function instead of many plumbing methods.
What about a new SessionBlockstore
interface that has NewSession
?
I think #35 is cleaner. It replace ~80 slocs with:
func ToBlockServiceBlocker(blocker *nopfs.Blocker) blockservice.Blocker {
return func(c cid.Cid) error {
err := blocker.IsCidBlocked(c).ToError()
if err != nil {
logger.Warnf("blocked blocks for blockservice: (%s) %s", c, err)
}
return err
}
}
While not adding as much complexity in boxo/blockservice
.
What about a new
SessionBlockstore
interface that hasNewSession
?
I don't get how this would work.
Today you can't wrap Session.
So NewSession
is called, it checks if nopfs
implements SessionBlockstore
.
It does.
Then in nopfs.Blockservice.NewSession
it call boxo/blockservice.NewSession
on the underlying blockservice, at this point we have a true boxo blockservice session.
But you can't wrap this session since session isn't an interface (which might be an issue in itself). So I don't see where the filtering would happen.
I think #35 is cleaner.
It is, but relies on the fact that a specific blockservice implementation exists that provides content-blocking capabilities.
But you can't wrap this session since session isn't an interface (which might be an issue in itself)
Session should be an interface (or a blockservice itself) and we wrap it... similar to merkledag.NewSession
which returns an ipld.NodeGetter
?
Sorry for resisting here a bit, just aiming to keep the conceptual layer model based on interfaces so that nopfs stays as purely plug-in (I think is nice that things can be like this). If we have a content-blocking BlockService, we should also think of doing the same in Namesys and Resolver and removing wrapping for general consistency.
@Jorropo how much work would it be to make session an interface?
I did look at (https://github.com/ipfs/boxo/pull/556) and it seems to do what it says it does, but I'd like us to have a consensus here on what to do next.
That would work but that require breaking the interface and I don't know if anyone else has custom blockservice implementation.
Maybe breaking the interface isn't the worst idea if the outcome is better and cleaner. We can always provide with a changelog that explains clearly what needs to be updated in someone else's blockservices. What do you think @Jorropo @hsanjuan?
The current situation is pretty bad because it's neither a struct pointer nor an interface, it's both.
Imo we should remove the blockservice interface and put all these kinds of ad-hoc things in boxo/blockservice
because we can contain cross ad-hoc feature interactions and easily solve future ones.
I'm also fine making newsession a method of blockservice and make session an interface.
We just need to decide which one of the two way forward.
Imo we should remove the blockservice interface and put all these kinds of ad-hoc things in boxo/blockservice because we can contain cross ad-hoc feature interactions and easily solve future ones.
I need a more concrete example to understand what you mean by that 😅
I'll submit both PR then we can choose, aint much work.
While implementing the no interface solution I found yet an other bug.
The IsCached
call from the gateway bypass the blockservice and thus verifcid
and nopfs
. It's probably fine since it doesn't send content, just a boolean state saying if we have or not the block, but I think this shows why layerying more and more layers when the scope is very wide and loosely defined leads to bug.
I posted both options there:
They are both significantly bigger than the original workaround I sent in https://github.com/ipfs/boxo/pull/556
Thanks! please give me a couple of days to look into the proposals.
In principle I'm ok with #563 in that it is aligned with how we normally do things... (granted this might not be the best way of doing things all the time, but seems straightforward at least).
It's how we usually do things indeed, however I think it might be too complex for it's own good. ipfs/boxo#561 is simpler when considering the total system.
The tricky thing is that blockservice.Blockservice
is really a session factory when you think about it. And we don't have factories usually.
What is the motivation for embedding sessions in the context? I may have missed it but I didn't see what prompted that change in the first place.
@hsanjuan this is a thing a couple of peoples including me had in mind. https://github.com/ipfs/boxo/issues/94 is the oldest written trace I know about.
We are quite bad at passing around sessions everywhere.
For example boxo/gateway
need ~15 sessions to resolve the path from dist.ipfs.tech
and download 25MB kubo .zip file, even tho we should only use one.
This leads to extra costs and latency because each new session start broadcasting to every peer from scratch.
This constant session reset prevent us to do other optimizations (like setting the initial provider delay on bitswap to 0s) because this generate DHT load at each reset.
The main reason we are bad at passing sessions outside of context is that people usually have a blockgetter argument in their New...
function, they instantiate some children services they need (like fetcher, dagservice, ...) and store that in the struct. Then when handling requests they call the services they need.
However to do that properly you need to do initialization per request, not per application instance, which is fine, it's just way more work to update that in all the data stack of kubo and rainbow.
That also mean we can't use fx
for anything between blockservice and final consumers (or we would need an fx
instance per request).
Passing the session in the context is cheap, it works, and I don't think it makes the code worst (incentivize to pass everything in the context at the cost of compiletime safety) since the unwrapping only happen on existing blockservice object, so you still need to pass the blockservice which let you analyze and follow the code statically.
I understand. Honestly, now that mostly everything is in boxo, it might be worth to streamline session handling one day. Passing things in context is a hack so that we don't pass them as explicit arguments after all... But as you say, it is a cheap approach and solves the issue for a low price, compared to the non-trivial re-wiring otherwise.
https://github.com/ipfs-shipyard/nopfs/blob/5edc09acb1015a15597567f8db6c204325d45a33/ipfs/blockservice.go#L62-L70
See how the code does not apply any blocking on sessions. Theses are used by
boxo/blockservice.NewSession
when a session is used.This also cause issues with new
ContextWithSession
feature because it's not possible to unwrap a nopfs blockservice and access the real one underneath (so the context value key becomes the nopfs blockservice which isn't what we want because nopfs doesn't do session redirection based on context).We should do blocking on the
blockstore.Blockstore
andexchange.Exchange
API arguments before they are passed to the blockservice.