ipfs-shipyard / nopfs

Say NO with NOpfs! NOpfs provides content-blocking-layer capabilities for IPFS (Kubo).
Apache License 2.0
21 stars 1 forks source link

ipfs blockservice is incompatible with sessions #34

Open Jorropo opened 8 months ago

Jorropo commented 8 months ago

https://github.com/ipfs-shipyard/nopfs/blob/5edc09acb1015a15597567f8db6c204325d45a33/ipfs/blockservice.go#L62-L70

See how the code does not apply any blocking on sessions. Theses are used by boxo/blockservice.NewSession when a session is used.

This also cause issues with new ContextWithSession feature because it's not possible to unwrap a nopfs blockservice and access the real one underneath (so the context value key becomes the nopfs blockservice which isn't what we want because nopfs doesn't do session redirection based on context).

We should do blocking on the blockstore.Blockstore and exchange.Exchange API arguments before they are passed to the blockservice.

hsanjuan commented 8 months ago

We should do blocking on the blockstore.Blockstore and exchange.Exchange API arguments before they are passed to the blockservice.

What if it returns a blocking Exchange and blocking Blockstore when those methods are called ?

Jorropo commented 8 months ago

This would be incompatible with this new boxo patch:

https://github.com/ipfs/boxo/commit/22fa8b16a3e530b86542286bebba77c08e17487e#diff-c3167a605aeb6a2f9430f5730ee6508f88e15022a746998ff6ec8ba013d0d6dc

The problem is that in the wrapping code I use the incoming blockservice as context key: https://github.com/ipfs/boxo/commit/22fa8b16a3e530b86542286bebba77c08e17487e#diff-c3167a605aeb6a2f9430f5730ee6508f88e15022a746998ff6ec8ba013d0d6dcR494-R497

So github.com/ipfs-shipyard/nopfs/ipfs.(*Blockservice) is used when wrapping the context. But unwrapping is implemented in github.com/ipfs/boxo/blockservice.(*blockservice) which use it's receiver as key into the context. So the two don't match and it does not work.

github.com/ipfs/boxo/blockservice.(*blockservice).GetBlock has this check:

    block, err := blockstore.Get(ctx, c)
    switch {
    case err == nil:
        return block, nil
    case ipld.IsNotFound(err):
        break
    default:
        return nil, err
    }

Which would allows us to do filtering only in the blockstore and have it also filter the exchange thx to:

    default:
        return nil, err

I'll fix that in GetBlocks and then nopfs can implement a wrapping filtering blockstore instead of blockservice while having the same effects.

Other solutions exists but I think it's the least bad one.

hsanjuan commented 8 months ago

That sounds like relying on an implementation detail that causes exchange-blocking by chance (only because blockstore is checked first?) so I'm not very sold... better to wrap both.

I think conceptually BlockService is the right layer... it feels weird that we have to work around the fact that implementation for a BlockService session bypasses the layer and uses underlying exchange/blockstore directly. Can't we wrap sessions with blocking code directly by making NewSession part of the BlockService interface, and making sure we pass the underlying blockservice to the Blockservice.NewSession so that grabSessionFromContext works?

Jorropo commented 8 months ago

That sounds like relying on an implementation detail that causes exchange-blocking by chance (only because blockstore is checked first?) so I'm not very sold... better to wrap both.

Fair.

Can't we wrap sessions with blocking code directly by making NewSession part of the BlockService interface, and making sure we pass the underlying blockservice to the Blockservice.NewSession so that grabSessionFromContext works?

That would work but that require breaking the interface and I don't know if anyone else has custom blockservice implementation.

Else we could have something like:

func WithContentFilter(b func(cid.Cid) error) Option {/* ... */}

In boxo/blockservice this is already what we are doing with verifcid.

This also means nops only needs to provide 1 small function instead of many plumbing methods.

hsanjuan commented 8 months ago

What about a new SessionBlockstore interface that has NewSession ?

Jorropo commented 8 months ago

I think #35 is cleaner. It replace ~80 slocs with:

func ToBlockServiceBlocker(blocker *nopfs.Blocker) blockservice.Blocker {
    return func(c cid.Cid) error {
        err := blocker.IsCidBlocked(c).ToError()
        if err != nil {
            logger.Warnf("blocked blocks for blockservice: (%s) %s", c, err)
        }
        return err
    }
}

While not adding as much complexity in boxo/blockservice.

What about a new SessionBlockstore interface that has NewSession ?

I don't get how this would work. Today you can't wrap Session. So NewSession is called, it checks if nopfs implements SessionBlockstore. It does. Then in nopfs.Blockservice.NewSession it call boxo/blockservice.NewSession on the underlying blockservice, at this point we have a true boxo blockservice session. But you can't wrap this session since session isn't an interface (which might be an issue in itself). So I don't see where the filtering would happen.

hsanjuan commented 8 months ago

I think #35 is cleaner.

It is, but relies on the fact that a specific blockservice implementation exists that provides content-blocking capabilities.

But you can't wrap this session since session isn't an interface (which might be an issue in itself)

Session should be an interface (or a blockservice itself) and we wrap it... similar to merkledag.NewSession which returns an ipld.NodeGetter?

Sorry for resisting here a bit, just aiming to keep the conceptual layer model based on interfaces so that nopfs stays as purely plug-in (I think is nice that things can be like this). If we have a content-blocking BlockService, we should also think of doing the same in Namesys and Resolver and removing wrapping for general consistency.

hacdias commented 8 months ago

@Jorropo how much work would it be to make session an interface?

I did look at (https://github.com/ipfs/boxo/pull/556) and it seems to do what it says it does, but I'd like us to have a consensus here on what to do next.

That would work but that require breaking the interface and I don't know if anyone else has custom blockservice implementation.

Maybe breaking the interface isn't the worst idea if the outcome is better and cleaner. We can always provide with a changelog that explains clearly what needs to be updated in someone else's blockservices. What do you think @Jorropo @hsanjuan?

Jorropo commented 8 months ago

The current situation is pretty bad because it's neither a struct pointer nor an interface, it's both.

Imo we should remove the blockservice interface and put all these kinds of ad-hoc things in boxo/blockservice because we can contain cross ad-hoc feature interactions and easily solve future ones.

I'm also fine making newsession a method of blockservice and make session an interface.

We just need to decide which one of the two way forward.

hacdias commented 8 months ago

Imo we should remove the blockservice interface and put all these kinds of ad-hoc things in boxo/blockservice because we can contain cross ad-hoc feature interactions and easily solve future ones.

I need a more concrete example to understand what you mean by that 😅

Jorropo commented 8 months ago

I'll submit both PR then we can choose, aint much work.

Jorropo commented 8 months ago

While implementing the no interface solution I found yet an other bug. The IsCached call from the gateway bypass the blockservice and thus verifcid and nopfs. It's probably fine since it doesn't send content, just a boolean state saying if we have or not the block, but I think this shows why layerying more and more layers when the scope is very wide and loosely defined leads to bug.

Jorropo commented 8 months ago

I posted both options there:

They are both significantly bigger than the original workaround I sent in https://github.com/ipfs/boxo/pull/556

hsanjuan commented 8 months ago

Thanks! please give me a couple of days to look into the proposals.

hsanjuan commented 8 months ago

In principle I'm ok with #563 in that it is aligned with how we normally do things... (granted this might not be the best way of doing things all the time, but seems straightforward at least).

Jorropo commented 8 months ago

It's how we usually do things indeed, however I think it might be too complex for it's own good. ipfs/boxo#561 is simpler when considering the total system.

The tricky thing is that blockservice.Blockservice is really a session factory when you think about it. And we don't have factories usually.

hsanjuan commented 8 months ago

What is the motivation for embedding sessions in the context? I may have missed it but I didn't see what prompted that change in the first place.

Jorropo commented 8 months ago

@hsanjuan this is a thing a couple of peoples including me had in mind. https://github.com/ipfs/boxo/issues/94 is the oldest written trace I know about.

We are quite bad at passing around sessions everywhere. For example boxo/gateway need ~15 sessions to resolve the path from dist.ipfs.tech and download 25MB kubo .zip file, even tho we should only use one. This leads to extra costs and latency because each new session start broadcasting to every peer from scratch.

This constant session reset prevent us to do other optimizations (like setting the initial provider delay on bitswap to 0s) because this generate DHT load at each reset.

The main reason we are bad at passing sessions outside of context is that people usually have a blockgetter argument in their New... function, they instantiate some children services they need (like fetcher, dagservice, ...) and store that in the struct. Then when handling requests they call the services they need.

However to do that properly you need to do initialization per request, not per application instance, which is fine, it's just way more work to update that in all the data stack of kubo and rainbow.

That also mean we can't use fx for anything between blockservice and final consumers (or we would need an fx instance per request).

Passing the session in the context is cheap, it works, and I don't think it makes the code worst (incentivize to pass everything in the context at the cost of compiletime safety) since the unwrapping only happen on existing blockservice object, so you still need to pass the blockservice which let you analyze and follow the code statically.

hsanjuan commented 8 months ago

I understand. Honestly, now that mostly everything is in boxo, it might be worth to streamline session handling one day. Passing things in context is a hack so that we don't pass them as explicit arguments after all... But as you say, it is a cheap approach and solves the issue for a low price, compared to the non-trivial re-wiring otherwise.