nspcc-dev / neofs-node

NeoFS is a decentralized distributed object storage integrated with the Neo blockchain
https://fs.neo.org
GNU General Public License v3.0

404 errors are critical for big TTL values #2768

Open carpawell opened 6 months ago

carpawell commented 6 months ago

The node does not handle https://github.com/nspcc-dev/neofs-sdk-go/pull/562 well: it starts forwarding requests and receives them back.

Expected Behavior

TTL describes the maximum number of forwardings.

Current Behavior

TTL describes how many times you may try to search for an object. If the object does not exist at all, there are 8 request forwardings. Even if you are the initiator of the forwarding, you receive the request back, because you are part of the container and the forwarder wants you to try.
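To give a feel for how quickly this can grow, here is a toy model, not neofs-node code. It assumes that no container node stores the object and that every node, after a failed local lookup, forwards the request to all other container nodes while the TTL allows it. The function `countHandled` and the fan-out rule are illustrative assumptions; the real forwarding logic may differ.

```go
package main

import "fmt"

// countHandled returns how many times a request is processed in a toy model
// (an assumption, not the actual neofs-node behavior): no node stores the
// object, so each node fails its local lookup and, while TTL > 1, forwards
// the request with TTL-1 to every other container node.
func countHandled(nodes, ttl int) int {
	if nodes <= 0 || ttl <= 0 {
		return 0
	}
	handled := 1 // the local lookup on the current node
	if ttl > 1 {
		// Each of the other nodes repeats the same procedure with TTL-1.
		handled += (nodes - 1) * countHandled(nodes, ttl-1)
	}
	return handled
}

func main() {
	for _, ttl := range []int{1, 2, 8} {
		fmt.Printf("nodes=4 ttl=%d handled=%d\n", ttl, countHandled(4, ttl))
	}
}
```

In this model a 4-node container processes the request once for TTL=1, 4 times for TTL=2 and 3280 times for TTL=8, which would be consistent with the log spam described here if the real fan-out is similar.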

Possible Solution

Not sure. Maybe turn forwarding off in the object services? Spawn only requests with TTL=2 manually? Track the forwarding chain and do not continue if you notice a cycle?
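The last option could look roughly like the sketch below. It is only an illustration under assumptions: the `request` type, its `chain` field and the `handle` function are hypothetical and do not exist in neofs-node or the SDK, no node stores the object, and the chain is unsigned, so a node has to trust what its peers put into it.

```go
package main

import "fmt"

// request is a hypothetical message: a TTL plus the chain of node IDs
// that have already processed it on this forwarding path.
type request struct {
	ttl   int
	chain []int
}

// inChain reports whether node id is already part of the forwarding chain.
func inChain(chain []int, id int) bool {
	for _, v := range chain {
		if v == id {
			return true
		}
	}
	return false
}

// handle processes req on node self in a container with node IDs
// 0..nodes-1, none of which stores the object. It returns how many
// times the request is processed in total on this path and below it.
func handle(self, nodes int, req request) int {
	handled := 1 // failed local lookup on this node
	if req.ttl <= 1 {
		return handled
	}
	// Copy the chain so sibling branches do not share a backing array.
	chain := append(append([]int{}, req.chain...), self)
	for next := 0; next < nodes; next++ {
		if inChain(chain, next) {
			continue // forwarding here would close a cycle
		}
		handled += handle(next, nodes, request{ttl: req.ttl - 1, chain: chain})
	}
	return handled
}

func main() {
	fmt.Println(handle(0, 4, request{ttl: 8}))
}
```

With 4 nodes and TTL=8 this sketch processes the request 16 times, because a path can never be longer than the number of container nodes. Nodes are still visited more than once via different paths, so this bounds the amplification rather than removing it.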

Steps to Reproduce (for bugs)

Update to the provided SDK version and try to get a non-existing object, or delete it. The logs will be much larger than expected.

Context

Regression

https://github.com/nspcc-dev/neofs-sdk-go/pull/562

roman-khimov commented 6 months ago

It's easy to revert, although this just means that our TTLs don't work the way they were intended to.

roman-khimov commented 6 months ago

> Track forwardings chain and do not continue if you notice a cycle?

This defeats the purpose, although with additional signatures this can in fact substitute TTL.

carpawell commented 6 months ago

That is S0 to me. The commit is already merged, and after updating, deleting big objects sometimes fails with a timeout error. The request spam prevents normal operation on my laptop even after a single multiplied request. It is also a discussion to me, because I am not sure how it should be solved.

roman-khimov commented 6 months ago

Let's do https://github.com/nspcc-dev/neofs-sdk-go/pull/567 and then think of associated problems, they can't be fixed quickly.

carpawell commented 6 months ago

OK, it can be done this way. Not S0 then (but still a discussion?). A few links for whoever solves the issue: TTL is decoded here and then sent to the next node here (and this repeats one more time on the other node). Many timeouts show up when running integration tests, along with many debug logs about the same object being searched for on every container node.