nspcc-dev / neofs-node

NeoFS is a decentralized distributed object storage integrated with the Neo blockchain
https://fs.neo.org
GNU General Public License v3.0

`ObjectService.Search` server response is always `0 OK` even on partial success #2721

Open cthulhu-rider opened 7 months ago

cthulhu-rider commented 7 months ago

When executed over a multi-node container, the object SEARCH operation can succeed only partially by design. The current server behavior may be buggy.

Steps to reproduce

Having NeoFS with at least 2 online storage nodes (N1 and N2), I did the following steps:

Expected Behavior

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 2 objects.
O1
O2
$ echo $?
0

# node stops

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 1 objects.
O1
INCOMPLETE: number of unavailable container nodes = 1
$ echo $?
ERR_CODE

Current Behavior

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 2 objects.
O1
O2
$ echo $?
0

# node stops

$ neofs-cli -c neofs_cli.yaml object search --cid C --timeout 1m
Enter password > 
Found 1 objects.
O1
$ echo $?
0

Possible Solution

Respond with a specific non-zero status code and the number of unavailable container nodes. Respond with `0 OK` only when all container nodes have responded.

Context

I dove into the SEARCH server as part of #2692, but I don't think this really matters.

Regression

No

roman-khimov commented 7 months ago

We can say that the current SEARCH is best-effort. Even if you get this additional data, there is not a lot you can do with it, especially if you consider the case of N2 leaving the network map after some period of time.

cthulhu-rider commented 7 months ago

Even if you get this additional data, there is not a lot you can do

Best-effort, yes it is, but the client needs to understand how the operation ended. As a client, I want to distinguish complete results from possibly incomplete ones, whatever the reason. That's what I expect from a responsible system. What to do with this is up to each client.

If you mean not reporting how many nodes are unavailable, yeah, that's more like sugar for now. As long as it sits in a free-text message, no application will want to process its content. At the same time, it can be very useful in test/debug setups. It depends.
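To illustrate why a structured field is easier for applications to consume than a free-text message, here is a hedged Go sketch of a client classifying a SEARCH result. `SearchStatus`, its fields and `Completeness` are hypothetical; the actual NeoFS status messages may carry such details differently or not at all.

```go
package main

import "fmt"

// SearchStatus is a HYPOTHETICAL structured status a client might receive;
// the field names are illustrative and not part of the NeoFS API.
type SearchStatus struct {
	Code        uint32 // 0 means every container node responded
	Unavailable uint32 // number of container nodes that did not respond
	Message     string // free text, intended for humans only
}

// Completeness classifies a SEARCH result using only the structured
// fields, so the client never has to parse Message.
func Completeness(st SearchStatus) string {
	switch {
	case st.Code == 0:
		return "complete"
	case st.Unavailable > 0:
		return fmt.Sprintf("incomplete (%d node(s) unavailable)", st.Unavailable)
	default:
		return "incomplete (reason unknown)"
	}
}

func main() {
	fmt.Println(Completeness(SearchStatus{Code: 0}))
	fmt.Println(Completeness(SearchStatus{
		Code:        1,
		Unavailable: 1,
		Message:     "number of unavailable container nodes = 1",
	}))
}
```

The classifier never inspects `Message`; the non-zero code alone already tells the client the result is possibly incomplete, and the node count is optional detail on top.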

N2 leaving the network map after some period of time

This is another plane of the system. As the network map changes, the execution of the storage policy changes. This issue is about conveying the state at operation execution time.

carpawell commented 7 months ago

I want to distinguish complete results from possibly incomplete ones, whatever the reason

As a node, how do you distinguish a disk/shard failure? Or a temporarily unstable state caused by object migrations after another node goes offline/online? To me, every GET/PUT operation would then have to be fixed too (and I do not agree with that).

cthulhu-rider commented 7 months ago

every GET/PUT operation would then have to be fixed too

@carpawell please clarify what fix you are talking about. GET/PUT cannot finish with partial success.

Let's focus on the proposed expected behavior of SEARCH first, then we'll develop the ideas further. SEARCH is currently the only best-effort operation in the protocol.

carpawell commented 7 months ago

@carpawell please clarify what fix you are talking about. GET/PUT cannot finish with partial success.

If you receive ObjectNotFound from a GET request, do you think that the object is completely missing? Or do you expect that some nodes may be down and the object may be available later? The current API provides no information about this case. PUT's IncompleteObjectPut is not detailed either, and REP 0 is an incomplete object put as well.

SEARCH is currently the only best-effort operation in the protocol

Basically, I mean that our protocol is best-effort across the board, so I cannot agree with you.

cthulhu-rider commented 7 months ago

@carpawell you're talking about clarifying failure responses, while this issue is about partially successful ones. Neither PUT nor GET can end with partial success now. These are different topics to me.

But since you asked:

If you receive ObjectNotFound from a GET request, do you think that the object is completely missing? Or do you expect that some nodes may be down and the object may be available later?

PUT and GET responsiveness could also be improved.

Basically, I mean that our protocol is best-effort across the board

I wouldn't generalize like that. At the moment, only SEARCH behaves this way.

carpawell commented 7 months ago

while this issue is about partially successful ones

GET and PUT can also be partially successful (every node passes the ACL checks, all of them try to fetch the object but get 404; does that mean the object does not exist? Can you be sure of that?)

At the moment, only SEARCH behaves this way

So that is the point where we disagree. Why do you think it is the only one? To me, SEARCH is an extremely generalized GET. If a node is up, it answers you; if a node is down, it won't. That is why we have REPs and why we are decentralized. A user can minimize data loss with replicas. How can a status help you?

cthulhu-rider commented 7 months ago

GET and PUT can also be partially successful

Gotcha, I should have clarified that by "partial success" I mean the specifics of the scenario described in this issue. Let's stick to the "best-effort" term. I agree that PUT and GET can reach partial success (2/3 replicas saved, or the header plus 50% of the payload received), but they never finish with a successful status the way the described SEARCH does. So, at the end of the day, only SEARCH is a best-effort operation to me.

How can a status help you?

The status is the key point of this issue. I outlined this in the expected and current behavior sections. PUT and GET don't (and never will) behave like this. So we need to resolve SEARCH first, then we can think about the responsiveness of other operations.

carpawell commented 7 months ago

The status is the key point of this issue. I outlined this in the expected and current behavior sections.

Understood.

So we need to resolve SEARCH first

Need for what?

cthulhu-rider commented 7 months ago

Need for what?

For this issue to be resolved.

carpawell commented 7 months ago

For this issue to be resolved

And that is what I suggest we discuss. Is it an issue at all? And how can a user make use of the suggested new info?