Chaincode shim `GetState` routine should return an indication of whether the key was found in the database

bcbrock commented 8 years ago

If a chaincode calls stub.GetState with a key that does not exist, no error or presence indication is returned. This can lead to crashes if the chaincode was expecting the key to be present and non-empty. Since the empty byte array is valid data this can not be used as a check for the presence of of key. It seems that the "Go way" to handle this would be to include a 3rd, 'ok' return value separate from the error return that would be set true if and only if the key were actually found in the database.

ibmmark commented 8 years ago

@srderson Can you add help wanted to this?

zmanian commented 8 years ago

I've encountered this as a challenge in writing chaincode.

For most applications protocols, I've been experimenting treating an array of len(0) as the equivalent to "key not found" has been worked fine.

The API is ambiguous on first encounter and I believe it will lead to errors.

I like the @bcbrock suggestion the third return value.

bcbrock commented 8 years ago

@srderson - I'd like to volunteer to address this issue. I need to learn how the state database works for my research.

mastersingh24 commented 8 years ago

@bcbrock - are you on this one?

bcbrock commented 8 years ago

Yes

murali-katipalli-dtcc commented 8 years ago

I can help. Can you assign the issue to me

bcbrock commented 8 years ago

I can't assign it, but I understand you are going to do it.

murali-katipalli-dtcc commented 8 years ago

Would returning the third value be the appropriate way? If we do return the third value, wouldn't it break existing GetState API. And returning multiple error return types (error and ok) doesnt seem to be the 'Go' way of doing it. Instead can we do the following - When key is not found, shim returns as an Error. Client can check on that error and take appropriate action. This way we can maintain our existing API and provide the requisite functionality. If we choose this way, should we define GetStateError type? or should we keep it simple where it just returns the generic Error with appropriate message.

Pl. advice and I can implement it accordingly

Murali (DTCC)

bcbrock commented 8 years ago

@murali-katipalli-dtcc I'm personally not worried about changing the API. This is a very new project that has not published a single release. The compiler will flag the error, people will update their chaincodes and move on. In fact the 3-return-value way will ensure that people really understand and code for the new semantics because the compiler won't allow them to continue until they have acknowledged the change in their code.

I'm not a Go programmer, so I can't say too much about the "Go way". However I think there is a good argument that not finding a datum is not an error. It is a normal course of events. Some applications may even be happy to accept the empty array as the default value and work properly. With the error-type suggestion these programs would have to check to make sure that the Error was not the non-initialized error that they didn't care about.

So I still say add another return value - and document and guarantee what value is returned if the key is not found.

muralisrini commented 8 years ago

Regardless of approach, fix depends upon ledger.GetState on the peer side letting us know if key exists or not. This may need to percolate one level down the stack.

ledger.GetState returns ([]byte, error). Would "nil" bytes imply key-not-found ? That may not be true if ledger.SetState("key", nil) is allowed (ie, nil is a valid valid value).

@manish-sethi, @srderson comment please ?

manish-sethi commented 8 years ago

@muralisrini - If we write 'nil' (or a zero-length byte array) to rocksdb, it's hard to differentiate during read time (unless, we do some special replacement for nil values - which I guess we should refrain as of now).

I think that it would be better not to allow 'nil' and a zero-length byte array.

During write, we can return error if we receive either of these values.
During read, a 'nil' would implicitly mean 'key not found' ('ok' kind of third argument can also be implemented but that only make sense if a 'nil' is allowed, as you said)

muralisrini commented 8 years ago

@manish-sethi I'm ok with that. Good to do an explicit check and prevent illegal PutState if the sematics of GetState are dependent on it. Can you open an issue for that please ?

bcbrock commented 8 years ago

Hang on. There should be no restriction on what can be written to the data base. If nil and/or the empty array are legal values that can be written to the database, then they must be allowed to be written to the database and they must be handled correctly.

srderson commented 8 years ago

I do think a 3 value return (value, ok, err) would be much better, there's just the pain of breaking all existing chancodes. Should we consider making a state stuct and have Get Put, etc. functions on that struct? It may make the godoc more readable and we could deprecate the old functions.

In terms of whether we allow nil keys and values, I could go either way. I know Java allows this. Not sure about Go. I don't see why we have to conform to the rules of RocksDB. In the future, we may use a different DB.

bcbrock commented 8 years ago

I would agree that it probably never makes sense to put nil (or any other pointer, as-is) into a database. But the empty array or empty string are legitimately useful values I think.

Where would this new state object come from? Could there be more then one of them? I'm not disagreeing, just asking.

If we wanted to add more state APIs, either as functions or methods, other programing languages and systems include "Get" APIs that allow the call to include an explicit default value that is returned in the event the data is not found, and APIs that allow a simple existence query on one or more keys.

manish-sethi commented 8 years ago

I did some investigation on this. Like rocksdb, protobuf also does not distinguishes between a nil and an empty bytes. It behaves even strangely. E.g., (a) A serialization-deserialization cycle using proto.Buffer methods results a 'nil' into a []byte{} (b) If you use proto.Marshal method for marshalling a struct this converts an empty array field (in the struct ) to a nil value. Still would have to see in our code path where all this would impact us.

But, in a summary differentiating between a nil and an empty array would be cumbersome. At best, we can try to support either of the following (c) Treat both an empty array and a nil interchangeably (d) Allow only an empty array and throw exception on an attempt to store a nil.

Further, for user on the shim side we have two option

Return a third argument from the ledger in read calls (OK, boolean) which will change the APIs
Provide another API for checking whether a given key exists

I can make changes in the ledger and state code for these because, they are spread a many places as I found during my initial investigation. Please let me know if this sounds reasonable?

bcbrock commented 8 years ago

When you say "throw an exception" in d) I assume you also mean that the chaincode would get an error return if it tried to store nil? If so that sounds reasonable to me, it doesn't make sense to store pointers.

I still believe that the most urgent requirement is for an API with a 3rd argument, whether direct or via a new object as Sheehan suggested. This is more efficient than a "key exists" call (a network round trip) followed by a "get" (another round trip).

New "key exists" or "get with default" APIs might be features in addition to a 3-return get; The 3-return get is sufficient to cover all possible use cases.

manish-sethi commented 8 years ago

On a side note - the only thing that sounds strange with returning 'ok' along with the value is that, it typically makes sense only if we allow storing 'nil' explicitly. In that case a user wants to differentiate between a non-existing key and a actual key with 'nil' value. However, on the other hand we plan to either not support nil or would say that we would be treating nil as an empty bytes.

That means ideally, if user gets a 'nil' during read, it's an indication of non-existing key. If he gets an empty array, the key exists.

At the level of ledger APIs this would work well. So, mostly, in our case 'OK' would be covering the issues if we encounter in protobuf layer when transferring a 'nil' to the shim side and if that happens to get converted in an empty array. (I am not sure whether we would hit into this issue based on how we are using it between handler and shim.)

murali-katipalli-dtcc commented 8 years ago

If I understand it correctly, protobuf is our lowest common denominator and the shim side, whether it's Java or Go should accommodate for it's short-coming (i.e. returning non-deterministic nil/empty array).

-On the ledger API side, I agree that it should return a third parameter, OK.

On the protobuf side, because of it's strange behavior (as illustrated by Manish), we should have a Get State struct like PutStateInfo, which includes the OK value (i.e. determining if a key is present or not)
On the shim side, Java or Go would look at the protobuf's GetStateInfo and provide appropriate response to it's clients : Go would return OK as an extra return parameter, where as Java may return the struct itself

Is the above correct - does that summarize it? Would that be agreeable to everyone ?

manish-sethi commented 8 years ago

I have submitted a PR https://github.com/hyperledger/fabric/pull/1471. In this PR, I have added the support for an empty array at the ledger side. Now, ledger behavior is a follows.

An attempt to write a 'nil' key/value results into an error. However, an empty byte array can be written.
GetState api returns a 'nil' if key not found, otherwise the value of the key (including an empty array) A nil check is sufficient to test whether a particular key was found or not. see method 'TestLedgerEmptyArrayValue' in ledger_test.go file.

@murali-katipalli-dtcc if you are interested in making changes for handler/shim transfer, please go ahead.

About adding a third variable at the ledger APIs level, personally I am not in the favour of this, because we are not supporting 'nil' values and a 'nil' check is sufficient for the caller to test whether key exists or not.

However, that should not stop from fixing the protobuf issue between handler and the shim. When transferring the data, we can set a flag on the handler side whether data exist or not and at the shim side appropriate action can be taken with out exposing a changed API with third value to the shim caller. Flag setting/checking would only be required if the value is an empty array/nil on the handler side; but it can also be set always. In the case of GetMultiKeys, multiple flags would need to be set.

Does that sound fine?

manish-sethi commented 8 years ago

Just to make it explicit, the flag would be required both way in the communication (setting the key and when getting the key).

muralisrini commented 8 years ago

@manish-sethi About adding a third variable at the ledger APIs level, personally I am not in the favour of this, because we are not supporting 'nil' values and a 'nil' check is sufficient for the caller to test whether key exists or not.

Agreed as this is an internal implementation detail.

About protobuf ... are you saying a "nil" ChaincodeMessage.Payload will unmarshall into a 0 byte array in the shim ? We should try to avoid putting (any more) implementation specific fields in ChaincodeMessage if we can.

manish-sethi commented 8 years ago

No @muralisrini, protobuff does it other way around. It converts a field value from empty array into nil on the destination side. If you want to avoid any implementation specific field, a hack could be to pad the value with a zero byte and unpad at the receiving end.

manish-sethi commented 8 years ago

Sorry I think that my previous message was not clear. What I meant to say was that on the sending side, if the value is 'nil' it can be left as is and it would appear on the receiving side as a 'nil'. Otherwise, a byte (say, 0x0) can be padded and unpadded so that an empty byte array appears as an empty byte array on the receiving end.

muralisrini commented 8 years ago

Ok. Basically guarantee non-nil payload would always have at least 1 byte and strip off the last 0 after the unmarshal. It is simple and avoids additional fields to ChaincodeMessage.

Comment @murali-katipalli-dtcc ?

murali-katipalli-dtcc commented 8 years ago

Agreed on the ledger API. I can take care of it on the shim side On the protobuf side - we have two options. Either artifically pad/unpad or have a specific field which determines if key exists or not?

Which route do we want to go? Pl. suggest and I can implement accordingly.

To clarify, if do a specific field for protobuf side, then we need a struct similar to PutStateInfo or have one struct StateInfo w/ a field identifying for key exists or not.

muralisrini commented 8 years ago

@murali-katipalli-dtcc : I'd go with the pad/unpad. As you point out it is simpler (esp. if wrapped up nicely in methods on both sides).

More importantly, its an implementation detail (thanks to protobuf converting 0 bytes to nil) that we don't to expose protos.ChaincodeMessage.

bcbrock commented 8 years ago

Before going with a padding approach, it is important to understand exactly what is the overhead in doing this. Both in terms of run time and garbage generated. What is the overhead?

manish-sethi commented 8 years ago

I think folks have a different view on this. @binhn and @mastersingh24 are not convinced for adding support for either nil/empty array. If we think that it is not important then I think, I can rollback the PR (it's not merged yet). Instead, we can simply add a check for empty array values also (as for nil) during write time. On the Shim side, it should work as it is. Does that sound reasonable?

muralisrini commented 8 years ago

@manish-sethi @binhn @mastersingh24 I'm on the fence about allowing PutState(key, []byte{}). Could go either way.... but I do like not having to work around []byte to nil conversions.

@bcbrock on the padding approach (moot if we don't allow []byte{}) :

it would be a go "append" on oneside and a truncate on the other. I'd think negligible overhead for most cases. The pluses are simplicity and avoidance of bleeding implementation fields into protocol structures (proto.ChaincodeMessage). Are there reasonable alternatives I'm missing ?

bcbrock commented 8 years ago

I am surprised by the suggestion that a key/value store with string values would not be able to store the empty string. The empty string is the identity element for string operations; This would be like suggesting that an integer value could not be stored as 0 or 1. The ability to store and retrieve the empty string is a requirement, not an option.

Alternatives to padding might include adding a "length", "missing" or "empty" field at the appropriate places in the protocol. To me this should be weighed against the overhead of copying large data structures to pad/unpad them if the underlying mechanisms would require this.

bcbrock commented 8 years ago

@manish-sethi When I wrote above:

I would agree that it probably never makes sense to put nil (or any other pointer, as-is) into a database.

I was mistaken. After more study of what nil means in Go, I now think there is no reason to disallow nil as an argument to PutState from Go. From Go's perspective, nil is typed. In the below

a := []Byte{}
b := []Byte

a is a 0-length byte array and b is nil. They are different things. However they both behave as 0-length byte arrays, and you can legally pass nil to a function accepting []byte where it will behave as a 0-length byte array. nil seems to be a (dubious?) optimization introduced into the Go language to avoid allocating memory for certain kinds of objects until their sizes are known. So now I think that nil should behave as a 0-length byte-array when passed to PutState.

However, for GetState I think it is acceptable (at least for Go) to use nil to mean "the key did not exist", which would obviate the need for a 3rd return value. There is no reason that I can see for GetState to differentiate between an uninitialized array and a 0-length array. When called from the database, the length of the array is known, and it is 0. (Note, Redis looks like it does something similar to this.)

muralisrini commented 8 years ago

Let me make sure to understand...

a := []byte{} //a will not be nil 
var b []byte //b will be nil

With the above you suggest

PutState(key, a) and PutState(key, b) are acceptable and will create a (key, value) in the DB
GetState(key) == nil will be treated as "key-not-exist" (we can't differentiate between nil and empty bytes)

If the above is correct interpretation, what we are saying is though key is there in the db, we will deny that with GetState... which is clearly not correct.

manish-sethi commented 8 years ago

@bcbrock Yes, you are right. A 'nil' byte array has overloaded semantics in go.

However, it may be confusing for the user to get back an empty array in both the cases (when the user stores 'nil' or an empty array). It took me a while to understand this behavior in RockDB. Further, I am not sure how this nil is interpreted by the user writing chain code in languages other than go.

I am personally in favour of just allowing empty array and not allowing 'nil'. Which may be less confusing from usage point of view.

muralisrini commented 8 years ago

+1 @manish-sethi .

Not allowing nil-value on PutState simplifies matters and I don't think would be disagreeable to chaincode developers.

PutState to allow empty byte array and not allow nil value
pad empty byte array with 1 byte on the peer side and truncate it on the shim side

Experiments with a large 20Mb byte array and 1000 iterations shows no difference due to padding when Marshalling/Unmarshalling. Even given standalone test program is not apples/apples comparison, the experiment suggests on an average this would fall through the cracks of rest of the computation in the peer. It avoids bleeding implementation details such as a length parameter in ChaincodeMessage.

With the above, there is no change to the Chaincode API. ie, "nil" from GetState will equate to key-not-found.

bcbrock commented 8 years ago

@muralisrini Thank you for doing that experiment!

++ @manish-sethi Allowing PutState(key, nil) is probably my most weakly held opinion of the things we've been talking about here, so I am OK with what you all are suggesting. However, let me make one more attempt at arguing for it and then I promise I'll be quiet.

Given

a := []Byte{}
b := []Byte

if we allow PutState(key, a) but disallow PutState(key, b), we will have created the only high-level Go language API that makes this distinction. Other than a few special idempotent operations

x = b         // x == nil after this
z = append(b) // z == nil after this

a nil of type []byte is uniformly interpreted as []byte{} when used. Therefore I see no problem with allowing PutState(key, a) and PutState(key, b), and in both cases having GetState(key) return []byte{}.

manish-sethi commented 8 years ago

@bcbrock As I said before, it's difficult to argue either way. For instance, if we follow what you mentioned above, during write we would say that "nil sameAs []byte{}" and during read, we would say "nil notSameAs []byte{}" - (because, nil means key not found for us). Which may be more confusing... That's the only reason I an not very sure on nil support.

Otherwise, having support for empty array in place, supporting 'nil' is straight. Just change the statement that throws error on receiving a 'nil' into a statement that changes the 'nil' into an empty-array to have same downstream code.

BTW, this behavior is not a []byte specific in go, I think that this is the property of 'slice' notion in go.

muralisrini commented 8 years ago

@bcbrock : are you ok with implementing

PutState to allow empty byte array and not allow nil value
pad empty byte array with 1 byte on the peer side and truncate it on the shim side

@manish-sethi there is nothing more to be done from ledger side I think ?

@murali-katipalli-dtcc : what do you think about this approach ?

bcbrock commented 8 years ago

@muralisrini Yes

manish-sethi commented 8 years ago

@muralisrini Yes, the PR with ledger changes is merged. For details on behavior of ledger, see the tests TestLedgerEmptyArrayValue and TestLedgerInvalidInput in ledger_test.go

murali-katipalli-dtcc commented 8 years ago

@muralisrini Yes, agree @manish-sethi Have you implemented writing the empty 1 byte array from the peer side?

So from my side or the shim side, I have to interpret the byte array accordingly - empty 1 byte means, key has empty array; nil means key doesn't exist. If key doesn't exist, throw appropriate error message

If that sounds correct, I will go ahead with implementation

manish-sethi commented 8 years ago

@murali-katipalli-dtcc I have not changed anything in the chaincode code. I have mainly enabled the support in the ledger.

murali-katipalli-dtcc commented 8 years ago

Got the latest today and found that this issue is fixed. The latest throws - chaincode: Key not found in state. Pl. confirm and close the issue

murali-katipalli-dtcc commented 8 years ago

Pl. ignore my above comment.Talking to @muralisrini realized that - it was my local code which had the fix. Forgot to re-compile on the shim side. Sorry ;)

I will be going ahead w/ the fix

bcbrock commented 8 years ago

Guys, what is the status of this issue? Is the change merged or going to be merged?

murali-katipalli-dtcc commented 8 years ago

Made the code changes, have to do golint/vet/format/import and request a pull. Will do it today, if not latest by this week.

murali-katipalli-dtcc commented 8 years ago

PR #2072

abhinavsakshay commented 7 years ago

what if we check the length of the byte array return by getstate if its zero or not?

hyperledger-archives / fabric

Chaincode shim `GetState` routine should return an indication of whether the key was found in the database #936