Chaincode crashes when "query" is received just after starting up

VRamakrishna commented 8 years ago

There may be a bug in the chaincode FSM, which strikes at random and is not reliably reproducible. It occurs when one or more "query" requests are sent to a chaincode just after it starts up, resulting in the chaincode process crashing. Thus far I have seen only one particular chaincode process (named 'multiSig3') crash, though that's the only chaincode that gets bombarded by queries in my application workflow. And though the traces I've attached below are from a Vagrant setup on my Windows laptop, I have also observed this crash occurring intermittently when I run the same chaincode in a Docker container in an Ubuntu VM.

Here's how I observe the crash:

1. I start 'obc-peer' in 'dev' mode ('openchain.yaml' attached in 'source_config.zip') as follows:
   $ ./obc-peer peer --peer-chaincodedev
2. I start the 'multiSig3' chaincode ('multiSig3.go' attached in 'source_config.zip') as follows:
   $ OPENCHAIN_CHAINCODE_ID_NAME=github.com/openblockchain/obc-peer/openchain/example/chaincode/multisig3 OPENCHAIN_PEER_ADDRESS=0.0.0.0:30303 ./multiSig3
3. I load a web page in my browser that sends multiple "query" requests concurrently to the chaincode. Sometimes a few queries are processed before a crash, and sometimes the first query itself triggers the crash.

Two sets of crash logs, corresponding to two different instances, are attached below. The chaincode and openchain config file are also attached in 'source_config.zip'.

ConsoleOutput_Set1.zip ConsoleOutput_Set2.zip source_config.zip

corecode commented 8 years ago

@muralisrini I believe this is because a QUERY is racing a previous INIT, and the second QUERY decides to send an INIT again.

https://github.com/hyperledger/fabric/blob/master/core/chaincode/chaincode_support.go#L395

The problem seems to be that handler.isRunning() returns false for created, established, and init states - but LaunchChaincode does not synchronously move the FSM after testing and deciding to launch the chaincode. A racing transaction will run the same test and observe the same "not running" state.

muralisrini commented 8 years ago

@VRamakrishna : " It occurs when one or more "query" requests are sent to a chaincode just after it starts up, resulting in the chaincode process crashing. "

I expected a stack trace somewhere, seeing "crash". Instead, the chaincode appears to have exited due to some error in both sets of logs. Will have to investigate the error...

muralisrini commented 8 years ago

@corecode : yes, I think you are right. I see two inits in the logs.

VRamakrishna commented 8 years ago

@muralisrini : I meant "crash" from my perspective, as the chaincode exits unexpectedly without warning. I discussed this bug with Manish before I filed, and he said you'd be able to decipher the console output.

So based on what I'm reading, there is a bug in the FSM? Or is there something wrong in the chaincode itself?

muralisrini commented 8 years ago

@VRamakrishna : it's a bug in chaincode. You should not see the problem if you manage it so that the chaincode is started up with one request before sending a flood of requests. Will that workaround work?

VRamakrishna commented 8 years ago

@muralisrini : Do you mean I need to send an "init" request before sending other requests? I was not aware that was mandatory; is it documented anywhere? Is it not possible (or recommended) to start a chaincode purely for the purpose of data lookup, where the data was stored on the ledger in a previous session?

muralisrini commented 8 years ago

@VRamakrishna : I meant ensure the chaincode is started with just 1 query or invoke (so yes, start the chaincode with a data lookup via a query).

VRamakrishna commented 8 years ago

@muralisrini : I don't see any way to enforce that in the application logic, as multiple "users" in different roles could load a web page in their respective browsers and simultaneously query the chaincode. Even for a single user, the web page's JavaScript sends multiple queries because it has to fill multiple tables in the UI.

The alternative is to manually send a single query (outside of the app logic) to the chaincode immediately after that process restarts. (Is this what you are suggesting?) But this still may not be foolproof, as multiple users may have the web page loaded on their browsers with periodic refresh (in effect, hitting the chaincode with queries) configured, and we may hit the same race condition.

christo4ferris commented 8 years ago

@muralisrini @VRamakrishna this is rather stale, is it still an issue or can it be closed?

VRamakrishna commented 8 years ago

I believe it's still an issue (see my last comment on the thread.) Since a calling application has no way of knowing what state a chaincode's state machine is in, there is a high probability of hitting this race condition. And asking an application to send a "deploy" request prior to a "query" request in a given run would be generally inefficient, and not feasible in instances where initialization will result in changes to ledger state.
