hyperledger / fabric

Hyperledger Fabric is an enterprise-grade permissioned distributed ledger framework for developing solutions and applications. Its modular and versatile design satisfies a broad range of industry use cases. It offers a unique approach to consensus that enables performance at scale while preserving privacy.
https://wiki.hyperledger.org/display/fabric
Apache License 2.0

Add channel config for max doc size behind a new application capability to prevent document_too_large issues in CouchDB #3306

Open jexus6 opened 2 years ago

jexus6 commented 2 years ago

On initialization of a peer (that was working before) I get this error:

2022-03-28 11:30:22.423 UTC [nodeCmd] serve -> INFO 015 Deployed system chaincodes
2022-03-28 11:30:22.423 UTC [peer] Initialize -> INFO 016 Loading chain mychannel
2022-03-28 11:30:22.423 UTC [ledgermgmt] OpenLedger -> INFO 017 Opening ledger with id = mychannel
2022-03-28 11:30:22.561 UTC [kvledger] recommitLostBlocks -> INFO 018 Recommitting lost blocks - firstBlockNum=151932, lastBlockNum=151932, recoverables=[]kvledger.recoverable{(*txmgr.LockBasedTxMgr)(0xc0007ac280), (*history.DB)(0xc0007019e0)}
2022-03-28 11:30:24.191 UTC [statecouchdb] commitUpdates -> WARN 019 CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f504846af00263781f48445288001003000000000000000001673753
2022-03-28 11:30:24.201 UTC [statecouchdb] commitUpdates -> WARN 01a CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f532d450700269506cf8445288001003000000000000000001674145
2022-03-28 11:30:24.213 UTC [statecouchdb] commitUpdates -> WARN 01b CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f4e3bfafd00269df9368445288001003000000000000000001673586
2022-03-28 11:30:24.218 UTC [statecouchdb] commitUpdates -> WARN 01c CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f4a4b49c90026d42acd8445288001003000000000000000001673194
2022-03-28 11:30:24.224 UTC [statecouchdb] commitUpdates -> WARN 01d CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f528876a0002650ab448445288001003000000000000000001673975
2022-03-28 11:30:24.230 UTC [statecouchdb] commitUpdates -> WARN 01e CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f54c1c7d3002637c8f68445288001003000000000000000001674174
2022-03-28 11:30:24.239 UTC [statecouchdb] commitUpdates -> WARN 01f CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f4c8876a0002650ab1c8445288001003000000000000000001673464
2022-03-28 11:30:24.255 UTC [statecouchdb] commitUpdates -> WARN 020 CouchDB batch document delete encountered an problem. Retrying delete for document ID:balance621e8f4c4b49c90026d42ad18445288001003000000000000000001673419
2022-03-28 11:30:24.741 UTC [peer] Initialize -> ERRO 021 Failed to load ledger mychannel (error handling CouchDB request. Error:document_too_large,  Status Code:413,  Reason:lot621f3016cec5a038d5b7cdf7
github.com/hyperledger/fabric/core/ledger/kvledger/txmgmt/statedb/statecouchdb.(*couchInstance).handleRequest

And there is no way for this peer to rejoin that channel.

davidkel commented 2 years ago

see https://stackoverflow.com/questions/71648747/peer-cant-load-ledger-on-init-due-couchdb-document-too-large

TsvetanG commented 1 year ago

It is because of CouchDB's max_document_size (https://docs.couchdb.org/en/stable/whatsnew/3.0.html). To bring your channel back to life, try adjusting the CouchDB max_document_size.

I'm not sure why this is not handled properly by the peer. It shouldn't be possible to bring the channel down just by pushing a transaction with a data volume higher than what CouchDB can accept. For example, to reproduce you can push a > 8MB doc to a PDC in one transaction, and your peer(s) go down on that channel. I think the peer should simply reject the transaction at endorsement instead of reaching a point where it can no longer process the channel.

It would also be a good idea to document this so we can better design high-data-volume applications. @denyeart , @yacovm ... Hope you may help with that. Would be great to get some insights from the maintainers.

denyeart commented 1 year ago

@TsvetanG It can't be checked at endorsement time because the endorsing peer has no way of knowing the configuration of max_document_size in every other peer's CouchDB.

You can resolve the problem on the peer by increasing CouchDB max_document_size and then restarting the peer. Upon restart, the block's state database updates will be re-attempted and will succeed given the higher max_document_size.
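Concretely, that means raising the limit in the CouchDB node's configuration (e.g. local.ini) and then restarting CouchDB followed by the peer. A sketch of the relevant section, where 64 MB is an arbitrary example value, not a recommendation:

```ini
[couchdb]
; max_document_size is specified in bytes; CouchDB 3.x defaults to 8388608 (8 MB).
; 67108864 = 64 MB, chosen here only as an illustration.
max_document_size = 67108864
```

Pick a value comfortably above the largest JSON document your channel has ever committed, otherwise the replayed block will fail again with the same 413 document_too_large error.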

Unfortunately, the default for max_document_size was lowered from 4GB to 8MB in CouchDB 3.0. This was mentioned in the Fabric release notes when CouchDB 3.0 support was added, but I agree it should go in the Fabric docs as well; I've opened PR https://github.com/hyperledger/fabric/pull/4095

See doc update: https://hyperledger-fabric.readthedocs.io/en/latest/couchdb_as_state_database.html#couchdb-container-configuration

TsvetanG commented 1 year ago

@denyeart : Thanks a lot for the update. Yes, a peer cannot know the individual CouchDB limits of the rest of the network. We can design an application to respect certain volume limitations (and that is normal for any software design).

What if there were a setting (a kind of consensus) at the network level for the max doc size? That way we could prevent a bad actor from sending a transaction that exceeds the limit and thereby breaking the peers of other orgs on the channel. I understand it is about proper configuration, but there should still be a way to prevent a bad (or hacked) client from breaking the channel.

denyeart commented 1 year ago

@TsvetanG I would be in favor of a channel configuration for max doc size to enforce this upon transaction validation, and also check it during endorsement. To ensure deterministic behavior across peers we would need to enforce it with a new channel application capability, for example it could be enforced starting with V2_5 application capability.

We would need to decide whether it is enforced only for JSON or for any writes (the CouchDB max_document_size applies only to JSON, other data is written as a CouchDB attachment of unlimited size).
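To make the JSON-versus-attachment distinction concrete, here is a rough sketch of the decision (the helper name is hypothetical and this is a simplification of what statecouchdb actually does): only a value that parses as a JSON object is written as CouchDB document fields, and is therefore subject to max_document_size; anything else is stored as a binary attachment, which max_document_size does not cap.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isStoredAsJSON sketches the decision: a value that unmarshals into a
// JSON object becomes CouchDB document fields (capped by max_document_size);
// any other value is written as a binary attachment instead.
func isStoredAsJSON(value []byte) bool {
	var doc map[string]interface{}
	return json.Unmarshal(value, &doc) == nil
}

func main() {
	fmt.Println(isStoredAsJSON([]byte(`{"balance": 100}`))) // JSON object: true
	fmt.Println(isStoredAsJSON([]byte{0x00, 0x01, 0x02}))   // binary blob: false
	fmt.Println(isStoredAsJSON([]byte(`[1, 2, 3]`)))        // JSON array, not an object: false
}
```

This is why a Fabric-level MaxWriteSize that applies to all writes would be stricter than CouchDB's own limit, which only bites for the JSON case.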

While considering this, we should also consider a channel configuration for state database. The docs say not to mix a channel with LevelDB and CouchDB state databases as there are some behavior inconsistencies, but this is not enforced at channel level.

What do you think @manish-sethi ?

manish-sethi commented 1 year ago

Enforcing max doc size behind a new channel capability makes sense. However, the db type setting is broader than the channel level; ideally it is at the network level. For instance, in the current Fabric design we cannot allow one channel to specify LevelDB and another to specify CouchDB, because a Fabric peer maintains the state of all channels in a single physical database.

denyeart commented 1 year ago

Ok, I've updated the issue title to reflect that this issue can be used to add a max doc size channel configuration behind a future application capability. This would prevent the issue of trying to save docs larger than the CouchDB max_document_size configuration.

v2.5 has been released, it could be targeted for v3.0 release in main branch (V3_0 capability) if somebody would like to contribute a pull request.

denyeart commented 1 month ago

We discussed in Fabric contributor meeting today. Current thinking is that the check can be done at chaincode execution (endorsement) time rather than at validation time so that a new capability would not be needed.

The following new application property in the channel configuration would be enforced at chaincode execution (endorsement) time:

MaxWriteSize - The maximum number of bytes for a single key's value in a transaction write set. This roughly corresponds to CouchDB's max_document_size property, whose default was lowered to 8MB in CouchDB v3. If users are worried about member organizations running a default CouchDB configuration, they could set MaxWriteSize to 8MB. MaxWriteSize would be enforced for both regular channel data and private data.

If unset in the channel configuration, there would be no limit enforced for MaxWriteSize. However, note that there is already an Orderer property "BatchSize.AbsoluteMaxBytes" in the channel configuration to ensure that no single transaction (including all writes in the transaction) is too large, for example to ensure a transaction doesn't consume an inordinate amount of space on peer ledgers (block storage and state database) and that a block doesn't exceed configured gRPC or other network size limits.
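The proposed endorsement-time check could look roughly like the sketch below. This is not Fabric code; the type and function names are hypothetical, and only the semantics described above are taken from the proposal: a per-value byte limit, with 0 (unset) meaning no limit.

```go
package main

import "fmt"

// write models one entry of a transaction's write set.
type write struct {
	key   string
	value []byte
}

// checkMaxWriteSize sketches the proposed check: reject the proposal at
// chaincode execution (endorsement) time if any single value in the write
// set exceeds the channel's MaxWriteSize. A limit of 0 means the property
// is unset in the channel configuration, so no cap is enforced.
func checkMaxWriteSize(writes []write, maxWriteSize int) error {
	if maxWriteSize == 0 {
		return nil // unset: no limit enforced
	}
	for _, w := range writes {
		if len(w.value) > maxWriteSize {
			return fmt.Errorf("write to key %q is %d bytes, exceeding MaxWriteSize %d",
				w.key, len(w.value), maxWriteSize)
		}
	}
	return nil
}

func main() {
	limit := 8 * 1024 * 1024 // 8 MB, matching CouchDB v3's default max_document_size
	oversized := []write{{key: "balance1", value: make([]byte, 16*1024*1024)}}
	if err := checkMaxWriteSize(oversized, limit); err != nil {
		fmt.Println("rejected at endorsement:", err)
	}
}
```

Failing at endorsement keeps an oversized write from ever reaching ordering, so the document_too_large crash described at the top of this issue cannot occur, and no new validation-time capability is needed.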