hyperledger / aries-cloudagent-python

Hyperledger Aries Cloud Agent Python (ACA-Py) is a foundation for building decentralized identity applications and services running in non-mobile environments.
https://wiki.hyperledger.org/display/aries
Apache License 2.0

Error: "Revocation registry is full" when running bulk issuances. #1684

Closed konda-kalyan closed 2 years ago

konda-kalyan commented 2 years ago

Hi, I am looking for suggestions to resolve the issue below. I am running a large number of issuances (about 50K). After 49128 issuances succeeded, I started seeing the error below on the agent side. I am using the same credential definition to issue the 50K credentials to ten different holders, and I am running these tests against the BCovrin Test network. Is this related to the ledger or to the tails server? How can I resolve this error?

Revocation registry Y9wFFsnqyVV4pvCyaFmASp:4:Y9wFFsnqyVV4pvCyaFmASp:3:CL:184799:default:CL_ACCUM:c3763215-8297-46e3-a06d-972e2fcad3ae is full: cannot create credential _indy_loop_callback: Function returned error.

Please note that this issue is also posted in the aca-py channel on Discord.

PaulWen commented 2 years ago

What is the revocation registry size you are using?

swcurran commented 2 years ago

It looks like it might be a race condition in creating revocation registries, so the size of the registries and the speed of issuance might be factors -- good to know what those parameters are for this test.

The design of AnonCreds revocation is that a revocation registry is created with X slots for credentials, and each issuance uses up one slot until the registry is full. At that point, a new revocation registry is needed. ACA-Py layers a "build ahead" model on top of that. Rather than waiting until one RevReg is full before creating the next, ACA-Py starts by building 2 RevRegs; when the first fills up, the second one is activated and a new, third one is built. This pattern repeats every time a RevReg is full.

This error suggests there might be a race condition somewhere between one RevReg filling up and a new RevReg being created and made available. That said, I know that @andrewwhitehead has run tests to try to create this problem and has been unable to do so, e.g. by making the RevReg very small and increasing the issuance rate.
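A minimal sketch of the build-ahead model and the suspected gap described above. This is a toy simulation, not ACA-Py code: the names `RevReg`, `Issuer`, `finish_build` and `RegistryFullError` are hypothetical, and the background build is reduced to an explicit method call so the timing gap can be shown deterministically.

```python
from dataclasses import dataclass
from typing import Optional


class RegistryFullError(Exception):
    """Raised when the active registry is full and no replacement is ready."""


@dataclass
class RevReg:
    """Toy revocation registry with a fixed number of credential slots."""
    reg_id: int
    size: int
    used: int = 0

    @property
    def full(self) -> bool:
        return self.used >= self.size


@dataclass
class Issuer:
    """Keeps one active registry and one registry built ahead of time."""
    size: int
    active: Optional[RevReg] = None
    standby: Optional[RevReg] = None
    created: int = 0

    def __post_init__(self) -> None:
        # Start with two registries, as described in the thread.
        self.active = self._build()
        self.standby = self._build()

    def _build(self) -> RevReg:
        self.created += 1
        return RevReg(reg_id=self.created, size=self.size)

    def finish_build(self) -> None:
        """Simulate the background build of the next registry completing."""
        if self.standby is None:
            self.standby = self._build()

    def issue(self) -> int:
        """Consume one slot and return the id of the registry that was used."""
        if self.active.full:
            if self.standby is None:
                # The suspected gap: the active registry is full and its
                # replacement has not been built yet.
                raise RegistryFullError(f"registry {self.active.reg_id} is full")
            # Activate the pre-built registry; the next build is now pending.
            self.active, self.standby = self.standby, None
        self.active.used += 1
        return self.active.reg_id
```

In this model, an `Issuer(size=3)` serves three issuances from registry 1 and then switches to registry 2. If registry 2 also fills before `finish_build()` has run, the next `issue()` call raises `RegistryFullError`, which is the kind of window a race condition would open. Whether ACA-Py can actually reach that state is exactly what the thread leaves unresolved.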

Worse, what I think you are saying above is that ACA-Py seems to be able to get into a state from which it can't recover, e.g. once the error happens, it prevents all further issuances. Is that true, @konda-kalyan?

The load test generator that @PaulWen created is likely the best way to try to make this test repeatable. Perhaps a good metric to produce from the generator is the rate of RevReg creations.

konda-kalyan commented 2 years ago

Thanks @PaulWen and @swcurran for the response.

@PaulWen: I didn't explicitly set the revocation registry size, so it should be the default size, which is 1000.

@swcurran: Even after the registry-full errors, issuances continue; they do not stop. In that respect it is good: the agent is able to proceed.

Additional info about the setup: I am feeding requests to the controller at a rate of 3.5 per second. My setup is one controller (with MySQL as the off-chain DB) and one issuer agent (with a Postgres wallet). All components run on an AWS EKS cluster, on a single high-spec node with 32 cores. Also note that when sending offers to the agent from the controller, I have enabled the 'auto_issue' flag, so issuance happens automatically once the holder accepts the offer. The 10 holders run as AWS ECS containers.

One other thing I don't understand: for a couple of ids, I see the above error roughly 5 to 15 times per id (a rough figure; I did not count precisely). One of these ids appears in the error above: 'c3763215-8297-46e3-a06d-972e2fcad3ae'. I don't know what this id represents. I see errors for multiple ids like this.

By the way, is there any workaround to avoid this race condition?
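For scale, a back-of-the-envelope calculation using the figures reported above (50K credentials, a registry size of 1000, 3.5 requests per second). It assumes every request results in exactly one issuance and that the issuance rate matches the request rate, which the thread does not confirm; the function names are illustrative only.

```python
import math


def registries_needed(total_credentials: int, registry_size: int) -> int:
    """Number of revocation registries consumed by a run (one slot per credential)."""
    return math.ceil(total_credentials / registry_size)


def seconds_per_registry(registry_size: int, issuances_per_second: float) -> float:
    """Time to fill one registry at a steady issuance rate."""
    return registry_size / issuances_per_second


total, size, rate = 50_000, 1000, 3.5  # figures from the thread
print(registries_needed(total, size))              # 50
print(round(seconds_per_registry(size, rate), 1))  # 285.7
```

Under those assumptions the run cycles through 50 registries, switching to a new one roughly every 286 seconds.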

konda-kalyan commented 2 years ago

Just to add: I see that the recommended revocation registry size is 1000. Are there any other recommendations for production, where issuance volumes are higher and TPS really matters?

swcurran commented 2 years ago

Are you able to repeat the run easily? If so, it might be nice to try it with different RevReg sizes to see if that affects how the problem occurs. Say, 500 and 1500. That said, I think Andrew has tried that without effect.
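A sketch of how the suggested sizes could be set, assuming the registry size is chosen when the credential definition is created through the ACA-Py admin API (`POST /credential-definitions` with `support_revocation` and `revocation_registry_size` in the body). The schema id and tags below are placeholders, and the field names should be checked against the agent version in use.

```python
import json
from typing import Any, Dict


def cred_def_request(schema_id: str, tag: str, rev_reg_size: int) -> Dict[str, Any]:
    """Build a request body for creating a revocable credential definition."""
    return {
        "schema_id": schema_id,
        "tag": tag,
        "support_revocation": True,
        "revocation_registry_size": rev_reg_size,
    }


# One credential definition per size under test; "<schema-id>" is a placeholder.
for size in (500, 1500):
    body = cred_def_request("<schema-id>", f"revreg-{size}", size)
    # Each body would be POSTed to <admin-url>/credential-definitions.
    print(json.dumps(body))
```

Using a distinct tag per size keeps the runs separate, so errors can be attributed to a specific registry size.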

konda-kalyan commented 2 years ago

I ran the 50K test 2 to 3 times, using the same RevReg size every time. I see some errors (roughly 300 to 700) on every run. I will try with different sizes and see.

PaulWen commented 2 years ago

Using the Load Generator, I managed to generate a variety of exceptions when issuing revocable credentials.

The most common issue is "Revocation registry metadata not found" (https://github.com/lissi-id/acapy-load-test-results#credential-issuance-fails-due-to-revocation-registry-metadata-not-found). However, this has not yet crashed ACA-Py.

I also played around with the revocation registry size and saw ACA-Py crash unrecoverably with a small RevReg size of 10 as well as with a larger one of 3k: https://github.com/lissi-id/acapy-load-test-results#any-revreg-size-may-or-may-not-cause-issues

What I have not seen so far is your error message "Revocation registry is full".

Are you using the Askar or Indy wallet type? Most of the tests that I am running are using the Askar wallet.

konda-kalyan commented 2 years ago

I am using the Indy wallet.

swcurran commented 2 years ago

Based on the latest CloudAgent Load Test Generator runs, I'm going to close this issue.

Please reopen if this is still an issue, and ideally use the Load Generator to create a reproducible instance of this issue.

konda-kalyan commented 1 year ago

I am using the Indy wallet.
