Fix token decimal crash

whilefoo commented 3 weeks ago

Resolves https://github.com/ubiquity-os-marketplace/text-conversation-rewards/issues/148

The problem was that instead of using the provider returned by the rpc-handler, which is wrapped in a Proxy that retries with different RPCs, we initialized a standard JsonRpcProvider with the URL from the returned provider. I've also increased the retry count to 5 to make it more reliable.
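
To illustrate the difference described above, here is a minimal sketch of a provider wrapped in a Proxy that retries a call against different RPCs. It is not the rpc-handler's actual implementation: `RpcBackend`, `withRetries`, and `maxAttempts` are hypothetical names, and the backends are plain objects rather than ethers providers.

```typescript
// Hypothetical minimal provider shape (the real provider is an ethers JsonRpcProvider).
interface RpcBackend {
  url: string;
  call(method: string, params: unknown[]): Promise<unknown>;
}

// Wrap the first backend in a Proxy whose `call` rotates through all backends on failure.
// Every other property (such as `url`) is passed through from the first backend unchanged.
function withRetries(backends: RpcBackend[], maxAttempts: number): RpcBackend {
  if (backends.length === 0) throw new Error("no backends given");
  return new Proxy(backends[0], {
    get(target, prop, receiver) {
      if (prop !== "call") return Reflect.get(target, prop, receiver);
      return async (method: string, params: unknown[]): Promise<unknown> => {
        let lastError: unknown = new Error("no attempts made");
        for (let attempt = 0; attempt < maxAttempts; attempt++) {
          const backend = backends[attempt % backends.length];
          try {
            return await backend.call(method, params);
          } catch (err) {
            lastError = err; // remember the failure and move on to the next backend
          }
        }
        throw lastError;
      };
    },
  });
}
```

In this sketch, creating a fresh backend from `proxied.url` would talk to a single endpoint only and bypass the retry loop, which is the kind of mistake the description above refers to.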

0x4007 commented 3 weeks ago

Is there a way to QA this and prove it works?

gentlementlegen commented 3 weeks ago

Maybe we could have a hardcoded test with the gnosis endpoint that is often failing so we are sure it doesn't happen again.

0x4007 commented 3 weeks ago

> Maybe we could have a hardcoded test with the gnosis endpoint that is often failing so we are sure it doesn't happen again.

Seems like the wrong approach to do anything special for endpoints because all endpoints are transient. The core logic should handle these problems on the fly.

0x4007 commented 3 weeks ago

Decided to merge because 5 retries seems like it is 5x more likely to work
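
As a rough sanity check on how much retries help, assume (hypothetically) that every attempt fails independently with the same probability `p`. All `n` attempts then fail with probability p^n, so the gain is not a flat multiple and depends heavily on `p`. The helper name below is made up for illustration.

```typescript
// Probability that at least one of `attempts` tries succeeds, assuming each try
// fails independently with probability `failureProbability` (an idealized model).
function successProbability(failureProbability: number, attempts: number): number {
  if (failureProbability < 0 || failureProbability > 1) {
    throw new RangeError("failureProbability must be between 0 and 1");
  }
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  return 1 - Math.pow(failureProbability, attempts);
}

console.log(successProbability(0.5, 1)); // 0.5
console.log(successProbability(0.5, 5)); // 0.96875
```

The independence assumption is the weak point: retries that land on the same broken endpoint, or on endpoints that fail for a shared reason, will not improve the odds this much.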

gentlementlegen commented 3 weeks ago

After the fix, it seems the error is still there and now it introduced many other crashes:

@whilefoo Am I misusing it?

whilefoo commented 3 weeks ago

Are you sure it's using the latest permit-generation package? I think we need to deploy a new version

The second error seems to be a problem with rpc-handler

gentlementlegen commented 3 weeks ago

@whilefoo Yes, I supposedly deployed it and am using the latest version. I added some debug logs and got the following details:

    Error: Failed to get token decimals for token: 0xe91D153E0b41518A2Ce8Dd3D7944Fa863463a97d, Error: missing revert data in call exception; Transaction reverted without a reason string [ See: https://links.ethers.org/v5-errors-CALL_EXCEPTION ] (data="0x", transaction={"to":"0xe91D153E0b41518A2Ce8Dd3D7944Fa863463a97d","data":"0x313ce567","accessList":null}, error={"reason":"bad response","code":"SERVER_ERROR","status":400,"headers":{"date":"Thu, 10 Oct 2024 10:15:07 GMT","content-type":"application/json","transfer-encoding":"chunked","connection":"close","access-control-allow-origin":"*","vary":"Accept-Encoding","x-drpc-owner-tier":"free","x-drpc-trace-id":"bcdca784fd99e7ec0138ef5a91ffb9dd","strict-transport-security":"max-age=15724800; includeSubDomains","access-control-allow-credentials":"true","access-control-allow-methods":"GET, PUT, POST, DELETE, PATCH, OPTIONS","access-control-allow-headers":"DNT,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization","access-control-max-age":"1728000","cf-cache-status":"DYNAMIC","server":"cloudflare","cf-ray":"8d05d8f048398b5c-ICN"},"body":"{\"id\":43,\"jsonrpc\":\"2.0\",\"error\":{\"message\":\"Can't route your request to suitable provider, if you specified certain providers revise the list\",\"code\":12}}","requestBody":"{\"method\":\"eth_call\",\"params\":[{\"to\":\"0xe91d153e0b41518a2ce8dd3d7944fa863463a97d\",\"data\":\"0x313ce567\"},\"latest\"],\"id\":43,\"jsonrpc\":\"2.0\"}","requestMethod":"POST","url":"https://gnosis.drpc.org"}, code=CALL_EXCEPTION, version=providers/5.7.2)

also

    Error: Failed to get token decimals for token: 0xe91D153E0b41518A2Ce8Dd3D7944Fa863463a97d, Error: could not detect network (event="noNetwork", code=NETWORK_ERROR, version=providers/5.7.2)
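
For reference, the failing request in the first log is a plain `eth_call` with data `0x313ce567`, the 4-byte selector of the ERC-20 `decimals()` function. The sketch below rebuilds that request body and decodes a well-formed reply without any library; `buildDecimalsCall` and `parseDecimals` are hypothetical helper names, not part of permit-generation or ethers.

```typescript
// 4-byte selector of decimals(), as seen in the "data" field of the log above.
const DECIMALS_SELECTOR = "0x313ce567";

// Build the JSON-RPC body for an eth_call to decimals() on `token`.
function buildDecimalsCall(token: string, id: number): string {
  return JSON.stringify({
    method: "eth_call",
    params: [{ to: token.toLowerCase(), data: DECIMALS_SELECTOR }, "latest"],
    id,
    jsonrpc: "2.0",
  });
}

// Decode the hex `result` of the call. An empty "0x" payload carries no value
// to decode, so it is rejected instead of being read as zero.
function parseDecimals(result: string): number {
  if (!/^0x[0-9a-fA-F]+$/.test(result)) {
    throw new Error(`empty or malformed eth_call result: ${result}`);
  }
  return parseInt(result, 16);
}
```

A reply that is an error object or an empty `0x`, as in the log, leaves nothing to decode, so the caller has to retry elsewhere or fail.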

Even with 5 retries, I had 3 errors

    Failed to reach endpoint https://xdai-archive.blockscout.com. Request failed with status code 404
    Failed to reach endpoint https://web3endpoints.com/gnosischain-mainnet. timeout

and it sometimes succeeded on the 5th try, with endpoints like https://gnosis.blockpi.network/v1/rpc/public handling the request successfully. It seems very unreliable. Throwing this out here, but would ethers v6 improve this?

Edit: this problem also seems to affect pay.ubq.fi where I only get errors:

whilefoo commented 3 weeks ago

It seems that most errors come from https://gnosis.drpc.org, but other RPCs fail too. We could make the rpc-handler try every available RPC, so that hopefully at least one of them works.

I think the rpc-handler already sends a block number request to every RPC, so it should already eliminate non-working RPCs, unless the problem is with our request, but that seems unlikely because other endpoints work.
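
The probing idea above can be sketched as follows: send `eth_blockNumber` to every endpoint and keep only those that answer with a result in time. This is an illustration under assumptions, not the rpc-handler's code; `Post`, `filterWorkingRpcs`, and the 3000 ms default timeout are hypothetical, and the transport is injected so the sketch does not depend on a particular HTTP client.

```typescript
// Hypothetical transport: posts a JSON-RPC body to `url` and resolves with the parsed reply.
type Post = (url: string, body: string) => Promise<{ result?: string; error?: unknown }>;

// Probe every endpoint with eth_blockNumber and return the URLs that answered in time.
async function filterWorkingRpcs(urls: string[], post: Post, timeoutMs = 3000): Promise<string[]> {
  const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] });
  const probes = urls.map(async (url): Promise<string | null> => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
    });
    try {
      const reply = await Promise.race([post(url, body), timeout]);
      // Only a string result counts; JSON-RPC error objects are treated as failures.
      return typeof reply.result === "string" ? url : null;
    } catch {
      return null; // network error or timeout
    } finally {
      clearTimeout(timer);
    }
  });
  const results = await Promise.all(probes);
  return results.filter((u): u is string => u !== null);
}
```

One caveat with this kind of probe: an endpoint that answers `eth_blockNumber` may still reject an `eth_call`, so passing the check does not guarantee that the later request succeeds.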

gentlementlegen commented 3 weeks ago

Maybe we should have some whitelist / blacklist for RPCs, but all of a sudden the failures seem much more frequent, which is why I wondered if any changes were made either to our packages or to the endpoints themselves. The version we are using is already 2 years old and not maintained, which is why I suggested that could have been a reason. https://github.com/ethers-io/ethers.js/releases/tag/v5.7.2

whilefoo commented 3 weeks ago

> The version we are using is already 2 years old and not maintained, which is why I suggested that could have been a reason. https://github.com/ethers-io/ethers.js/releases/tag/v5.7.2

I guess it's worth trying; we should have the latest version of our packages anyway.

0x4007 commented 3 weeks ago

We are not using the latest ethers because LLMs don't have it yet, and v6 APIs are so different.

It doesn't make any sense to me that every RPC provider decided to change their APIs suddenly, so I don't think our client software (ethers) is a relevant problem.

gentlementlegen commented 3 weeks ago

Most of the failures seem to come from invalid RPC responses that ethers cannot parse, so it throws, hence my guess. I understand the LLM problem, but if fewer and fewer RPCs can be used, it's going to be a problem sooner or later.

whilefoo commented 3 weeks ago

I've just tried it on v6, and after 20 successful requests I got one that failed with an error (missing revert data).

gentlementlegen commented 3 weeks ago

@whilefoo After a lot of investigation, I noticed that it actually fails in the tests because mswjs intercepts network calls. It was never an issue before, so I don't know what changed. Basically, whenever I start the server listener, the rpc-handler seems unable to reach most endpoints properly.

ubiquity-os / permit-generation

Fix token decimal crash #82