wormhole-foundation / wormhole

A reference implementation for the Wormhole blockchain interoperability protocol.
https://wormhole.com

Additional CCQ Logs #4128

Open corpocott opened 1 month ago

corpocott commented 1 month ago

Description

Looking for additional logging to be added to the HTTPS call to the CCQ proxy that submits the payload. We are running into an issue where our submission is not making it to the other guardians, and Liston and I are not able to identify the point of failure. It would be helpful to get a debug log added for the HTTPS call status and, if it is non-200, what the error is. At the moment we only see the following, without any errors:

2024-10-01T15:42:32.321Z INFO root.p2p published signed query response {"component": "ccqp2p", "requestSignature": "c025a38804cc2147aa2e445551567bd5c142329823eca8a91db49871a619ee9f1032925fccada70118ad77f08c6139fb911fb11714caa9d68ec610003875e97800", "query_response": {"Request":{"query_request":"xxx","signature":"xxx"},"PerChainResponses":[{"ChainId":2,"Response":{"BlockNumber":20871609,"Hash":"0x31c85dae667b8df8835b2069547959bd91870bd27cfef659f6cbb084a0b94836","Time":"2024-10-01T15:42:23Z","Results":["AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADVdyYXBwZWQgRXRoZXIAAAAAAAAAAAAAAAAAAAAAAAAA"]}}]}, "signature": "xxx"}

Recommendation

Just output the call status and if non-200 the error

evan-gray commented 1 month ago

@corpocott, these responses are published via a libp2p pubsub topic using UDP QUIC v1, not HTTPS. The default port is 8996. libp2p debug messages should be enabled when --logLevel=debug. These are quite chatty but should contain some subscriber and messaging information. In many cases, this comes down to a firewall or networking issue which may not be possible to debug via these logs. I'm not sure there are more logs to provide beyond this.

If you are trying to debug Queries, I would recommend using the ccqlistener tool. First, follow the command to listen to responses from any guardian, then run it again to listen to responses from your guardian. If the first succeeds and the second fails, try running the tool inside the same network (or better yet, on the same box) as your guardian. If it then succeeds on both, you likely have a firewall or networking issue preventing your responses from egressing. If it still fails for your guardian, you can try overriding --bootstrap with your guardian's address. You can see examples of bootstrap strings here.
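The ccqlistener steps above can be read as a small decision procedure. The sketch below is only an illustration of that reasoning: the `Observations` type, its field names, and `diagnose` are hypothetical and are not part of ccqlistener or the guardian code. The case where no guardian is visible at all is not covered in this thread, so the sketch reports it as inconclusive.

```go
package main

import "fmt"

// Observations records what was seen while running ccqlistener.
// Hypothetical type for this sketch only.
type Observations struct {
	AnyGuardian       bool // responses seen from any guardian
	OwnGuardianRemote bool // responses seen from your guardian, listening from outside its network
	OwnGuardianLocal  bool // responses seen from your guardian, listening from the same network or box
}

// diagnose maps the observations to the next step suggested in the thread.
func diagnose(o Observations) string {
	switch {
	case !o.AnyGuardian:
		// Not covered by the advice above.
		return "inconclusive: listener sees no guardians at all"
	case o.OwnGuardianRemote:
		return "responses from your guardian are reaching the network"
	case o.OwnGuardianLocal:
		return "likely firewall or networking issue blocking egress"
	default:
		return "try overriding --bootstrap with your guardian address"
	}
}

func main() {
	// Any guardian is visible, but yours only from inside its own network.
	fmt.Println(diagnose(Observations{AnyGuardian: true, OwnGuardianLocal: true}))
}
```

Running it with the observations in `main` prints the egress diagnosis, which matches the "succeeds on both only from inside the network" case described above.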

corpocott commented 1 month ago

Oh alright, was told it was being submitted over HTTPS. Will see if I can get another guardian to run in debug to have them check if they are receiving our heartbeats. It sounds like the p2p gossip is working as expected, but ccq isn't, even though they use the same library? Our egress firewall is totally open, so not sure why it isn't getting through. Would think if it was a firewall issue our p2p would be broken too.

corpocott commented 1 month ago

any ideas on why p2p gossip would work as expected but ccq wouldn't? not sure why if they use the same library one would work and the other one wouldn't.

Logs in debug provide a heartbeat that is pretty helpful to know that we are receiving heartbeats from other guardians and they can see ours. Any chance of adding something similar to the ccq code?

lrogana commented 1 month ago

Hey @evan-gray how is ccq traffic different from the gossip p2p traffic?

corpocott commented 1 month ago

to add some context, we swapped the p2p gossip and ccq ports to use the heartbeats to debug ingress/egress on that specific port. p2p gossip seemed to work as expected using 8996. We were receiving heartbeats and the dashboard was able to see our node's heartbeats. So if both protocols use the same library to communicate, I'm wondering how p2p gossip works when using 8996 but ccq does not. Any differences would help me further dig into the problem on our side.

evan-gray commented 1 month ago

Agree that sounds like a good test. Both protocols do use the same library. As far as I understand, p2p and ccq_p2p use the exact same parameters except for the port and peers.

https://github.com/wormhole-foundation/wormhole/blob/a543c4045ab8a713765420836e8eed575762bf93/node/pkg/p2p/p2p.go#L468-L469

ccq_p2p uses the same NewHost method.

https://github.com/wormhole-foundation/wormhole/blob/a543c4045ab8a713765420836e8eed575762bf93/node/pkg/p2p/ccq_p2p.go#L93

One thing it does differently is to only allow peer connections from the allowed peers and the guardians, but if you are seeing inbound requests from those peers, I don't have a reason to believe anything here would restrict outbound requests.

https://github.com/wormhole-foundation/wormhole/blob/a543c4045ab8a713765420836e8eed575762bf93/node/pkg/p2p/ccq_p2p.go#L115
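To illustrate the peer restriction described above, here is a conceptual sketch in plain Go. It does not use libp2p and is not the code at the linked line: `peerFilter`, `newPeerFilter`, and `allowInbound` are hypothetical names, and peer IDs are modeled as plain strings. It only shows the idea that a connection is accepted when the remote peer is in the union of the allowed peers and the guardians.

```go
package main

import "fmt"

// peerFilter is a hypothetical stand-in for the ccq_p2p peer restriction.
type peerFilter struct {
	allowed map[string]struct{}
}

// newPeerFilter builds the permitted set as the union of the explicitly
// allowed peers and the guardian peers.
func newPeerFilter(allowedPeers, guardians []string) *peerFilter {
	f := &peerFilter{allowed: make(map[string]struct{})}
	for _, p := range allowedPeers {
		f.allowed[p] = struct{}{}
	}
	for _, g := range guardians {
		f.allowed[g] = struct{}{}
	}
	return f
}

// allowInbound reports whether a connection from peerID would be accepted.
func (f *peerFilter) allowInbound(peerID string) bool {
	_, ok := f.allowed[peerID]
	return ok
}

func main() {
	f := newPeerFilter([]string{"proxy-peer"}, []string{"guardian-1"})
	fmt.Println(f.allowInbound("proxy-peer")) // true: explicitly allowed
	fmt.Println(f.allowInbound("guardian-1")) // true: guardian
	fmt.Println(f.allowInbound("stranger"))   // false: in neither set
}
```

In this model the filter only decides which peers may connect; it says nothing about whether a published response leaves the node, which is the separate question being debugged here.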

corpocott commented 1 month ago

anything else come to mind that I should be checking? Any way to tell on the ccq proxy if our responses are being rejected for some reason? Here is a stripped down example of our logs:

root received a query request
root forwarded query request to watcher
root.solana-finalized_watch CONCURRENT: processing query request
root.solana-finalized_watch received a sol_account query 
root.solana-finalized_watch CONCURRENT: finished processing query request 
root.solana-finalized_watch minimum context slot has not been reached, will retry shortly
root.solana-finalized_watch initiating fast retry 
root.solana-finalized_watch published query response to handler
root received final per chain query response, ready to publish
root forwarded query response to p2p 
root.p2p published signed query response

It appears as though we are publishing without any errors.
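One way to confirm that a request went through every local stage is to scan the log lines for the stage messages in order. The sketch below assumes the message strings shown in the stripped-down log above; `expectedStages` and `firstMissingStage` are hypothetical helpers, not part of the guardian. Note that this only checks the local pipeline: reaching "published signed query response" does not show that the response was delivered to the proxy or other peers.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedStages lists the messages from the log excerpt above, in order.
var expectedStages = []string{
	"received a query request",
	"forwarded query request to watcher",
	"published query response to handler",
	"received final per chain query response",
	"forwarded query response to p2p",
	"published signed query response",
}

// firstMissingStage walks the lines in order and returns the first expected
// stage that never appears in sequence, or "" if every stage was seen.
func firstMissingStage(lines []string) string {
	next := 0
	for _, line := range lines {
		// A single line may contain more than one stage message.
		for next < len(expectedStages) && strings.Contains(line, expectedStages[next]) {
			next++
		}
	}
	if next == len(expectedStages) {
		return ""
	}
	return expectedStages[next]
}

func main() {
	lines := []string{
		"root received a query request",
		"root forwarded query request to watcher",
		"root.solana-finalized_watch published query response to handler",
		"root received final per chain query response, ready to publish",
		"root forwarded query response to p2p",
		"root.p2p published signed query response",
	}
	if missing := firstMissingStage(lines); missing == "" {
		fmt.Println("all local stages present")
	} else {
		fmt.Println("first missing stage:", missing)
	}
}
```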