Closed arjaz closed 1 month ago
I was able to replicate this error by starting Ogmios with an invalid (nonexistent) socket file as an argument. Perhaps that's what's causing your issue as well.
Either way, this tells me more sophisticated error-handling logic is needed in Xogmios. I'll look into this as soon as I have some free time. Thanks for reporting!
@arjaz Mind testing the following branch when you get a chance?
defp deps do
  [
    # ...
    {:xogmios, github: "wowica/xogmios", ref: "5f7c7cc"}
  ]
end
This should address the issue in the scenario I described. Other than that, I'm not familiar with Ogmios having a "sync" period. Let me know if this is still an issue.
What I was referring to with Ogmios syncing is that it has a period where it pushes new blocks or something:
[2024-09-19 09:14:52.17 UTC] Pushing ledger state for block 12421234b585d69736437d525d5245c6358ba6be364f30c9c5352fd64fa9857e at slot 70913787. Progress: 16.82%
The 5f7c7cc commit calls the handle_disconnect callback which is nice, but it seems it doesn't handle the :reconnect option gracefully 🤔 With the following callback:
@impl true
def handle_disconnect(_reason, state) do
  Logger.warning("Ogmios connection died, reconnecting in 5s")
  {:reconnect, 5_000, state}
end
I get this log:
[warning] Ogmios connection died, reconnecting in 5s
But no blocks arrive after that, and there are no further handle_connect calls.
Oh I see. This log message is coming from the Cardano node itself while it syncs with other nodes in the network.
While Ogmios is able to start and receive connections from clients, I don't believe it is in a usable state until the underlying Cardano node is fully synced. The node, for example, takes a while after starting to create its socket file and accept connections. This is likely the reason why Xogmios cannot fully establish the connection to Ogmios.
That being said, I'll look into improving the reconnection logic to keep trying to reconnect in this particular scenario until the node is fully synced and a socket file is available for Ogmios. Just know that it might take a long time, though, depending on how the node is configured... like many hours.
💡 Found some useful information reading through the TypeScript client that relates to this exact scenario you brought up.
This is great. Should be able to use Ogmios' /health endpoint as part of Xogmios' internal connection process to check whether it can start sending messages.
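For illustration, here is a minimal sketch of what such a readiness gate could look like. It is not the code in Xogmios: the module name HealthGate and the injected fetch_health function are hypothetical, and the "connectionStatus" and "networkSynchronization" keys are assumptions about the decoded /health payload that should be checked against the Ogmios version in use. The HTTP request and JSON decoding are deliberately left to the caller.

```elixir
defmodule HealthGate do
  @moduledoc """
  Hypothetical helper (not part of Xogmios). Decides whether a decoded
  Ogmios /health payload looks ready, and polls until it does.
  """

  # Assumption: the payload carries "connectionStatus" and a
  # "networkSynchronization" ratio where 1 means fully synced.
  def ready?(%{"connectionStatus" => "connected", "networkSynchronization" => sync})
      when is_number(sync),
      do: sync >= 1

  # Anything else (disconnected, missing or nil ratio) counts as not ready.
  def ready?(_health), do: false

  # `fetch_health` is a zero-arity function returning {:ok, map} or
  # {:error, reason}. `attempts` is a non-negative integer or :infinity.
  def wait_until_ready(fetch_health, interval_ms, attempts \\ :infinity)

  def wait_until_ready(_fetch_health, _interval_ms, 0), do: {:error, :not_ready}

  def wait_until_ready(fetch_health, interval_ms, attempts) do
    with {:ok, health} <- fetch_health.(),
         true <- ready?(health) do
      :ok
    else
      _not_ready ->
        # Wait, then try again with one fewer attempt remaining.
        Process.sleep(interval_ms)
        wait_until_ready(fetch_health, interval_ms, decrement(attempts))
    end
  end

  defp decrement(:infinity), do: :infinity
  defp decrement(n) when is_integer(n) and n > 0, do: n - 1
end
```

Passing the fetch function in keeps the sketch independent of any particular HTTP or JSON library, and the default of :infinity mirrors the idea of retrying until Ogmios is able to respond with data.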
@arjaz Pushed a few more changes. It should now display a more informative warning message and then keep trying to reconnect indefinitely until Ogmios is ready to respond with data.
Mind trying it again?
{:xogmios, github: "wowica/xogmios", ref: "da31c26"},
Update: Actually, looking back at the TS client I'm realizing this fix might not yet fully address your use case. I'm adding a few more checks.
Should include much better error/warning messages now:
{:xogmios, github: "wowica/xogmios", ref: "efedfd1"}
Yes, I can confirm it. Thank you so much for your work!
The problem manifests when you try to establish a connection with Ogmios when it's not fully synced (in my case when you start the application and Ogmios simultaneously). The connection seems to be established because I get logs from the handle_connect/1 callback, but when the client tries to send frames to Ogmios it fails with the following error:
The connection doesn't get restored after that, but it doesn't get disconnected either as the handle_disconnect/2 callback isn't called. Here are the relevant parts of the client I have: