Loopd becomes unresponsive after some time

lightninglabs / loop

Lightning Loop: A Non-Custodial Off/On Chain Bridge

Loopd becomes unresponsive after some time #767

Open Jossec101 opened 1 month ago

Jossec101 commented 1 month ago

Hey!

After some time (we have not yet measured how long), our loopd starts having issues doing loop outs, and loop getinfo fails as shown in the log below:

lnd:/# loop getinfo
[loop] rpc error: code = Unknown desc = sql: Scan error on column index 20, name "publication_deadline": unsupported Scan, storing driver.Value type string into type *time.Time

Additionally, invoking loop out gives this error:

cltv delta below minimum: 98

The problem somehow goes away if we kill loopd, and some swaps got broadcast on-chain.

System information

Versions: LND 0.17.5 and loop 0.27.1, running in amd64 Linux containers on Kubernetes (AWS)

hieblmi commented 1 month ago

Thanks for raising the issue @Jossec101. Have you been running this setup for a while, did this behavior appear only recently, or did you upgrade any part of your stack?

Jossec101 commented 1 month ago

> Thanks for raising the issue @Jossec101. Have you been running this setup for a while, did this behavior appear only recently, or did you upgrade any part of your stack?

Nope, we have been running loopd for more than a year, upgrading to newer versions from time to time. We only recently started to see this behaviour.

bhandras commented 1 month ago

Are you using SQLite or PostgreSQL? If PostgreSQL, could you please provide the version that you're interfacing with?

Jossec101 commented 1 month ago

> Are you using SQLite or PostgreSQL? If PostgreSQL, could you please provide the version that you're interfacing with?

SQLite

Jossec101 commented 1 month ago

Wiping the DB fixed this issue; it looks like the DB gets corrupted over time.

bhandras commented 1 month ago

Thanks for the heads-up @Jossec101. If I understand correctly you're referring to filesystem corruption? Did you identify the source of the corruption? Or do you maybe mean that the DB file itself was changed outside of loopd?

Jossec101 commented 1 month ago

> Thanks for the heads-up @Jossec101. If I understand correctly you're referring to filesystem corruption? Did you identify the source of the corruption? Or do you maybe mean that the DB file itself was changed outside of loopd?

I mean that we stopped loopd and moved the SQLite file to a safe place; loopd then created a new one, and that solved the issue.

Jossec101 commented 4 weeks ago

It happened again today, bump

bhandras commented 4 weeks ago

Did you get the same error as reported in https://github.com/lightninglabs/loop/issues/767#issue-2330500692 ?

Jossec101 commented 4 weeks ago

Yes, and the DB actually got corrupted again a few hours after my last comment, but we have since upgraded to loop 0.28. I will report back if it happens again.

Jossec101 commented 3 weeks ago

It has happened again; the DB corruptions are actually becoming more frequent 🤔

bhandras commented 3 weeks ago

I believe it may not be DB corruption but perhaps a Go sqlite bug, as it seemingly fails to parse the timestamp value. Do you still have your DB? Could you perhaps run a SELECT and see which publication deadlines are stored there? Or, if you could share the DB with me, I'm happy to take a look. I'm on both Keybase and Slack.
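One way to run that check, sketched with Python's built-in sqlite3 module. This is only a rough sketch under stated assumptions: the thread does not say which table holds publication_deadline, so the script looks the column up in every table instead of hardcoding a name, and the "looks like a timestamp" test is a simple heuristic (the value must be text starting with YYYY-MM-DD hh:mm:ss), not the parsing rule loopd or its sqlite driver actually applies. The function names are made up for illustration.

```python
import re
import sqlite3

# Heuristic (an assumption, not loopd's real parsing rule): a sane value is
# text that starts with "YYYY-MM-DD hh:mm:ss", with exactly four year digits.
TIMESTAMP_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def tables_with_column(conn, column):
    """Return the names of all tables that contain the given column."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    matches = []
    for name in names:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')]
        if column in columns:
            matches.append(name)
    return matches


def suspicious_values(conn, column="publication_deadline"):
    """List (table, storage type, value) for values failing the heuristic."""
    findings = []
    for table in tables_with_column(conn, column):
        query = f'SELECT typeof("{column}"), "{column}" FROM "{table}"'
        for storage_type, value in conn.execute(query):
            # NULLs and non-text values are reported as well.
            if not (isinstance(value, str) and TIMESTAMP_PREFIX.match(value)):
                findings.append((table, storage_type, value))
    return findings


def report(db_path):
    """Open the database read-only and print every suspicious value."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        for table, storage_type, value in suspicious_values(conn):
            print(f"{table}: {storage_type} {value!r}")
    finally:
        conn.close()
```

To use it, stop loopd, copy the SQLite file somewhere safe, and call report() with the path to the copy. Anything it prints is a stored value worth looking at more closely; an empty output only means that nothing tripped this particular heuristic.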