hackbg / undexer

🎉 The Undexer 🎉 Namada network indexer powering https://shielded.live/

Another attempt #9

Open opsecx opened 1 week ago

opsecx commented 1 week ago

Daniel encouraged me to keep posting my issues. I note I haven't had much in the way of a reply for a month, and tbh it's dragging me down a bit.

I'm still trying to get undexer running properly under a fairly standard setup:

When running the newest branch against housefire-head, it starts well, syncs up to approximately block 27k (there are approximately 237k blocks at the moment), then crashes.

When I query the db through the undexer CLI, it seems to have all the relevant information up to approximately block 27k.

When I restart the indexer, however, it does what it has done for me all along: it wipes the db and starts from scratch.

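So there are two issues:

1. The indexer crashes at around block 27k.
2. On restart, it wipes the db and starts from scratch instead of resuming.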

Would be great if these issues could be fixed.

egasimus commented 1 week ago

Hey, just seeing this now. Thank you for staying in touch!

The crashes are known to us and seem to be due to some intermittent race condition within the node, which we have no control over; we paper over those by restarting the indexer when it crashes. I agree this might not be ideal, but in practice it seems to work well enough.
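The restart-on-crash approach described above can be illustrated with a minimal sketch. This is not Undexer's actual code: `runWithRestart`, `maxRestarts`, and `delayMs` are hypothetical names chosen for illustration, and the real project may well delegate restarts to Docker or a process manager instead.

```javascript
// Hypothetical helper (not part of Undexer): rerun an async task whenever it
// throws, up to `maxRestarts` times, waiting `delayMs` between attempts.
async function runWithRestart(task, { maxRestarts = 5, delayMs = 1000 } = {}) {
  let restarts = 0;
  for (;;) {
    try {
      // The task receives the number of restarts so far, e.g. for logging.
      return await task(restarts);
    } catch (error) {
      // Give up and surface the last error once the restart budget is spent.
      if (restarts >= maxRestarts) throw error;
      restarts += 1;
      console.warn(`task crashed (${error.message}); restart ${restarts}/${maxRestarts}`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A wrapper like this only helps if the task resumes from stored state on each attempt. If every attempt begins by clearing the database, restarting just repeats the same 27k blocks.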

Unfortunately we haven't been able to reproduce your main problem, where it wipes the database. Currently we're working on a setup with a self-hosted full node (turns out some data is erased after a few epochs, so the only way to fetch it retroactively is while a fresh node is syncing... yeah go figure :grimacing: https://github.com/anoma/namada/issues/3810).

That has made it necessary to rework some of the indexing logic, plus - hopefully relevant to your case - updating the Docker Compose configuration. Maybe that'll be where we catch the table wipe happening; and if we don't, I wouldn't have much else to advise you besides giving the Docker Compose setup a shot, once the next version is out (this week or worst case early next.)

egasimus commented 1 week ago

Hmm, something just crossed my mind: what happens if you keep the empty tables, but comment out this db.sync call[^0]? Does it still delete stuff?

I always had this nagging feeling that the Sequelize sync method actually doesn't always work as expected. Maybe that's what's happening differently for you? Since we've relied on reindexing from scratch a whole lot, instead of doing proper migrations (sorry, node operators!), we haven't really had the opportunity to get to the bottom of that - should've just used slonik or something... :grin:
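For anyone who would rather toggle the sync call than comment it out, here is a minimal sketch of a guard. It assumes only that `db` exposes an async `sync(options)` method, as a Sequelize instance does; `initSchema` and the `SKIP_DB_SYNC` variable are hypothetical and do not exist in Undexer. For reference, Sequelize documents `sync()` with no options as creating missing tables only, while `sync({ force: true })` drops and recreates them. Which options Undexer actually passes is not stated in this thread.

```javascript
// Hypothetical guard (not part of Undexer) around any object that exposes an
// async `sync(options)` method, such as a Sequelize instance.
// `SKIP_DB_SYNC` is an invented environment variable name.
async function initSchema(db, env = process.env) {
  if (env.SKIP_DB_SYNC === '1') {
    console.log('SKIP_DB_SYNC=1: leaving existing tables untouched');
    return false; // sync was skipped
  }
  // Called without `force` or `alter`, so Sequelize should only create
  // tables that are missing and leave existing ones alone.
  await db.sync();
  return true; // sync was called
}
```

If the tables still get emptied on restart with `SKIP_DB_SYNC=1` set, the sync call is probably not the culprit and the wipe happens somewhere else.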

[^0]: Edited 2024-09-24: Update link to permalink.

opsecx commented 1 day ago

I feel it would be great if you could try to replicate a setup similar to mine and see whether you get the same errors. It's a fairly standard config (Postgres 14, single database given in the connection URL). It's a little hard for me to debug from here, since I'm not familiar with the inner workings of the program source.

egasimus commented 11 hours ago

That's exactly why we provide docker-compose.yml: to have an actually standard setup (not simply a fairly standard one)—which would allow you to treat the application as a black box without needing to peer into its inner workings.

Anything else is entirely in your hands.

We provide Undexer to the community free of charge. One of the implicit contracts of open source software is that users are able to contribute improvements, and lend a hand in solving problems. It is actively harmful to understand this as an invitation to request free labor, and I'm sure by now you must have heard many people speaking out about how this endangers the health of the ecosystem.

Still, if you would be so kind as to provide a virtual machine image in an open format, containing the system in which the bug can be replicated, it would make it possible for us to look into your problem in situ, sometime in between making Undexer index vote power correctly (massively complicated by how the Namada node prunes data) and the rest of the improvements requested by the funder of this project.

Alternatively, I've already provided an exact pointer to a single line in the source code—which you can comment out after initial DB sync, to see if this is what deletes the database on crash. (Since we provide no pre-packaged build artifact, I'm allowing myself to assume here that you're running from source?)

And the inner workings of Sequelize (which does our DB setup, and where I suspect the root cause of your problem to be) are as much a mystery to us as they are to you, which is why I talk of wanting to replace it. :sweat_smile: