migalabs / armiarma

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network
https://monitoreth.io
MIT License

Problems with BoltDB implementation #28

Closed cortze closed 2 years ago

cortze commented 2 years ago

After fixing a race condition when reading a value from the DB (solved here), we realized that the crawler had become unstable in terms of memory usage.

The crawler keeps taking as much memory from the system as it can, which makes it crash after 6-8 hours of crawling.

The same happens with disk usage. After those 6-8 hours of crawling, the BoltDB file takes up to 15GB on disk, while the actual content of the DB is only around 4MB.

We have been taking a look at BoltDB and realized that Prysm uses a fork of BoltDB, while we were using the original repo. Switching to the fork still doesn't solve the issues.

After reading how BoltDB works and tracing through the code, we saw that the DB leaves a large range of free pages in the file, which makes its size on disk grow. As a side effect, it looks like the large amount of disk space in use is also making the memory heap grow. We should consider compacting the DB periodically, as Prysm does; however, our use case is not exactly the same as Prysm's.
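To make the periodic-compaction idea a bit more concrete, here is a minimal sketch of the decision step only: given the size of the DB file and an estimate of the live data, decide whether a compaction pass is worth running. The function names, the 64MB minimum file size and the 50% waste threshold are all hypothetical and not taken from armiarma or Prysm. The compaction itself (copying the live buckets into a fresh file and swapping it in) is left out on purpose.

```go
package main

import "fmt"

// wasteRatio returns the fraction of the DB file that is NOT live data
// (free pages, fragmentation, ...). Hypothetical helper for illustration.
func wasteRatio(fileBytes, liveBytes int64) float64 {
	if liveBytes < 0 {
		liveBytes = 0
	}
	if fileBytes <= 0 || liveBytes >= fileBytes {
		return 0
	}
	return float64(fileBytes-liveBytes) / float64(fileBytes)
}

// shouldCompact reports whether a compaction pass looks worthwhile.
// minFileBytes avoids compacting tiny files; maxWaste is the tolerated
// fraction of unused space. Both thresholds are assumptions.
func shouldCompact(fileBytes, liveBytes, minFileBytes int64, maxWaste float64) bool {
	if fileBytes < minFileBytes {
		return false
	}
	return wasteRatio(fileBytes, liveBytes) > maxWaste
}

func main() {
	// Numbers reported in this issue: ~15GB file for ~4MB of content.
	fileBytes := int64(15) << 30
	liveBytes := int64(4) << 20

	fmt.Printf("waste ratio: %.4f\n", wasteRatio(fileBytes, liveBytes))
	fmt.Println("compact now:", shouldCompact(fileBytes, liveBytes, 64<<20, 0.5))
}
```

A check like this could run on a timer inside the crawler, with the file size taken from the filesystem and the live size estimated from the DB statistics. How to obtain those two numbers depends on the BoltDB fork in use.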

For now, we have set the default DB to in-memory to check the stability of the tool. Any suggestions for alternative DBs or ways to fix the problem are welcome.

alrevuelta commented 2 years ago

Interesting, a few things.

cortze commented 2 years ago

Yep, let me explain both points in a bit more depth: