migalabs / armiarma

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network
https://monitoreth.io
MIT License
37 stars 13 forks source link

Add an interface that saves the peerstore to a db or to memory #18

Closed kkhalil7 closed 2 years ago

kkhalil7 commented 3 years ago

After trying a lot of architectures for the interfaces, I found this one to be the most "elegant" and functional. There is a BoltDB struct that deals with the low level of Bolt, and can be reused for storing other parts than the peerstore. And there is two higher level structs that implements the PeerStoreStorage interface, one that deals with a DB and one with memory that just deals with storing and retrieving peers into a peerstore. I added the option to choose which storage medium we want when creating a new PeerStore with the function newPeerStore(), with the idea of having these arguments passed later on as cli flags. I don't know if you have a more efficient way to accomplish this functionality, which I assume is nice to have, but this is the easiest way I found to get it to work.

cortze commented 3 years ago

Regarding implementing the DB for the existing PeerMetrics, I think that it offers an interesting upgrade that will make the metrics a bit more organized and maintainable (right now, I'm just taking snapshots of the metrics and the peerstore every 12h).

At the moment, we have discovered around 25k Peers in the network, and the current optimization level of the memory keeps it under 3GB. image

About the timing to introduce it as a feature in the crawler, I have few concerns:

  1. First of all, it would imply checking the read/write speeds of the database. This way, we could ensure that updating information regarding a specific peer wouldn't slow down or even freeze the peer info update. Alternatively, a Peer cache could be implemented to complement the DB (which would mean a bit more work but is an optimal design).

  2. Related to the first one, I have seen few comparisons between BoltDB and Badger, and it might be interesting to check the reading/writing ratio that we are expecting to choose wisely the most appropriated implementation (comparison post, thanks @kkhalil7 for the source)

  3. We have to keep in mind the difference between the deeper analysis we can make with the analyzer python script and the light analysis we are currently exposing in the websites' dashboard. This script can be updated/upgraded with more features, but the DB shouldn't make the analysis too complex (from a data reading perspective).

Taking this into account, I would say that the workload level that we are experiencing should determine whether to do it now or later. Perhaps @kkhalil7 could be interested in keep working on it.

@alrevuelta How about linking this PR with the PeerMetrics refactoring one? Maybe it would be interesting to choose whether to keep the metrics in Memory or a given DB, just an idea, willing to know your point of view on the topic.

cortze commented 3 years ago

As commented in the description of the PR:

It is using the peerID as a key, and the Peer struct marshaled into JSON as a value

The idea of using the DB would be deprecating the actual sync.Map that keeps all the PeerStore in Memory. I would suggest replacing that sync.Map for DB interface described as:

BoltDB interface {
   ReadPeer()
   WritePeer()
   LoadOrWritePeer() ¿?
   ListPeers()
   DeletePeer() 
}

So that the PeerStore would look like:

PeerStore {
   PeerDB -> (Bolt DB)
   PeerCount
   ConnectedPeers
   StartTime
   etc, etc
}

This could also be moved to a new storage?¿.go and had its own test units.

What do you guys @kkhalil7 @alrevuelta think about it, make sense?

kkhalil7 commented 3 years ago

Maybe we should implement first the same functions a sync.Map implements, so we don't have to change every instance of peerStore ? These functions would be:

func (db *BoltDB) Delete(key interface{})
func (db *BoltDB) Load(key interface{}) (value interface{}, ok bool)
func (db *BoltDB) LoadAndDelete(key interface{}) (value interface{}, loaded bool)
func (db *BoltDB) LoadOrStore(key, value interface{}) (actual interface{}, loaded bool)
func (db *BoltDB) Range(f func(key, value interface{}) bool)
func (db *BoltDB) Store(key, value interface{})

And of course with the possibility of adding others

cortze commented 3 years ago

Makes sense to me

alrevuelta commented 3 years ago

@alrevuelta How about linking this PR with the PeerMetrics refactoring one? Maybe it would be interesting to choose whether to keep the metrics in Memory or a given DB, just an idea, willing to know your point of view on the topic.

Maybe we can define a common interface for both the db and memory, so that they expose the same interface but implemented in a different way. Not sure why we would need both but its perhaps nice to have.

The idea of using the DB would be deprecating the actual sync.Map that keeps all the PeerStore in Memory. I would suggest replacing that sync.Map for DB interface described as:

BoltDB interface { ReadPeer() WritePeer() LoadOrWritePeer() ¿? ListPeers() DeletePeer() }

I really like this, and as @kkhalil7 suggest we can make it similar to what sync.Map offers. So that we have an interface with ReadPeer, WritePeer, DeletePeer that its implemented in two different ways. Using bolt db or using sync.map primitives.

cortze commented 2 years ago

moved PR to #25