roshub / vapor_master

high availability ros master
Apache License 2.0

Discussion about MongoDB alternative #40

Closed AlexisTM closed 5 years ago

AlexisTM commented 5 years ago

Hi,

I am looking into vapor as an alternative, but I wonder why it was designed with MongoDB. I would propose Redis instead, because it seems to better fit the purpose of the rosmaster/rosparam servers.

sevenbitbyte commented 5 years ago

@AlexisTM

Great question. The short answer on how we landed on Mongo is that vapor was initially developed exclusively for use cases inside the RosHub cloud, where we were already using Mongo extensively. That made it the most cost-effective, least-effort, and highest-performance DB given our existing investments. Check out my talk on vapor for more info on how we use vapor and the future work we're thinking about.

Is redis a net benefit?

Redis seems like it may offer some technical improvements for embedded use, so we'd certainly welcome PR contributions or paid work that helps enable more embedded use cases. The benefits appear marginal at the moment, but maybe I'm not understanding your use case.

Tested on pi3

So I'm curious to hear whether the DB is specifically causing serious issues in your workflow. We test on Pi 3 (sorry, no Pi Zero testing at the moment) and have seen perfectly fine performance there. CPU usage is slightly higher than we'd like (~5-10% under very heavy master usage), but all in all it has not impacted the workflows we tested, and it did not surface issues that should prevent usage on Pi Zero.

Goals of vapor - in-memory is the problem

The question of in-memory goes to the heart of why we needed to build vapor: specifically, to ensure the rosgraph data is not ephemeral. This is a massive challenge in large cloud services using ROS, or in embedded ROS devices with multiple compute nodes.

The problem we noticed is that when one compute node goes down in a ROS 1 system, the remaining nodes can frequently keep working, but if the failed node was the master, the working nodes are suddenly stranded without a master. Vapor can be used to solve exactly that by distributing the master service and data across all compute nodes. This gives us the opportunity to both detect the failed node(s) and attempt a custom recovery from failure. We've not finished the full fail-over story, but it's on the roadmap.

Check out this blog post and my talk on vapor to find out more about our thinking.

AlexisTM commented 5 years ago

I understand these choices perfectly and they make sense. On my side, in-memory is a great feature because I want a fresh master whenever I start it, but that depends on the design choices. I will watch your talk later, thank you for the answer!

sevenbitbyte commented 5 years ago

I should point out the clean-db and no-clean-db variables. They control whether the data in the db is deleted at start time. Set clean-db to true and vapor will act like an in-memory system.
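To illustrate the idea for anyone reading along, here is a minimal sketch of what a clean-on-start switch does to a persistent registration store. It is not vapor's code: `RegistrationStore`, the JSON file backend, and the `clean_db` keyword are all hypothetical stand-ins for the real MongoDB-backed implementation and its clean-db setting.

```python
import json
import os
import tempfile


class RegistrationStore:
    """Hypothetical file-backed store for ROS graph registrations."""

    def __init__(self, path, clean_db=False):
        self.path = path
        # With clean_db=True, discard persisted state so the master starts
        # empty, which mimics an in-memory master.
        if clean_db and os.path.exists(path):
            os.remove(path)
        self.data = self._load()

    def _load(self):
        # Reload whatever a previous run persisted, if anything.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {}

    def register(self, node_name, uri):
        # Persist every registration so it survives a master restart.
        self.data[node_name] = uri
        with open(self.path, "w") as f:
            json.dump(self.data, f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "rosgraph.json")
    RegistrationStore(path).register("/talker", "http://host-a:40001/")
    kept = RegistrationStore(path, clean_db=False).data   # restart, keep data
    fresh = RegistrationStore(path, clean_db=True).data   # restart, wipe data
    return kept, fresh
```

With the flag off, a restarted store still knows about `/talker`; with it on, the restarted store is empty, which is the "fresh master" behaviour asked about above.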

The other thing is that vapor does limited detection of nodes that don't exist anymore, which should be enough to restart many systems without even needing to clean the db. We expect to expand the dead-node detection feature, as we are generally aiming for the master to be an always-on service.
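As a rough illustration of the dead-node detection idea, the sketch below drops registrations whose node URI no longer accepts a TCP connection. This is an assumption about one possible approach, not a description of how vapor actually does it; `is_reachable` and `prune_dead_nodes` are hypothetical names, and a plain TCP connect is a weaker check than a real call to the node's XML-RPC API.

```python
import socket
from urllib.parse import urlparse


def is_reachable(uri, timeout=1.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    try:
        parsed = urlparse(uri)
        if parsed.hostname is None or parsed.port is None:
            return False
        with socket.create_connection((parsed.hostname, parsed.port),
                                      timeout=timeout):
            return True
    except (OSError, ValueError):
        # Refused, timed out, unresolvable host, or malformed port.
        return False


def prune_dead_nodes(registrations, probe=is_reachable):
    """Split {node_name: uri} into the still-alive mapping and dead names."""
    alive = {name: uri for name, uri in registrations.items() if probe(uri)}
    dead = sorted(set(registrations) - set(alive))
    return alive, dead
```

The `probe` argument is injectable so that a stricter check can be swapped in without touching the pruning logic.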

https://github.com/roshub/vapor_master#environment-variables