Refactor to a distributed kv backend?

jh125486 commented 6 years ago

Has anyone looked at libkv? It just abstracts a key value store, be it local (BoltDB) or distributed (Consul, Etcd, Zookeeper).

While not a full solution, it might be a good enough stopgap between maintaining one monolithic binary and distributing the kv store. (libkv is licensed Apache 2.0)
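To make the idea concrete, here is a minimal sketch of the kind of abstraction I mean. The `Store` interface, `memStore`, and `saveTitle` are hypothetical names made up for illustration. This is not libkv's actual API, and the in-memory map only stands in for a local backend such as BoltDB.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrKeyNotFound is returned when a key is absent from the store.
var ErrKeyNotFound = errors.New("key not found")

// Store is a hypothetical minimal key-value abstraction.
// It is NOT libkv's interface, only an illustration of the idea.
type Store interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
	Delete(key string) error
}

// memStore is an in-memory stand-in for a local backend such as BoltDB.
type memStore struct {
	mu   sync.RWMutex
	data map[string][]byte
}

// NewMemStore returns an empty in-memory Store.
func NewMemStore() Store { return &memStore{data: make(map[string][]byte)} }

func (m *memStore) Get(key string) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	if !ok {
		return nil, ErrKeyNotFound
	}
	// Return a copy so callers cannot mutate the stored bytes.
	return append([]byte(nil), v...), nil
}

func (m *memStore) Put(key string, value []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = append([]byte(nil), value...)
	return nil
}

func (m *memStore) Delete(key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, key)
	return nil
}

// saveTitle depends only on Store, so the backend can be swapped
// (local or distributed) without touching this code.
func saveTitle(s Store, id, title string) error {
	return s.Put("content/"+id, []byte(title))
}

func main() {
	s := NewMemStore()
	if err := saveTitle(s, "1", "Hello"); err != nil {
		panic(err)
	}
	v, err := s.Get("content/1")
	fmt.Println(string(v), err)
}
```

Callers like `saveTitle` would stay the same whether the `Store` behind them is local or distributed, which is what would let the distributed option be something you opt into.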

nilslice commented 6 years ago

I haven't, but will give it a look. How would you foresee this fitting into Ponzu?

jh125486 commented 6 years ago

I can look this weekend into refactoring the Bolt reads/writes to use libkv instead. Going forward, I think that would alleviate a lot of the concern around a single point of failure for the API (the DB node going offline).

nilslice commented 6 years ago

That would be a huge improvement -- especially if it can be opted into. I see a lot of use cases where a micro-service approach could use a more available data layer, but many Ponzu instances I've created or have been shown are pretty simple, single node applications and I'd like to continue building for that kind of user. Ideally we can support both, but the goal is to make Ponzu simple and usable for Go programmers of all levels.

Thank you for exploring this -- I really like the idea.

ghost commented 6 years ago

Part of the reason for using BoltDB is that bleve, the full-text search system that is part of Ponzu, needs BoltDB.

But what if we split the storage DB from the indexing DB? I think that would give a much better architecture:

pipelined, so that writes to each DB are eventually consistent
opens up the ability to store mutations and rebuild from those
use Badger / Dgraph to get multi-master HA, and BoltDB for indexing. Because the first part of the pipeline would hit the Dgraph / Badger storage layer, which spreads and replicates the data, the second part of the pipeline (the indexing) would automatically get replicated too. Pretty powerful solution.
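A rough sketch of the two-stage pipeline described above, under heavy assumptions: `Storage` is just an in-memory append-only log standing in for Dgraph / Badger, `Index` is a toy inverted index standing in for bleve on BoltDB, and all type and function names are hypothetical. The point is only to show writes landing in storage first, the index catching up asynchronously, and the index being rebuilt from stored mutations.

```go
package main

import (
	"fmt"
	"strings"
)

// Mutation is a hypothetical record of one write.
type Mutation struct {
	ID   string
	Body string
}

// Storage stands in for the primary store; it keeps an append-only log of mutations.
type Storage struct {
	log []Mutation
}

// Append records a mutation in the log.
func (s *Storage) Append(m Mutation) { s.log = append(s.log, m) }

// Index is a toy inverted index: lowercased word -> set of document IDs.
// It only ever adds postings; a real index would also handle updates and deletes.
type Index struct {
	postings map[string]map[string]bool
}

// NewIndex returns an empty index.
func NewIndex() *Index { return &Index{postings: make(map[string]map[string]bool)} }

// Apply indexes every word of the mutation body under the mutation ID.
func (ix *Index) Apply(m Mutation) {
	for _, w := range strings.Fields(strings.ToLower(m.Body)) {
		if ix.postings[w] == nil {
			ix.postings[w] = make(map[string]bool)
		}
		ix.postings[w][m.ID] = true
	}
}

// Has reports whether the word is indexed for the given ID.
func (ix *Index) Has(word, id string) bool {
	return ix.postings[strings.ToLower(word)][id]
}

// Pipeline writes to storage first, then queues the mutation for the indexer,
// so the index lags behind storage and is only eventually consistent.
type Pipeline struct {
	storage *Storage
	queue   chan Mutation
	done    chan struct{}
}

// NewPipeline starts a background goroutine that drains the queue into the index.
func NewPipeline(s *Storage, ix *Index) *Pipeline {
	p := &Pipeline{storage: s, queue: make(chan Mutation, 16), done: make(chan struct{})}
	go func() {
		defer close(p.done)
		for m := range p.queue {
			ix.Apply(m)
		}
	}()
	return p
}

// Write performs stage 1 (durable storage) and hands off to stage 2 (indexing).
func (p *Pipeline) Write(m Mutation) {
	p.storage.Append(m)
	p.queue <- m
}

// Close stops accepting writes and waits for the indexer to catch up.
func (p *Pipeline) Close() {
	close(p.queue)
	<-p.done
}

// Rebuild replays the stored mutations into a fresh index.
func Rebuild(s *Storage) *Index {
	ix := NewIndex()
	for _, m := range s.log {
		ix.Apply(m)
	}
	return ix
}

func main() {
	s, ix := &Storage{}, NewIndex()
	p := NewPipeline(s, ix)
	p.Write(Mutation{ID: "1", Body: "Distributed kv backend"})
	p.Close()
	fmt.Println(ix.Has("backend", "1"), Rebuild(s).Has("kv", "1"))
}
```

If the index is ever lost or corrupted, `Rebuild` shows why keeping the mutations in the storage layer matters: the index is derived data and can be regenerated.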

I think Ponzu can add a small layer to do the pipelining without going overboard with a control plane like NATS. This is because Ponzu has a central core that everything passes through, so it can do the pipelining into the two databases.

This also opens the door to adding other specialised databases and, of course, exposing them through the Ponzu API. For example, there is an amazingly fast indexer for structured data called Pilosa that is written in Go and can be used by the code generator and the API. Pilosa can do very fast queries on data.

In the end I guess I am describing a semi-CQRS style architecture where the multi-master Dgraph database is your primary storage layer, which can also hold mutations and so makes sure that events fired into other DB stores are resilient. Normally people put a message queue with storage, like NATS, on top of many databases / microservices. NATS is amazing but perhaps too heavy.

Also, at the very least it would give Ponzu an HA multi-master DB that can do pretty much anything. The Ponzu concept of references would have to change, because you now get edges and nodes as the way things reference each other. This is more flexible.

Anyway this was a long stream of an idea about changing databases :)

ghost commented 6 years ago

Oh I forgot.. dgraph uses bleve for fulltext search and facets btw.. bonus

nilslice commented 6 years ago

Interesting ideas, @gedw99. I think we'd all benefit from an enhanced DB architecture and from support for HA across more nodes. I haven't had a chance to evaluate the projects you mentioned but fully intend to. Thanks for keeping an eye on this!

ghost commented 6 years ago

Control plane, pub sub, cqrs https://github.com/nats-io/go-nats

Runs as its own service btw, rather than as a lib.
Uses a simple text-based protocol over TCP (you can talk to it with telnet) as its core communication layer between nodes.
Can wrap the API with WebSockets, gRPC, etc. as needed, or not.
Read about the difference between orchestration and choreography to really understand the why behind all this. In a nutshell, you can produce a pipeline using either design pattern. Orchestration is when you code the pipeline explicitly (like: do x, and then do y). Choreography is when you get different systems to publish and subscribe to events (type X subscribes to type Y's created-item event, type Z subscribes to type X's deleted event, and turtles all the way down, as they say), so that through everyone subscribing to everyone else a pipeline emerges. It's an emergent design pattern, if you want to think about it in top-level design-pattern terms.
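To illustrate the difference, here is a small sketch where the same two-step pipeline (store, then index) is built both ways. The `Bus` type is a hypothetical, synchronous in-process pub/sub made up for this example. It is not the NATS client API, and the topic names are assumptions.

```go
package main

import "fmt"

// Bus is a tiny synchronous in-process pub/sub, standing in for a real broker.
type Bus struct {
	subs map[string][]func(payload string)
}

// NewBus returns a bus with no subscribers.
func NewBus() *Bus { return &Bus{subs: make(map[string][]func(string))} }

// Subscribe registers fn to be called for every message on topic.
func (b *Bus) Subscribe(topic string, fn func(string)) {
	b.subs[topic] = append(b.subs[topic], fn)
}

// Publish delivers payload to every subscriber of topic, in subscription order.
func (b *Bus) Publish(topic, payload string) {
	for _, fn := range b.subs[topic] {
		fn(payload)
	}
}

// Orchestrate spells the pipeline out in one place: do x, and then do y.
func Orchestrate(item string) []string {
	var trace []string
	trace = append(trace, "store:"+item) // step x
	trace = append(trace, "index:"+item) // step y
	return trace
}

// Choreograph wires up independent subscribers. No function names the whole
// pipeline; it emerges from who subscribes to whose events.
func Choreograph(bus *Bus, trace *[]string) {
	bus.Subscribe("item.created", func(item string) {
		*trace = append(*trace, "store:"+item)
		bus.Publish("item.stored", item)
	})
	bus.Subscribe("item.stored", func(item string) {
		*trace = append(*trace, "index:"+item)
	})
}

func main() {
	fmt.Println(Orchestrate("post-1"))

	bus := NewBus()
	var trace []string
	Choreograph(bus, &trace)
	bus.Publish("item.created", "post-1")
	fmt.Println(trace)
}
```

Both runs end up doing the same steps in the same order. In the choreographed version you could add a third subscriber (say, a structured-data indexer) without touching the existing ones.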

Crazy fast, adaptable indexer for structured data https://github.com/pilosa

Dgraph code that handles the indexer for unstructured data, covering what is often called "full-text search (FTS)" and "faceted search": https://github.com/dgraph-io/dgraph/blob/master/tok/fts.go

How it all started : https://github.com/dgraph-io/dgraph/issues/592

In a nutshell, facets allow you to search across all data by slicing it. Like on Amazon, when you are shopping for a TV and you choose filters on the left: LED, 50 to 60 inches. Or, applied to big data, you can ask for all users aged between 10 and 20, female, living in Australia, and then as you get a result set the facets on the left change to show the sub-facets you can choose. This is really the unique thing about facets: you get a result set, and then all the possible facets update on the left of the GUI. Very intuitive, and it allows normal users to essentially query very complex and varied data. Seems very nice for Ponzu, because Ponzu is all about creating types and then using them.
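Here is a toy sketch of that behaviour, with made-up data and hypothetical names (`Item`, `Filter`, `FacetCounts`). It is not how Dgraph or bleve implement facets; it only shows the two halves of the idea: filter by the chosen facets, then recompute the facet counts over whatever is left.

```go
package main

import "fmt"

// Item is a hypothetical record: facet field name -> value.
type Item map[string]string

// Filter returns the items that match every selected facet value.
func Filter(items []Item, selected map[string]string) []Item {
	var out []Item
	for _, it := range items {
		match := true
		for field, want := range selected {
			if it[field] != want {
				match = false
				break
			}
		}
		if match {
			out = append(out, it)
		}
	}
	return out
}

// FacetCounts counts, per facet field, how many items in the result set carry
// each value. These are the numbers a GUI would show next to each filter.
func FacetCounts(items []Item) map[string]map[string]int {
	counts := make(map[string]map[string]int)
	for _, it := range items {
		for field, value := range it {
			if counts[field] == nil {
				counts[field] = make(map[string]int)
			}
			counts[field][value]++
		}
	}
	return counts
}

func main() {
	tvs := []Item{
		{"panel": "LED", "size": "50-60"},
		{"panel": "LED", "size": "40-50"},
		{"panel": "OLED", "size": "50-60"},
	}
	// The user ticks "LED"; the remaining size facets are recomputed.
	results := Filter(tvs, map[string]string{"panel": "LED"})
	fmt.Println(len(results), FacetCounts(results)["size"])
}
```

After the LED filter is applied, the OLED option drops out of the counts entirely, which is the "facets on the left change" effect.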

olliephillips commented 5 years ago

Closing this issue. No activity in 12 months. Please feel free to reopen if you need to.

ponzu-cms / ponzu

Refactor to a distributed kv backend? #202