pouchdb-community / relational-pouch

Store relational data in PouchDB/CouchDB
Apache License 2.0
405 stars 68 forks source link

Modeling a rooted tree graph #67

Open stickbyatlas opened 7 years ago

stickbyatlas commented 7 years ago

Hi there,

I'm exploring some options for how I'm going to structure the related data in my PouchDB / CouchDB database, and I'm struggling to understand the limits of what relational-pouch can do in my use case. I suspect that this is way out of scope for what relational-pouch was built to handle, but I wanted to make sure.

I was unsure whether this heading alluded to what I'm trying to do or not.

Basically, let's assume my data model has 30-40 document types, any of which can have a parent_id that links them to exactly one parent document, of an arbitrary type, in a non-cyclic fashion. For any given document, if I follow the chain of parent_ids all the way to its end, I'll reach a common ancestor (the root).

If I query relational-pouch for a leaf document, can / will relational-pouch automatically, recursively, side-load all the ancestors in the response? Or will I need to do that at the application level?

How would I handle a situation where I need to delete a non-leaf node? If I don't save the relationship bi-directionally (since that would lead to parent nodes getting hammered with revisions), how would I go about re-linking all those descendants to the deleted document's parent?

Here's an example that might help illustrate the situation I'm grappling with:

// root node that everything points back to
{ _id: "country_1", type: "country"}

// children of root
{_id: "state_1", type: "state", parent_id: "country_1"}
{_id: "state_2": type: "state", parent_id: "country_1"}

// grandchildren of root
{_id: "city_1": type: "city", parent_id: "state_1"}
{_id: "city_2": type: "city", parent_id: "state_1"}
{_id: "city_3": type: "city", parent_id: "state_2"}
{_id: "city_4": type: "city", parent_id: "state_2"}

// assets of various types can belong to federal, state or municipal entities
// these would be either children, grandchildren or great-grandchildren of root
{_id: "asset_a_1", type: "asset_a", parent_id: "country_1"}
{_id: "asset_a_2", type: "asset_a", parent_id: "state_2"}
{_id: "asset_b_1", type: "asset_b", parent_id: "city_3"}
{_id: "asset_b_2", type: "asset_b", parent_id: "city_2"}
{_id: "asset_b_3", type: "asset_b", parent_id: "city_2"}
{_id: "asset_c_1", type: "asset_c", parent_id: "city_3"}
{_id: "asset_c_2", type: "asset_c", parent_id: "city_4"}
{_id: "asset_c_3", type: "asset_c", parent_id: "city_4"}

When I pull up asset_c_2, would relational_pouch resolve all the way back to country_1, or would it only resolve city_4?

If I wanted to delete state_2, this would orphan city_3, city_4, asset_a_2, asset_c_1, asset_c_2 and asset_c_3, since state_2 is their link to the root node. So, before deleting state_2, I would need to find all documents whose parent_id points to state_2 and link them to country_1. Can / does relational-pouch handle any of this logic, or would I need to do all of this myself?

My feeling is that, because the parent_id points to an arbitrary document type, relational-pouch probably can't handle this in the first place since you have to define belongsTo relationships by individual types in your schema. But I can't be the only person who's ever tried to model a graph in a *ouchDB. Is there a better-suited library I haven't heard of? Do I need to write my own?

P.S. Thanks for making ember-pouch work. It's saved me weeks of headaches :)

stickbyatlas commented 7 years ago

Additional reading on this, from 2009: http://seancribbs.com/tech/2009/09/28/modeling-a-tree-in-a-document-database/

The author's first solution is basically what I described. I think relational-pouch might be able to model solution 3, provided all related documents had a generic type with belongsTo pointing to itself, and a member field that points to the underlying related document ID. But this could get really, really messy.

Basically, the documents handled by relational-pouch would represent a database of links, and link documents would contain one key whose value is the ID of the represented document.

This sounds like a terrible idea.

stickbyatlas commented 7 years ago

There's also this, from 2008, which I was toying with in my head before I started looking for a library: http://probablyprogramming.com/2008/07/04/storing-hierarchical-data-in-couchdb

But this takes me further from using a library, and involves writing a lot of original (read: buggy) code, which I'd like to avoid doing.

nolanlawson commented 7 years ago

I believe it recursively loads all ancestors, yes, but you would need to write a test to confirm. Honestly I wrote this code over a year ago and have not looked at it in awhile, so I don't really remember. :)

nolanlawson commented 7 years ago

FWIW I would also say that I consider relational-pouch more of a hack than anything; relational data and CouchDB do not really fit very well together because the concept of master-master replication and relational data do not really fit together. E.g. https://github.com/nolanlawson/relational-pouch/issues/24 does a good job of explaining why this is the case. For heavily relational data you may find that CouchDB is a poor fit (although there isn't a synchronizing database in the world that will fit either AFAIK due to theoretical limitations here)

stickbyatlas commented 7 years ago

Yeah, I'm kind of amazed that this level of relational functionality could be packaged into a plugin for a non-relational document store.

Agreed *ouchDB is a poor fit for relational data. However, you can mitigate a lot of potential conflicts depending on how you design the system. My solution is to implement one-way sync from the canonical database to all clients. Canonical data (and its relationships) are treated as read-only. If the user wants to edit a document or its relationships, those changes are captured in a separate document as a 'delta', which is then pushed into a centralized queue. This gives the database maintainer some oversight over which changes they'll apply to the canonical documents, and also ensures that only non-conflicting changes are ever pushed down to the client.

I've learned to avoid bidirectional replication at all costs, because conflicts can get out of hand pretty quickly. Ember's 'data down, actions up' approach to managing state is what inspired me to design the system this way. It's much easier to map out the flow of information when you're pulling down canonical data and pushing up proposed commits that alter that data in a centralized way.

509dave16 commented 6 years ago

@stickbyatlas This more recent relational PouchDB package might be better suited to your needs: https://github.com/compactd/slothdb