Using a different, non-unique key as the Ancestry ID column

rangerscience commented 2 months ago

Hi y'all, and thanks for the gem!

I'm working with documents with tables-of-contents (think a legal document - so, sections, sub-sections, sub-sub-sections, etc). I'm exploring ways to quickly duplicate documents, which means quickly duplicating trees - which, as far as I know, I can't do right now because each node needs to be added one at a time (parents first) so that child nodes have the IDs of parent nodes. (If I'm wrong about that, and there's a way to do it, hooray, I just missed it!)

My thinking is that if each node has an id (the usual Rails primary key), a document_id (so that I know which nodes are part of the same tree) and then a tree_id to use as the key for Ancestry paths, then I'm good to go - I can take a dump of one document, change the document_id to the clone's, and voilà, I'm done.

However, in order to achieve this, AFAIK, I'd need to (1) tell Ancestry to use tree_id instead of id and (2) have Ancestry scope all queries/etc to document_id.

Is this possible with Ancestry as it currently exists, or would I need to dig into the guts? (If I do, can I get some pointers on where to start?)

PS - I ran into a weird GH issue trying to open this issue; the "Submit New Issue" button would grey out. I worked around it by opening a "Hello World" issue (simple title, simple body) and then editing it with this actual content. Huh.

shufeilei commented 4 weeks ago

the separation of the ancestry id from the table's primary key would be tremendously helpful in the case that i want to use the primary key (such as a composite primary key) for table association. currently, the ancestry gem cannot handle a composite primary key.

kbrock commented 4 weeks ago

@shufeilei your request seems a little simpler. You need to find a way to get both ids into the path. Maybe it looks like node.path = /#{node.parent.path}/#{node.id1}|#{node.id2}/. I think this is close to the current implementation, so the changes needed for another scheme may be minimal.

At first I thought /node.id1/node.id2 - but using a different separator would make parsing the path very easy.
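A minimal sketch of that separator idea in plain Ruby, to make the parsing point concrete. The helper names and constants here are hypothetical and are not part of the ancestry gem; the sketch only assumes integer ids and the `/` and `|` separators mentioned above.

```ruby
# Hypothetical helpers; not part of the ancestry gem.
ID_SEPARATOR = "|"    # joins the two halves of a composite key
PATH_SEPARATOR = "/"  # joins the nodes in a path

# Build one path segment from a composite key, e.g. (7, 42) -> "7|42".
def composite_segment(id1, id2)
  [id1, id2].join(ID_SEPARATOR)
end

# Append a node's segment to its parent's path (nil parent path means root).
def child_path(parent_path, id1, id2)
  [parent_path, composite_segment(id1, id2)].compact.join(PATH_SEPARATOR)
end

# Split a path back into [id1, id2] pairs, root first.
def parse_path(path)
  path.split(PATH_SEPARATOR).map do |segment|
    segment.split(ID_SEPARATOR).map { |part| Integer(part) }
  end
end

root  = child_path(nil, 1, 10)
child = child_path(root, 2, 20)
pairs = parse_path(child)
```

Because the two separators differ, one split per level is enough; with /node.id1/node.id2 the parser would have to know that segments come in pairs.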

kbrock commented 4 weeks ago

@rangerscience Welcome! I'm trying to unpack this request.

Before

The path of a node is unique.
The pk id for that node is also unique.
The root_id groups the whole tree and is the left most id in the path.
node.path = /#{node.parent.path}/#{node.id}/. (not 100% accurate)

The path ends up looking like: ../grand_parent.id/parent.id/id

After

There are 3 different ids:

pk id is unique but not used in ancestry
A tree level document_id groups the whole tree together. It is not in the path
A tree_id which is used in the path.
path = #{node.parent.path}/#{node.tree_id}.

The goal being to more easily duplicate a node.

If you change a document_id, but leave all tree_ids the same, then the path will be the same across multiple documents. I guess you could just add the document_id into the left most position of the path, but then why not just use the tree_id of the root node. And then you drop document_id and use root_id to reference a document tree.
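To illustrate the collision, here is a small in-memory sketch in plain Ruby. It does not use the gem; the Row struct and the descendants helper are hypothetical, and tree_id and document_id are the column names proposed in this thread.

```ruby
# Hypothetical in-memory rows; not ActiveRecord models.
Row = Struct.new(:id, :document_id, :tree_id, :path)

rows = [
  Row.new(1, 100, 1, "1"),
  Row.new(2, 100, 2, "1/2"),
  # Clone of document 100: new id and document_id, same tree_id and path.
  Row.new(3, 200, 1, "1"),
  Row.new(4, 200, 2, "1/2"),
]

# Descendants of a node are the rows whose path extends that node's path.
def descendants(rows, node, scope_to_document: false)
  rows.select do |row|
    next false if scope_to_document && row.document_id != node.document_id
    row.path.start_with?("#{node.path}/")
  end
end

root     = rows.first
unscoped = descendants(rows, root).map(&:id)
scoped   = descendants(rows, root, scope_to_document: true).map(&:id)
```

Without the document scope, the lookup for document 100's root also returns the node from the clone, which is why every query would need the extra document_id condition.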

Ancestry was designed to have only 1 parent with multiple children. So to duplicate a document, you need unique tree_id values when you duplicate a tree node. So in the end, the tree_id would be a unique sequential id per node and would just mirror the id. Not sure what this buys you.
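One way to picture a duplication that keeps ids unique is to reserve the new ids first and then rewrite every path through an old-to-new map. This is only a sketch in plain Ruby over hashes, not something the gem provides: duplicate_tree and next_id are hypothetical, and it assumes each row stores its ancestor ids as a string such as "1/2", with nil for a root.

```ruby
# Hypothetical in-memory rows: id plus a string of ancestor ids.
source = [
  { id: 1, ancestry: nil,   title: "Section 1" },
  { id: 2, ancestry: "1",   title: "Section 1.1" },
  { id: 3, ancestry: "1/2", title: "Section 1.1.1" },
]

# Copy a tree by mapping old ids to new ones, then rewriting each ancestry string.
# next_id stands in for however new primary keys get allocated (an assumption).
def duplicate_tree(rows, next_id)
  id_map = rows.each_with_index.map { |row, i| [row[:id], next_id + i] }.to_h
  rows.map do |row|
    new_ancestry =
      if row[:ancestry]
        row[:ancestry].split("/").map { |old| id_map.fetch(Integer(old)) }.join("/")
      end
    row.merge(id: id_map.fetch(row[:id]), ancestry: new_ancestry)
  end
end

copy = duplicate_tree(source, 101)
```

Since the map is built before any path is rewritten, the rows do not have to be processed parents first, so the result could in principle be written in one bulk insert rather than node by node.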

--

I hear you requesting an easy way to duplicate trees. This is a good request. It has been tricky but I believe this is similar to the desire to work with nested forms and the ability to bulk load values.

Unfortunately, I don't feel your proposal gets you to your goal. It just introduces new ids that directly map to existing concepts.

I also see another problem. The sorting concept could be a bit troublesome for you. I'd imagine that each paragraph would need to come in the same order.

Also, I agree that a table of contents looks like a hierarchical set of nodes. If you want multiple people to be editing each node at the same time then separating them into separate records makes sense (the http://fed.wiki.org/view/federated-wiki does stuff like this). But all in all, it seems like using a data structure that is more document centric may meet your needs better. Something like mongo, postgres and the full document as a blob, or git.

I probably misunderstood your goal, and do want to get duplication of trees faster, but I'm not sure this will meet your goals in the end.

G-vans commented 3 weeks ago

Hey guys, just a quick question here. When adding ancestry, do i need to also add parent_id column too or ancestry column would just be enough to handle parent-child relationships?

kbrock commented 3 weeks ago

@G-vans The ancestry column is enough. Ancestry takes the ancestry column and derives parent_id, root_id, and other values from there, so you don't need a separate parent_id column.
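Roughly, the derivation works like the plain-Ruby sketch below. This is not the gem's actual code; it assumes the column holds the ancestor ids from root to parent (for example "1/2") and is nil for a root node.

```ruby
# Plain-Ruby sketch of the derivation; not the gem's implementation.
# ancestry is the stored string of ancestor ids, root first; nil for a root.
def ancestor_ids(ancestry)
  ancestry.to_s.split("/").map { |id| Integer(id) }
end

# The parent is the last ancestor; a root has no parent.
def parent_id(ancestry)
  ancestor_ids(ancestry).last
end

# The root is the first ancestor; a root node is its own root.
def root_id(id, ancestry)
  ancestor_ids(ancestry).first || id
end

# Depth is the number of ancestors.
def depth(ancestry)
  ancestor_ids(ancestry).size
end

# Node 3 sits under node 2, which sits under root node 1.
leaf_parent = parent_id("1/2")
leaf_root   = root_id(3, "1/2")
```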

shufeilei commented 3 weeks ago

@kbrock thank you for the suggestion. it works for my situation! i added a prefix to the ID.

kbrock commented 3 days ago

@shufeilei Glad I could help.

@rangerscience Are you still working through your solution? Did my explanation help you?

stefankroes / ancestry

Using a different, non-unique key as the Ancestry ID column #682
