sequelize / sequelize

Feature-rich ORM for modern Node.js and TypeScript, it supports PostgreSQL (with JSON and JSONB support), MySQL, MariaDB, SQLite, MS SQL Server, Snowflake, Oracle DB (v6), DB2 and DB2 for IBM i.
https://sequelize.org/
MIT License
29.25k stars 4.25k forks source link

Feature request: Support for hierarchies #1751

Closed overlookmotel closed 9 years ago

overlookmotel commented 10 years ago

I would like to propose an addition to the API.

At present, if you have a tree structure of nested folders, you can represent that:

Folder.hasMany( Folder, { as: 'Childs', foreignKey: 'parentId' } )
Folder.belongsTo( Folder, { as: 'Parent', foreignKey: 'parentId' } )

However, to retrieve an entire nested hierarchy requires multiple queries to traverse each folder and iteratively fetch their children. It would be useful if you could do this in a single query.

How about this?

Folder.hasMany( Folder, { as: 'Childs', foreignKey: 'parentId', hierarchy: true } )
Folder.belongsTo( Folder, { as: 'Parent', foreignKey: 'parentId', hierarchy: true } )

Setting hierarchy to true would do the following:

This would allow retrieval of an entire hierarchy in a single query, and for the tree structure to be transparently maintained without any further user code.

The obvious use cases are:

NB: the names of the FolderHierarchy table, hierarchyLevel attribute, and all the accessors etc would be customisable - by providing hierarchy: { ... } options instead of hierarchy: true in the hasMany() call that initialises the hierarchy.

An example:

// set up table
var Folder = sequelize.define( 'Folder', { name: Sequelize.STRING } )

// set up association
Folder.hasMany( Folder, { as: 'Childs', foreignKey: 'parentId', hierarchy: true } )
Folder.belongsTo( Folder, { as: 'Parent', foreignKey: 'parentId', hierarchy: true } )

// sync
sequelize.sync(). ...

// create folders
Folder.create( { name: 'root', parentId: null } ). ... // id: 1
Folder.create( { name: 'root.a', parentId: 1 } ). ... // id: 2
Folder.create( { name: 'root.b', parentId: 1 } ). ... // id: 3
Folder.create( { name: 'root.b.c', parentId: 3 } ). ... // id: 4

// query data
Folder.findAll( { where: { id: 4 }, include: [ { model: Folder, as: 'ParentAncestors' } ] } ). ...
/* returns {
  id: 4,
  name: 'root.b.c',
  hierarchyLevel: 4,
  parentAncestors: [
    { id: 2, name: 'root.b', hierarchyLevel: 2 },
    { id: 1, name: 'root', hierarchyLevel: 1 }
  ]
} */

Folder.findAll( { where: { id: 1 }, include: [ { model: Folder, as: 'ChildDescendents', hierarchy: true } ] } ). ...
/* returns {
  id: 1,
  name: 'root',
  hierarchyLevel: 1,
  childDescendents: [
    { id: 2, name: 'root.a', hierarchyLevel: 2 },
    { id: 3, name: 'root.b', hierarchyLevel: 2, childDescendents: [
      { id: 4, name: 'root.b.c', hierarchyLevel: 3 }
    ] }
  ]
} */

The data created in the tables by the example above would be:

Folder:

id name parentId hierarchyLevel
1 root null 1
2 root.a 1 2
3 root.b 1 2
4 root.b.c 3 3

FolderHierarchy:

folderId | ancestorId 2 | 1 3 | 1 4 | 1 4 | 3

So, what do you think?

I need this for a project I'm working on, so happy to write the code. Just want to make sure it'd be welcome in Sequelize before I proceed.

mickhansen commented 10 years ago

Wouldn't that require some recursive SQL?

overlookmotel commented 10 years ago

No recursion.

To get the whole parentage path of a Folder:

SELECT Folder.*
FROM FolderHierarchy
INNER JOIN Folder ON Folder.id = FolderHierarchy.ancestorId
WHERE FolderHierarchy.folderId = :Folder
ORDER BY Folder.hierarchyLevel

To get all children (and children's children etc) of a Folder:

SELECT Folder.*
FROM FolderHierarchy
INNER JOIN Folder ON Folder.id = FolderHierarchy.folderId
WHERE FolderHierarchy.ancestorId = :Folder
ORDER BY Folder.hierarchyLevel

Either can be done in a single query no matter how big the tree or how deeply nested. But that depends on keeping FolderHierarchy updated as records in Folder are created/updated/deleted. It's denormalisation to keep the data in a form that makes traversing hierarchies easy.

mickhansen commented 10 years ago

How would you get root.b, root.b.c, root.b.c.d with that code? The ancestor id is different across those rows i assume?

So if you fetch root.b, that has 1 as ancestor and 3 as id, root.b.c has 3 as anscestor and 4 as id, etc.

Not sure your query works with that?

mickhansen commented 10 years ago

I'm not totally sure this should be in the scope of sequelize core, what do you think @janmeier ?

mickhansen commented 10 years ago

Oh i see, you store all references in the join table? So if ancestor A has a tree of in total 25 different children they will all be references to A (and their individual parents) through the join table?

mickhansen commented 10 years ago

Yeah now i get it :D

overlookmotel commented 10 years ago

Ah ha! Yes, all the data in FolderHierarchy is duplicated from Folder.

overlookmotel commented 10 years ago

Well I'm going to implement it for my own purposes either way. I'd prefer to put it in Sequelize core raher than my own application if possible so it can be shared with the people!

mickhansen commented 10 years ago

I feel this would be better for a Sequelize plugin rather than core.

overlookmotel commented 10 years ago

Incidentally, the most efficient way to implement the updating of FolderHierarchy when data in Folder changes is with database triggers. But are these supported by Sequelize across all dialects?

It can also be implemented by augmenting the create(), updateAttributes(), destroy() etc methods in core Sequelize - which would then work across all dialects.

mickhansen commented 10 years ago

We have no TRIGGER support at all - And i have no idea if that's supported across dialects. We have hooks for create/update/destroy etc and are planning to add more hooks.

Currently hooks are defined on a model basis but we do need a way to add hooks to ALL models. So again i see this as an ideal candidate for developing our plugin system ;)

overlookmotel commented 10 years ago

I'd argue it's suitable for Sequelize core as hierarchy traversal is quite a common requirement e.g. for nested folders or nested tagging structures.

This would allow you to do it as easily as you can in something like MongoDB.

overlookmotel commented 10 years ago

Plugin system? Does one exist?

mickhansen commented 10 years ago

Well i'm not totally opposed, but i would like to hear the opinion of @sequelize/owners aswell. I'm just worried about adding too much to Sequelize. Ideally we want to move away from being too bloated and instead focus on modularizing sequelize as much as possible.

No plugin system exists currently, but we really want to support one, perhaps by providing hooks into almost all parts of the sequelize codebase.

overlookmotel commented 10 years ago

Ah OK. Well I'm not opposed to doing it as a plugin/module. Maybe you're right that it's better that way.

overlookmotel commented 10 years ago

Are there hooks for define() yet? Also, is there any way to tell Sequelize to add additional fields to all defined models? e.g. new Sequelize( ..., { define: { additionalFields: { createdByUserId: Sequelize.INTEGER } } } )

mickhansen commented 10 years ago

No, none of those things exist yet.

overlookmotel commented 10 years ago

Having throught about it, I can see the argument for making this a module. What do the rest of @sequelize/owners think?

If it's going to be a module, would it be possible to expose validateIncludedAllElement (/lib/model.js line 1751) as part of the public API? I'd need to call that to unpack any { include: { all: ... } } parts into their constituent includes (i.e. convert it to { include: [ associateModel1, associatedModel 2 ] } before doing my hierarchy business. It'd be preferable if I didn't have to repeat the code in my own module.

I'm not sure where in the public API it would go, though - maybe as part of Sequelize.Utils ?

As ever, happy to make the changes myself to do that - just looking for agreement on the best way to slot it into the API.

Please let me know what you think...

mickhansen commented 10 years ago

@overlookmotel hmm, i'd prefer not expose that part as something public. We might have to though because of the include all functionality - Although in hindsight perhaps that should've been a model aswell.

Ideally we would want to introduce this kind of functionality in hooks. If we really need to, lets add validateIncludedElement(s) to Model, but not to Model.prototype so it's only availabe via Sequelize.Model

overlookmotel commented 10 years ago

One more thing: is support for triggers anywhere on the roadmap?

overlookmotel commented 10 years ago

@mickhansen re: include all via Sequelize.Model - yes, that makes sense. I'll do that and submit a PR.

How do you mean "Although in hindsight perhaps that should've been a model aswell"?

overlookmotel commented 10 years ago

@mickhansen Re: hooks - do you mean there'd be a hook into findAll() after includes have been validated, which allows you to further play with the includes/other options before the query runs?

I'm not familiar with the hooks API or whether that's fully developed yet, but conceptually that make sense. Happy to try to do it that way if it hooks are ready and it makes more sense.

mickhansen commented 10 years ago

@overlookmotel typo, meant *module. Triggers is not on the roadmap currently.

Yes we have plans to introduce a lot more hooks, and one hook could be to add a find/findAll hook. Initially i actually planned to add the hook before parsing options, and then a hook after the result returns. But now i'm actually not sure if we should provide the hook before or after parsing the options.

Hooks are quite mature and we plan to introduce them much wider in the codebase to support plugins.

mickhansen commented 10 years ago

@janmeier what do you think, should a find hook be called before or after parsing options? We do stuff like add the paranoid clause and map field names - So the question is whether the implementer should see and modify the original call or the parsed call. The original call is probably the best for caching plugin solution but the parsed call makes the most sense for deep plugins like the one @overlookmotel is suggesting here.

overlookmotel commented 10 years ago

How about 3 hooks?

  1. before parsing options
  2. after parsing options
  3. after result returned

I don't know if adding loads of hooks incurs a performance penalty.

Are there any docs about how hooks work that I can look at?

mickhansen commented 10 years ago

https://github.com/sequelize/sequelize/wiki/API-Reference-Hooks https://github.com/sequelize/sequelize/pull/894 https://github.com/sequelize/sequelize/blob/master/lib/hooks.js

3 hooks is definitely a possibility, we just need to come up with names, which is obviously the hardest part in all of this ;)

CPU wise hooks should not introduce more performance penalty than the amount of functions added to the stack and only incur the penalty of a promise if no hooks (which is fine, it's only called once).

Overall perf wise hooks incur the amount of performance equal to the amount of work done in a hook (so up to implementer).

overlookmotel commented 10 years ago

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

OK, cool, in that case maybe best to add LOADS of hooks everywhere to give flexibility to the user.

I'll read up on hooks, but need to go do some other work now. The perils of having a day job :)

mickhansen commented 10 years ago

Yeah loads of hooks is the way forward to a more modular Sequelize in my opinion, so that's what we're going for.

mickhansen commented 10 years ago

When some investor sees the light and finally funds me ill offer you a day job where you can program (Sequelize too) @overlookmotel ;)

overlookmotel commented 10 years ago

I'll buy that for a dollar! Actually, my day job is pretty interesting (and COMPLETELY different to this). But variety is the spice of life...

Can you point me to somewhere in the Sequelize code where a hook is defined (as opposed to hooked onto)?

mickhansen commented 10 years ago

@overlookmotel well they aren't defined persay but https://github.com/sequelize/sequelize/blob/master/lib/instance.js#L519

overlookmotel commented 10 years ago

OK, I'll look at that later.

overlookmotel commented 10 years ago

By the way, concerning triggers, I could looking into implementing that, but I think only for MySQL. I don't have Postgres etc installed, and trying to work out a bunch of other databases isn't on my personal roadmap!

But maybe if someone else were happy to port it to other DBs from what I do with MySQL?

overlookmotel commented 10 years ago

Or maybe there are bigger fish to fry than triggers right now...

mickhansen commented 10 years ago

Triggers isn't high on my list currently. Whats your specific usecase for triggers?

overlookmotel commented 10 years ago

Use case for triggers for the hierarchy thing - to automatically update FolderHierarchy table as Folder table is altered.

It could be done in node code but that's inefficient - would involve reading from the DB, looping through results, then sending further UPDATE/DELETE queries to the DB. Better to do all that within the DB itself with triggers.

Is it acceptable to have a feature like triggers implemented only for MySQL dialect in Sequelize, and to throw an error if any other dialect calls it? My though is that at least it would be there in the API, would work for any MySQL users that needs it - and also it sits as an open invitation for someone who needs triggers in another dialect to implement it. Or is cross-dialect support for all features of primary importance.

Basically, if I'm going to write something, I'd prefer that the fruits of my labour can be shared rather than sitting hidden in my own code or module.

mickhansen commented 10 years ago

Well we'll be supporting JSON/JSON BINARY exclusively for PG so we will have some features that only exists for a dialect or two. I think that's fair.

A bit of google tells me postgres does have triggers - But it doesn't have to be on your shoulders to implement that aswell :)

mlegenhausen commented 10 years ago

I had the same problem of saving a hierarchy in the database and chose the nested set approach and I think this really nice shows where the limit of an ORM lies.

module.exports = function (sequelize, DataTypes) {
    function *update(query, params) {
        yield sequelize.query('UPDATE ' + Category.tableName + ' ' + query, Category, null, params);
    }

    const Category = sequelize.define('Category', {
        lft: {
            type: DataTypes.INTEGER,
            allowNull: false
        },
        rgt: {
            type: DataTypes.INTEGER,
            allowNull: false
        }
    }, {
                classMethods: {
            findParent: function *(category) {
                return yield this.find({
                    where: {
                        lft: {
                            lt: category.lft
                        },
                        rgt: {
                            gt: category.rgt
                        }
                    },
                    attributes: ['id'],
                    order: 'lft DESC'
                });
            }
        },
        instanceMethods: {
            nestedSetSave: function *(parent, options) {
                if (!parent) throw new Category.InvalidParentError('Parent is not allowed to be null');

                this.lft = parent.rgt;
                this.rgt = parent.rgt + 1;

                var results = yield [
                    update('SET rgt = rgt + 2 WHERE rgt >= ?', [parent.rgt]),
                    update('SET lft = lft + 2 WHERE lft >= ?', [parent.rgt]),
                    this.save()
                ];

                return results[2];
            },
            nestedSetUpdate: function *(parent, attributes, options) {
                if (parent.id === this.id) {
                    throw new Category.InvalidParentError('Cannot move to self');
                }
                // Check if the parent has changed
                const oldParent = yield Category.findParent(this);
                // If the parents have changed move the item
                if (parent.id !== oldParent.id) {
                    // 1. Negate all ids of the node and the subtree (this makes the nodes invisible for step 2)
                    yield update('SET rgt = rgt * -1, lft = lft * -1 WHERE lft >= ? AND rgt <= ?', [this.lft, this.rgt]);

                    // 2. Close the resulting gab
                    const width = this.rgt - this.lft + 1;
                    yield [
                        update('SET rgt = rgt - ? WHERE rgt > ?', [width, this.rgt]),
                        update('SET lft = lft - ? WHERE lft > ?', [width, this.rgt])
                    ];

                    // 3. Create the new gab
                    // Recalculate parent rgt when the parent was effected by step 2
                    const parentRgt = parent.rgt > this.rgt ? parent.rgt - width : parent.rgt;
                    yield [
                        update('SET rgt = rgt + ? WHERE rgt >= ?', [width, parentRgt]),
                        update('SET lft = lft + ? WHERE lft >= ?', [width, parentRgt]),
                        // 4. Move the subtree in the new gab
                        // Query explanation:
                        // 1. Add the old rgt value to preserve offset
                        // 2. Remove negation
                        // 3. rgt: Add the updated new rgt value to rgt, lft: Add the new rgt value without the width
                        update('SET rgt = (rgt + ?) * -1 + ? WHERE rgt < 0', [this.rgt, parentRgt + width - 1]),
                        update('SET lft = (lft + ?) * -1 + ? WHERE lft < 0', [this.lft, parentRgt])
                    ];
                }
                return yield this.updateAttributes(attributes);
            },
            nestedSetDestroy: function* () {
                const width = this.rgt - this.lft + 1;

                yield this.destroy();

                // Delete all child nodes
                yield Category.destroy('lft BETWEEN ' + this.lft + ' AND ' + this.rgt);

                // Update lft and rgt attributes
                yield [
                    update('SET rgt = rgt - ? WHERE rgt > ?', [width, this.rgt]),
                    update('SET lft = lft - ? WHERE lft > ?', [width, this.rgt])
                ];

                return this;
            }
        }
    });
};

The main problem I faced was a missing low level builder for sql queries. So I have to write my own queries with the problem that it could maybe not compatible with other database except mysql.

I currently do not implement my save, update, delete methods as hooks cause this must be run in a transaction. Will the upcoming new transaction system support hooks (or does it already)?

The other thing would be to use an sql query builder like knex.js that ensures cross database compability it anything planed in this direction?

A plugin system would be nice cause currently I don't know how to share my nested set implementation for different models in a nice way.

overlookmotel commented 10 years ago

Hi @mlegenhausen.

I'm at work so haven't had time to read you code above yet, but a quick comment in the meantime...

I was put off the nested set approach as I understand that it is quite expensive on updates and inserts, although selects are very efficient. What do you think of my proposed solution of using an extra denormalised database table (as outlined above)?

The other solution I know of is to put a path column in the table which contains the full path of each item. e.g.:

id name parentId path
1 'root' null ''
2 'root.a' 1 '1,'
3 'root.b' 1 '1,'
4 'root.b.c' 3 '1,3,'

Then you can:

But I didn't like this either as the length of the text field puts a limit on how nested your trees can be, whereas my solution above allows in theory infinite nesting.

Any thoughts?

mlegenhausen commented 10 years ago

First of all look at this presentation: http://de.slideshare.net/quipo/trees-in-the-database-advanced-data-structures?type=presentation

Second it depends...

What is your use case? Do you want to store a read, write or both heavy tree structure in your database? Do have only some hundreds nodes or do you have millions? Think of that then choose your data structure or even you should choose another database system.

I would only use denormalisations if there is no other way. If this is a new system plan it with all your knowledge about computer science and that says this is not a good thing!

I choose nested set cause in our use case the categories will only change slightly and there are only a few. 99% reads and what you would normally not expect from sets is it preserves order!

I think nested set is the perfect fit for relational databases cause it works as the name says with sets and when you look behind the mathematics of databases its all about sets. But as in the presentation described you can combine all these data structures to find your perfect fit.

I will work on an rewrite of my category definition to work with hooks. Which let the api be more natural in a sequelize way.

overlookmotel commented 10 years ago

@mlegenhausen Thanks for the pointer to that presentation. Very interesting.

In my use case, there will be a lot of writing as well as reading so I need the writes to be fast. However, I can see how the nested set approach is better in a lot of circumstances.

if you are planning to make your implementation into a module, can I suggest that we share the same API? This would mean that users could switch between my module and yours seamlessly depending on which they find performs better for their use case?

Here's where I'm up to so far: https://github.com/overlookmotel/sequelize-hierarchy

Please have a look and let me know what you think...

mlegenhausen commented 10 years ago

Nice I will take a look. I already rewrote the nestedset model to work with hooks. So it is much easier to work with them.

const co = require('co');
const util = require('util');

module.exports = function (sequelize, DataTypes) {
    function *update(query, params) {
        yield sequelize.query('UPDATE ' + Category.tableName + ' ' + query, Category, null, params);
    }

    const Category = sequelize.define('Category', {
        lft: {
            type: DataTypes.INTEGER,
            allowNull: false
        },
        rgt: {
            type: DataTypes.INTEGER,
            allowNull: false
        }
    }, {
        classMethods: {
            buildWithParent: function (params, parent) {
                if (!parent) throw new Category.InvalidParentError('Parent is not allowed to be null');
                const category = this.build(params);
                category.lft = parent.rgt;
                category.rgt = parent.rgt + 1;
                return category;
            },
            createWithParent: function (params, parent, options) {
                return this.buildWithParent(params, parent).save(options);
            },
            findParent: function (category) {
                return this.find({
                    where: {
                        lft: {
                            lt: category.lft
                        },
                        rgt: {
                            gt: category.rgt
                        }
                    },
                    attributes: ['id'],
                    order: 'lft DESC'
                });
            },
            createRoot: function () {
                return this.create({
                    lft: 1,
                    rgt: 2
                });
            }
        },
        instanceMethods: {
            moveTo: function* (parent) {
                if (parent.id === this.id) {
                    throw new Category.InvalidParentError('Cannot move to self');
                }
                // Check if the parent has changed
                const oldParent = yield Category.findParent(this);
                // If the parents have changed move the item
                if (parent.id !== oldParent.id) {
                    // 1. Negate all ids of the node and the subtree (this makes the nodes invisible for step 2)
                    yield update('SET rgt = rgt * -1, lft = lft * -1 WHERE lft >= ? AND rgt <= ?', [this.lft, this.rgt]);

                    // 2. Close the resulting gab
                    const width = this.rgt - this.lft + 1;
                    yield [
                        update('SET rgt = rgt - ? WHERE rgt > ?', [width, this.rgt]),
                        update('SET lft = lft - ? WHERE lft > ?', [width, this.rgt])
                    ];

                    // 3. Create the new gab
                    // Recalculate parent rgt when the parent was effected by step 2
                    const parentRgt = parent.rgt > this.rgt ? parent.rgt - width : parent.rgt;
                    yield [
                        update('SET rgt = rgt + ? WHERE rgt >= ?', [width, parentRgt]),
                        update('SET lft = lft + ? WHERE lft >= ?', [width, parentRgt]),
                        // 4. Move the subtree in the new gab
                        // Query explanation:
                        // 1. Add the old rgt value to preserve offset
                        // 2. Remove negation
                        // 3. rgt: Add the updated new rgt value to rgt, lft: Add the new rgt value without the width
                        update('SET rgt = (rgt + ?) * -1 + ? WHERE rgt < 0', [this.rgt, parentRgt + width - 1]),
                        update('SET lft = (lft + ?) * -1 + ? WHERE lft < 0', [this.lft, parentRgt])
                    ];
                }
                return this;
            }
        },
        hooks: {
            beforeCreate: co(function* (model) {
                yield [
                    update('SET rgt = rgt + 2 WHERE rgt >= ?', [model.lft]),
                    update('SET lft = lft + 2 WHERE lft >= ?', [model.lft])
                ];
            }),
            afterDestroy: co(function* (model) {
                const width = model.rgt - model.lft + 1;
                // Delete all child nodes
                yield this.destroy('lft BETWEEN ' + model.lft + ' AND ' + model.rgt);
                // Update lft and rgt attributes
                yield [
                    update('SET rgt = rgt - ? WHERE rgt > ?', [width, model.rgt]),
                    update('SET lft = lft - ? WHERE lft > ?', [width, model.rgt])
                ];
            })
        }
    });

    // Errors
    Category.InvalidParentError = function InvalidParentError(message) {
        this.name = 'InvalidParentError';
        this.message = message;
        this.stack = (new Error()).stack;
    };

    util.inherits(Category.InvalidParentError, Error);

    return Category;
};

I extends the default model by the following functions:

buildWithParent(parent) // Takes a parent instance and sets the lft and rgt value appropiate
createWithParent(parent) // The same as before but with a save() call
findParent(category) // Returns the parent instance of an category
moveTo(parent) // Moves the category to the new parent

I will extend the moveTo method to support a successor attribute so it is possible to move a node and set its successor category.

What do you think?

overlookmotel commented 10 years ago

I've taken the approach of only extending Sequelize by adding Model#isHierarchy. Everything else uses the usual Sequelize create, update, updateAttributes, destroy commands etc. And then hooks catch those calls and do what they need to do hidden from the user.

It's quite a different approach to yours. Let me know what you think when you've had a chance to look at my (unfinished) module...

diosney commented 8 years ago

+1 Very useful. It would be nice to have nested hierarchies in the core, I agree too that is a common pattern out there.

overlookmotel commented 8 years ago

@diosney You can use my sequelize plugin for this https://www.npmjs.com/package/sequelize-hierarchy

diosney commented 8 years ago

@overlookmotel Yeah sure, I'm testing it right now and so far so good :smile: Thanks!