Closed overlookmotel closed 9 years ago
Wouldn't that require some recursive SQL?
No recursion.
To get the whole parentage path of a Folder:
SELECT Folder.*
FROM FolderHierarchy
INNER JOIN Folder ON Folder.id = FolderHierarchy.ancestorId
WHERE FolderHierarchy.folderId = :Folder
ORDER BY Folder.hierarchyLevel
To get all children (and children's children etc) of a Folder:
SELECT Folder.*
FROM FolderHierarchy
INNER JOIN Folder ON Folder.id = FolderHierarchy.folderId
WHERE FolderHierarchy.ancestorId = :Folder
ORDER BY Folder.hierarchyLevel
Either can be done in a single query no matter how big the tree or how deeply nested. But that depends on keeping FolderHierarchy updated as records in Folder are created/updated/deleted. It's denormalisation to keep the data in a form that makes traversing hierarchies easy.
How would you get root.b, root.b.c, root.b.c.d with that code? The ancestor id is different across those rows i assume?
So if you fetch root.b, that has 1 as ancestor and 3 as id, root.b.c has 3 as anscestor and 4 as id, etc.
Not sure your query works with that?
I'm not totally sure this should be in the scope of sequelize core, what do you think @janmeier ?
Oh i see, you store all references in the join table? So if ancestor A has a tree of in total 25 different children they will all be references to A (and their individual parents) through the join table?
Yeah now i get it :D
Ah ha! Yes, all the data in FolderHierarchy is duplicated from Folder.
Well I'm going to implement it for my own purposes either way. I'd prefer to put it in Sequelize core raher than my own application if possible so it can be shared with the people!
I feel this would be better for a Sequelize plugin rather than core.
Incidentally, the most efficient way to implement the updating of FolderHierarchy when data in Folder changes is with database triggers. But are these supported by Sequelize across all dialects?
It can also be implemented by augmenting the create()
, updateAttributes()
, destroy()
etc methods in core Sequelize - which would then work across all dialects.
We have no TRIGGER support at all - And i have no idea if that's supported across dialects. We have hooks for create/update/destroy etc and are planning to add more hooks.
Currently hooks are defined on a model basis but we do need a way to add hooks to ALL models. So again i see this as an ideal candidate for developing our plugin system ;)
I'd argue it's suitable for Sequelize core as hierarchy traversal is quite a common requirement e.g. for nested folders or nested tagging structures.
This would allow you to do it as easily as you can in something like MongoDB.
Plugin system? Does one exist?
Well i'm not totally opposed, but i would like to hear the opinion of @sequelize/owners aswell. I'm just worried about adding too much to Sequelize. Ideally we want to move away from being too bloated and instead focus on modularizing sequelize as much as possible.
No plugin system exists currently, but we really want to support one, perhaps by providing hooks into almost all parts of the sequelize codebase.
Ah OK. Well I'm not opposed to doing it as a plugin/module. Maybe you're right that it's better that way.
Are there hooks for define()
yet?
Also, is there any way to tell Sequelize to add additional fields to all defined models?
e.g. new Sequelize( ..., { define: { additionalFields: { createdByUserId: Sequelize.INTEGER } } } )
No, none of those things exist yet.
Having throught about it, I can see the argument for making this a module. What do the rest of @sequelize/owners think?
If it's going to be a module, would it be possible to expose validateIncludedAllElement (/lib/model.js line 1751) as part of the public API? I'd need to call that to unpack any { include: { all: ... } }
parts into their constituent includes (i.e. convert it to { include: [ associateModel1, associatedModel 2 ] }
before doing my hierarchy business. It'd be preferable if I didn't have to repeat the code in my own module.
I'm not sure where in the public API it would go, though - maybe as part of Sequelize.Utils ?
As ever, happy to make the changes myself to do that - just looking for agreement on the best way to slot it into the API.
Please let me know what you think...
@overlookmotel hmm, i'd prefer not expose that part as something public. We might have to though because of the include all functionality - Although in hindsight perhaps that should've been a model aswell.
Ideally we would want to introduce this kind of functionality in hooks. If we really need to, lets add validateIncludedElement(s) to Model, but not to Model.prototype so it's only availabe via Sequelize.Model
One more thing: is support for triggers anywhere on the roadmap?
@mickhansen re: include all via Sequelize.Model - yes, that makes sense. I'll do that and submit a PR.
How do you mean "Although in hindsight perhaps that should've been a model aswell"?
@mickhansen Re: hooks - do you mean there'd be a hook into findAll()
after includes have been validated, which allows you to further play with the includes/other options before the query runs?
I'm not familiar with the hooks API or whether that's fully developed yet, but conceptually that make sense. Happy to try to do it that way if it hooks are ready and it makes more sense.
@overlookmotel typo, meant *module. Triggers is not on the roadmap currently.
Yes we have plans to introduce a lot more hooks, and one hook could be to add a find/findAll hook. Initially i actually planned to add the hook before parsing options, and then a hook after the result returns. But now i'm actually not sure if we should provide the hook before or after parsing the options.
Hooks are quite mature and we plan to introduce them much wider in the codebase to support plugins.
@janmeier what do you think, should a find hook be called before or after parsing options? We do stuff like add the paranoid clause and map field names - So the question is whether the implementer should see and modify the original call or the parsed call. The original call is probably the best for caching plugin solution but the parsed call makes the most sense for deep plugins like the one @overlookmotel is suggesting here.
How about 3 hooks?
I don't know if adding loads of hooks incurs a performance penalty.
Are there any docs about how hooks work that I can look at?
https://github.com/sequelize/sequelize/wiki/API-Reference-Hooks https://github.com/sequelize/sequelize/pull/894 https://github.com/sequelize/sequelize/blob/master/lib/hooks.js
3 hooks is definitely a possibility, we just need to come up with names, which is obviously the hardest part in all of this ;)
CPU wise hooks should not introduce more performance penalty than the amount of functions added to the stack and only incur the penalty of a promise if no hooks (which is fine, it's only called once).
Overall perf wise hooks incur the amount of performance equal to the amount of work done in a hook (so up to implementer).
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
OK, cool, in that case maybe best to add LOADS of hooks everywhere to give flexibility to the user.
I'll read up on hooks, but need to go do some other work now. The perils of having a day job :)
Yeah loads of hooks is the way forward to a more modular Sequelize in my opinion, so that's what we're going for.
When some investor sees the light and finally funds me ill offer you a day job where you can program (Sequelize too) @overlookmotel ;)
I'll buy that for a dollar! Actually, my day job is pretty interesting (and COMPLETELY different to this). But variety is the spice of life...
Can you point me to somewhere in the Sequelize code where a hook is defined (as opposed to hooked onto)?
@overlookmotel well they aren't defined persay but https://github.com/sequelize/sequelize/blob/master/lib/instance.js#L519
OK, I'll look at that later.
By the way, concerning triggers, I could looking into implementing that, but I think only for MySQL. I don't have Postgres etc installed, and trying to work out a bunch of other databases isn't on my personal roadmap!
But maybe if someone else were happy to port it to other DBs from what I do with MySQL?
Or maybe there are bigger fish to fry than triggers right now...
Triggers isn't high on my list currently. Whats your specific usecase for triggers?
Use case for triggers for the hierarchy thing - to automatically update FolderHierarchy table as Folder table is altered.
It could be done in node code but that's inefficient - would involve reading from the DB, looping through results, then sending further UPDATE/DELETE queries to the DB. Better to do all that within the DB itself with triggers.
Is it acceptable to have a feature like triggers implemented only for MySQL dialect in Sequelize, and to throw an error if any other dialect calls it? My though is that at least it would be there in the API, would work for any MySQL users that needs it - and also it sits as an open invitation for someone who needs triggers in another dialect to implement it. Or is cross-dialect support for all features of primary importance.
Basically, if I'm going to write something, I'd prefer that the fruits of my labour can be shared rather than sitting hidden in my own code or module.
Well we'll be supporting JSON/JSON BINARY exclusively for PG so we will have some features that only exists for a dialect or two. I think that's fair.
A bit of google tells me postgres does have triggers - But it doesn't have to be on your shoulders to implement that aswell :)
I had the same problem of saving a hierarchy in the database and chose the nested set approach and I think this really nice shows where the limit of an ORM lies.
module.exports = function (sequelize, DataTypes) {
function *update(query, params) {
yield sequelize.query('UPDATE ' + Category.tableName + ' ' + query, Category, null, params);
}
const Category = sequelize.define('Category', {
lft: {
type: DataTypes.INTEGER,
allowNull: false
},
rgt: {
type: DataTypes.INTEGER,
allowNull: false
}
}, {
classMethods: {
findParent: function *(category) {
return yield this.find({
where: {
lft: {
lt: category.lft
},
rgt: {
gt: category.rgt
}
},
attributes: ['id'],
order: 'lft DESC'
});
}
},
instanceMethods: {
nestedSetSave: function *(parent, options) {
if (!parent) throw new Category.InvalidParentError('Parent is not allowed to be null');
this.lft = parent.rgt;
this.rgt = parent.rgt + 1;
var results = yield [
update('SET rgt = rgt + 2 WHERE rgt >= ?', [parent.rgt]),
update('SET lft = lft + 2 WHERE lft >= ?', [parent.rgt]),
this.save()
];
return results[2];
},
nestedSetUpdate: function *(parent, attributes, options) {
if (parent.id === this.id) {
throw new Category.InvalidParentError('Cannot move to self');
}
// Check if the parent has changed
const oldParent = yield Category.findParent(this);
// If the parents have changed move the item
if (parent.id !== oldParent.id) {
// 1. Negate all ids of the node and the subtree (this makes the nodes invisible for step 2)
yield update('SET rgt = rgt * -1, lft = lft * -1 WHERE lft >= ? AND rgt <= ?', [this.lft, this.rgt]);
// 2. Close the resulting gab
const width = this.rgt - this.lft + 1;
yield [
update('SET rgt = rgt - ? WHERE rgt > ?', [width, this.rgt]),
update('SET lft = lft - ? WHERE lft > ?', [width, this.rgt])
];
// 3. Create the new gab
// Recalculate parent rgt when the parent was effected by step 2
const parentRgt = parent.rgt > this.rgt ? parent.rgt - width : parent.rgt;
yield [
update('SET rgt = rgt + ? WHERE rgt >= ?', [width, parentRgt]),
update('SET lft = lft + ? WHERE lft >= ?', [width, parentRgt]),
// 4. Move the subtree in the new gab
// Query explanation:
// 1. Add the old rgt value to preserve offset
// 2. Remove negation
// 3. rgt: Add the updated new rgt value to rgt, lft: Add the new rgt value without the width
update('SET rgt = (rgt + ?) * -1 + ? WHERE rgt < 0', [this.rgt, parentRgt + width - 1]),
update('SET lft = (lft + ?) * -1 + ? WHERE lft < 0', [this.lft, parentRgt])
];
}
return yield this.updateAttributes(attributes);
},
nestedSetDestroy: function* () {
const width = this.rgt - this.lft + 1;
yield this.destroy();
// Delete all child nodes
yield Category.destroy('lft BETWEEN ' + this.lft + ' AND ' + this.rgt);
// Update lft and rgt attributes
yield [
update('SET rgt = rgt - ? WHERE rgt > ?', [width, this.rgt]),
update('SET lft = lft - ? WHERE lft > ?', [width, this.rgt])
];
return this;
}
}
});
};
The main problem I faced was a missing low level builder for sql queries. So I have to write my own queries with the problem that it could maybe not compatible with other database except mysql.
I currently do not implement my save, update, delete methods as hooks cause this must be run in a transaction. Will the upcoming new transaction system support hooks (or does it already)?
The other thing would be to use an sql query builder like knex.js that ensures cross database compability it anything planed in this direction?
A plugin system would be nice cause currently I don't know how to share my nested set implementation for different models in a nice way.
Hi @mlegenhausen.
I'm at work so haven't had time to read you code above yet, but a quick comment in the meantime...
I was put off the nested set approach as I understand that it is quite expensive on updates and inserts, although selects are very efficient. What do you think of my proposed solution of using an extra denormalised database table (as outlined above)?
The other solution I know of is to put a path column in the table which contains the full path of each item. e.g.:
id | name | parentId | path |
---|---|---|---|
1 | 'root' | null | '' |
2 | 'root.a' | 1 | '1,' |
3 | 'root.b' | 1 | '1,' |
4 | 'root.b.c' | 3 | '1,3,' |
Then you can:
But I didn't like this either as the length of the text field puts a limit on how nested your trees can be, whereas my solution above allows in theory infinite nesting.
Any thoughts?
First of all look at this presentation: http://de.slideshare.net/quipo/trees-in-the-database-advanced-data-structures?type=presentation
Second it depends...
What is your use case? Do you want to store a read, write or both heavy tree structure in your database? Do have only some hundreds nodes or do you have millions? Think of that then choose your data structure or even you should choose another database system.
I would only use denormalisations if there is no other way. If this is a new system plan it with all your knowledge about computer science and that says this is not a good thing!
I choose nested set cause in our use case the categories will only change slightly and there are only a few. 99% reads and what you would normally not expect from sets is it preserves order!
I think nested set is the perfect fit for relational databases cause it works as the name says with sets and when you look behind the mathematics of databases its all about sets. But as in the presentation described you can combine all these data structures to find your perfect fit.
I will work on an rewrite of my category definition to work with hooks. Which let the api be more natural in a sequelize way.
@mlegenhausen Thanks for the pointer to that presentation. Very interesting.
In my use case, there will be a lot of writing as well as reading so I need the writes to be fast. However, I can see how the nested set approach is better in a lot of circumstances.
if you are planning to make your implementation into a module, can I suggest that we share the same API? This would mean that users could switch between my module and yours seamlessly depending on which they find performs better for their use case?
Here's where I'm up to so far: https://github.com/overlookmotel/sequelize-hierarchy
Please have a look and let me know what you think...
Nice I will take a look. I already rewrote the nestedset model to work with hooks. So it is much easier to work with them.
const co = require('co');
const util = require('util');
module.exports = function (sequelize, DataTypes) {
function *update(query, params) {
yield sequelize.query('UPDATE ' + Category.tableName + ' ' + query, Category, null, params);
}
const Category = sequelize.define('Category', {
lft: {
type: DataTypes.INTEGER,
allowNull: false
},
rgt: {
type: DataTypes.INTEGER,
allowNull: false
}
}, {
classMethods: {
buildWithParent: function (params, parent) {
if (!parent) throw new Category.InvalidParentError('Parent is not allowed to be null');
const category = this.build(params);
category.lft = parent.rgt;
category.rgt = parent.rgt + 1;
return category;
},
createWithParent: function (params, parent, options) {
return this.buildWithParent(params, parent).save(options);
},
findParent: function (category) {
return this.find({
where: {
lft: {
lt: category.lft
},
rgt: {
gt: category.rgt
}
},
attributes: ['id'],
order: 'lft DESC'
});
},
createRoot: function () {
return this.create({
lft: 1,
rgt: 2
});
}
},
instanceMethods: {
moveTo: function* (parent) {
if (parent.id === this.id) {
throw new Category.InvalidParentError('Cannot move to self');
}
// Check if the parent has changed
const oldParent = yield Category.findParent(this);
// If the parents have changed move the item
if (parent.id !== oldParent.id) {
// 1. Negate all ids of the node and the subtree (this makes the nodes invisible for step 2)
yield update('SET rgt = rgt * -1, lft = lft * -1 WHERE lft >= ? AND rgt <= ?', [this.lft, this.rgt]);
// 2. Close the resulting gab
const width = this.rgt - this.lft + 1;
yield [
update('SET rgt = rgt - ? WHERE rgt > ?', [width, this.rgt]),
update('SET lft = lft - ? WHERE lft > ?', [width, this.rgt])
];
// 3. Create the new gab
// Recalculate parent rgt when the parent was effected by step 2
const parentRgt = parent.rgt > this.rgt ? parent.rgt - width : parent.rgt;
yield [
update('SET rgt = rgt + ? WHERE rgt >= ?', [width, parentRgt]),
update('SET lft = lft + ? WHERE lft >= ?', [width, parentRgt]),
// 4. Move the subtree in the new gab
// Query explanation:
// 1. Add the old rgt value to preserve offset
// 2. Remove negation
// 3. rgt: Add the updated new rgt value to rgt, lft: Add the new rgt value without the width
update('SET rgt = (rgt + ?) * -1 + ? WHERE rgt < 0', [this.rgt, parentRgt + width - 1]),
update('SET lft = (lft + ?) * -1 + ? WHERE lft < 0', [this.lft, parentRgt])
];
}
return this;
}
},
hooks: {
beforeCreate: co(function* (model) {
yield [
update('SET rgt = rgt + 2 WHERE rgt >= ?', [model.lft]),
update('SET lft = lft + 2 WHERE lft >= ?', [model.lft])
];
}),
afterDestroy: co(function* (model) {
const width = model.rgt - model.lft + 1;
// Delete all child nodes
yield this.destroy('lft BETWEEN ' + model.lft + ' AND ' + model.rgt);
// Update lft and rgt attributes
yield [
update('SET rgt = rgt - ? WHERE rgt > ?', [width, model.rgt]),
update('SET lft = lft - ? WHERE lft > ?', [width, model.rgt])
];
})
}
});
// Errors
Category.InvalidParentError = function InvalidParentError(message) {
this.name = 'InvalidParentError';
this.message = message;
this.stack = (new Error()).stack;
};
util.inherits(Category.InvalidParentError, Error);
return Category;
};
I extends the default model by the following functions:
buildWithParent(parent) // Takes a parent instance and sets the lft and rgt value appropiate
createWithParent(parent) // The same as before but with a save() call
findParent(category) // Returns the parent instance of an category
moveTo(parent) // Moves the category to the new parent
I will extend the moveTo
method to support a successor attribute so it is possible to move a node and set its successor category.
What do you think?
I've taken the approach of only extending Sequelize by adding Model#isHierarchy
. Everything else uses the usual Sequelize create, update, updateAttributes, destroy commands etc. And then hooks catch those calls and do what they need to do hidden from the user.
It's quite a different approach to yours. Let me know what you think when you've had a chance to look at my (unfinished) module...
+1 Very useful. It would be nice to have nested hierarchies in the core, I agree too that is a common pattern out there.
@diosney You can use my sequelize plugin for this https://www.npmjs.com/package/sequelize-hierarchy
@overlookmotel Yeah sure, I'm testing it right now and so far so good :smile: Thanks!
I would like to propose an addition to the API.
At present, if you have a tree structure of nested folders, you can represent that:
However, to retrieve an entire nested hierarchy requires multiple queries to traverse each folder and iteratively fetch their children. It would be useful if you could do this in a single query.
How about this?
Setting
hierarchy
totrue
would do the following:hierarchyLevel
to the Folder model indicating how deeply nested in the tree the item isFolderHierarchy
representing a Folder's link to all it's ancestors and descendentsgetChildDescendents
andgetParentAncestors
Folder.findAll( { include: [ { model: Folder, as: 'ChildDescendents' } ] } )
andFolder.findAll( { include: [ { model: Folder, as: 'ParentAncestors' } ] } )
to retrieve all parents/parent's parents/parent's parent's parents or children/children's children/ children's children's children - all in one gofindAll( ... ChildDescendents )
, would present the returned Descendents in a nested tree structureThis would allow retrieval of an entire hierarchy in a single query, and for the tree structure to be transparently maintained without any further user code.
The obvious use cases are:
NB: the names of the FolderHierarchy table, hierarchyLevel attribute, and all the accessors etc would be customisable - by providing
hierarchy: { ... }
options instead ofhierarchy: true
in thehasMany()
call that initialises the hierarchy.An example:
The data created in the tables by the example above would be:
Folder:
FolderHierarchy:
folderId | ancestorId 2 | 1 3 | 1 4 | 1 4 | 3
So, what do you think?
I need this for a project I'm working on, so happy to write the code. Just want to make sure it'd be welcome in Sequelize before I proceed.