jackalope / jackalope-doctrine-dbal

Doctrine DBAL transport implementation for Jackalope
http://jackalope.github.io
Other
143 stars 60 forks source link

No foreign keys are used on phpcr_binarydata, resulting in dangling binary data? #356

Closed holtkamp closed 1 year ago

holtkamp commented 6 years ago

When looking at the schema generated for a MySQL database, I noticed that no foreign keys are used.

Is this on purpose? For example: https://github.com/jackalope/jackalope-doctrine-dbal/blob/master/src/Jackalope/Transport/DoctrineDBAL/RepositorySchema.php#L123-L134

The phpcr_binarydata.node_id column is a foreign key of phpcr_nodes.id and can (/ should?) be configured as such?

An advantage can be that upon removing the node, also the binary-data is removed. For some reason I currently have a table with a lot "dangling" entries in the phpcr_binarydata table, which can be determined with a query like:

SELECT id 
FROM phpcr_binarydata 
WHERE node_id NOT IN (SELECT id FROM phpcr_nodes);

I think this can be prevented by adding a foreign key, something like this:

$binary->addForeignKeyConstraint('phpcr_nodes', ['node_id'], ['id'], ['onDelete' => 'CASCADE']);
ElectricMaxxx commented 6 years ago

Mhhh. Is the id an extra column if type integer to to get a dbal record complete? Or is it our well known path? First would mean, that we do not have any access from our app on it.

holtkamp commented 6 years ago

uhm... the phpcr_nodes.id column is the existing the primary key

https://github.com/jackalope/jackalope-doctrine-dbal/blob/d810fd88237d7537c431c38005150db1c1f9d8a3/src/Jackalope/Transport/DoctrineDBAL/RepositorySchema.php#L93

First would mean, that we do not have any access from our app on it.

Not sure I understand what you mean. This is pure at SQL level, right? So the "app" does not even need access?

Same as done here:

https://github.com/jackalope/jackalope-doctrine-dbal/blob/d810fd88237d7537c431c38005150db1c1f9d8a3/src/Jackalope/Transport/DoctrineDBAL/RepositorySchema.php#L160-L161

ElectricMaxxx commented 6 years ago

But even when the app (user of implementation,) does not use it, cause it depend on an Interface only, the implementation could need at least a key on important columns. And maybe some forreign keys. But that depends on the usecase. If noone uses a key, or a link betweem tanbles, that key is useless. So we should collect usecases for stuff. If we are good, then we have mor clear view

holtkamp commented 6 years ago

My suggestion does not change anything to the existing functionality. Keys, etc.

The only thing it does:

This way, the database itself ensures that phpcr_binarydata only contains data that is actually used...

dbu commented 6 years ago

its been quite a while when we built that... i can't remember if there was any specific reason not to do foreign keys for this. but if the node table and binarydata table use the same node id and do not somehow try to optimize by sharing binary data, i think it would make a lot of sense to add the foreign key functionality. the tests probably focus on creating and modifying data, and not on removing it :-/

do you want to look into this? i guess the complete fix would be:

holtkamp commented 6 years ago

@dbu sounds like a nice structured approach, kind of busy coming time, but will try to contribute.

For the eager folks, the following can already help:

DELETE
FROM phpcr_binarydata 
WHERE node_id NOT IN (SELECT id FROM phpcr_nodes);

ALTER TABLE phpcr_binarydata ADD CONSTRAINT FK_37E65615460D9FD7 FOREIGN KEY (node_id) REFERENCES phpcr_nodes (id) ON DELETE CASCADE;
ElectricMaxxx commented 6 years ago

Btw: i did not want to reject the key. I just wanted to say that the keys can change application, especially for cascade options.

dbu commented 6 years ago

thanks for that update. yeah, that does not look too scary. glad if you can do a PR to adjust the db init script and wrap those 2 statements into a changelog entry. no hurry of course, we are all busy people ;-)