neos / neos-development-collection

The unified repository containing the Neos core packages, used for Neos development.
https://www.neos.io/
GNU General Public License v3.0
260 stars 221 forks source link

Unique constraint on nodedata not as unique as assumed #1908

Open kdambekalns opened 6 years ago

kdambekalns commented 6 years ago

Description

The table neos_contentrepository_domain_model_nodedata has a unique constraint:

UNIQUE KEY UNIQ_CE651569772E836A8D94001992F8FB012D45FE4D (identifier, workspace, dimensionshash, movedto)

It is intended to guard against a broken DB state where the same (moved) node exists "twice". The same index exists in our PostgreSQL schema.

Steps to Reproduce

For MySQL:

INSERT INTO neos_contentrepository_domain_model_nodedata (persistence_object_identifier, identifier, workspace, pathhash, dimensionshash, movedto) VALUES('some-persistence-identifier', 'some-identifier', 'live', 'some-pathhash', 'some-dimensionshash', NULL);
INSERT INTO neos_contentrepository_domain_model_nodedata (persistence_object_identifier, identifier, workspace, pathhash, dimensionshash, movedto) VALUES('other-persistence-identifier', 'some-identifier', 'live', 'other-pathhash', 'some-dimensionshash', NULL);

For PostgreSQL:

INSERT INTO neos_contentrepository_domain_model_nodedata (persistence_object_identifier, identifier, workspace, pathhash, dimensionshash, movedto, path, parentpath, nodetype, removed, hidden, hiddeninindex, parentpathhash, creationdatetime, lastmodificationdatetime, properties, dimensionvalues, accessroles) VALUES('some-persistence-identifier', 'some-identifier', 'live', 'some-pathhash', 'some-dimensionshash', NULL, 'a-path', 'a-parentpath', 'nodetype', false, false, false, 'pp-hash', now(), now(), '{}', '{}', '{}');
INSERT INTO neos_contentrepository_domain_model_nodedata (persistence_object_identifier, identifier, workspace, pathhash, dimensionshash, movedto, path, parentpath, nodetype, removed, hidden, hiddeninindex, parentpathhash, creationdatetime, lastmodificationdatetime, properties, dimensionvalues, accessroles) VALUES('other-persistence-identifier', 'some-identifier', 'live', 'other-pathhash', 'some-dimensionshash', NULL, 'a-path', 'a-parentpath', 'nodetype', false, false, false, 'pp-hash', now(), now(), '{}', '{}', '{}');

Expected behavior

The constraint blocks insertion of the second row.

Actual behavior

Both rows can be inserted.

This is by design, as SQL considers all NULL values as distinct. Or rather, since NULL means "non-existing, unknown, absent" something that is not, cannot be equal to something else, even that too is not. A longer and better explanation can be found at https://www.xaprb.com/blog/2006/05/18/why-null-never-compares-false-to-anything-in-sql/

Effect

This in itself does not cause any errors, but it allow a state that should be valid. That state in turn can lead to very confusing erros during node publishing. E.g. a node from a workspace you are not even touching leading to a constraint violation exception…

SQL to detect and fix that state can be found in https://gist.github.com/kdambekalns/84daac693b1a745485a6de08f46b736d

Solution

Using some value instead of NULL in movedto to signify "has not been moved" is not possible, since the field references another node and a foreign key constraint is in place.

Adding another boolean field "isMoved" seems to be a viable solution. In any non-broken DB setting the needed values in a schema migration should be no problem, and that new field could be added to the constraint to effectively block what should have been blocked. Is that correct?

Affected Versions

Neos: All version having that index… on MySQL

kdambekalns commented 6 years ago

The unique constraint needs to use the new isMoved flag but not the movedTo, as that would still be unique in any case, obviously.

The question now is, can we just add an unique index on movedTo alone? My line of thought would be yes, since no two move operations can ever generate the same node including persistence_object_identifier twice.

And there is another interesting issue regarding movedTo–it is annotated as ManyToOne since https://github.com/neos/neos-development-collection/commit/b98502908a6abce318c2f4f0bdb99a95e8579db7#diff-eb03e62c932a1bc63c64d4db4c155b14R183, but IMHO a OneToOne would be correct.

If someone (maybe even @hlubek himself) remembers the background for that, I'd be a happy camper.

kdambekalns commented 6 years ago

Unit tests pass, but functional and Behat tests fail, due to the oder of operations done when moving nodes. And that seems like a bigger task to solve…

kdambekalns commented 6 years ago

Playing around I could

So it seems the handling of moves is in fact working as long as at least one base workspace is involved. The tests that fail work on a workspace without a base workspace.

kdambekalns commented 6 years ago

Locally the unit and functional tests did now pass, let's see what Travis says…

kdambekalns commented 6 years ago

The Behat tests fail, and I am again stuck with a conceptual issue… During the test scenario "Rename a node with updated property (materialized node data) to a previously moved path in user workspace" this is the state

zustand

that leads to a unique key constraint violation with this query:

UPDATE neos_contentrepository_domain_model_nodedata SET removed = 1, 
ismoved = 1, movedto = '27d4f384-4e4e-4a60-af4b-5e1d79143eb4',
lastmodificationdatetime = '2018-03-07 09:48:25', version = version + 1 WHERE
persistence_object_identifier = '9b2f0d78-76d5-4818-adc3-9e6151491def' AND version = 1;

The "old" unique key would have allowed for this, since the movedto value is different. But that's what we want to get out of the equation to begin with.

This might be related, but resetting the moved state does not work either:

da-problem-with-behat
kdambekalns commented 6 years ago

Also, the tests from MultiLayeredWorkspaces.feature added in https://github.com/neos/neos-development-collection/pull/1609 fail now.

kdambekalns commented 6 years ago

The bottom line so far is: Having a unique index that involves movedTo is needed, since we do keep "chained moved nodes". Still, there should always be at most one node with movedTo being null in a identifier/workspace/dimensions combination. That is what we should make sure.

Further ideas for a solution are welcome.

nezaniel commented 6 years ago

My thoughts after discussion with @kitsunet on this: Because of the whole ShadowNode concept, movedTo is part of a NodeData's identity and thus must not be NULL. Whether we remove the FK constraint and set it to empty string or we default it to self (i.e. the NodeData's own path) is open for discussion. Unfortunately, it seems not to be possible to save such a moved copy because it is first saved and then changed, which somehow™ has to be adjusted. The current moving of nodes seems to only work at all because said constraint does not.

hlubek commented 6 years ago

Sounds reasonable, maybe we need to use direct DBAL updates here to solve this outside of the ORM.

nezaniel commented 6 years ago

I'm pretty sure ORM can handle this too, although I also personally prefer DBAL right now

peterbucher commented 5 years ago

@kdambekalns Any updates about this issue?