neos / neos-development-collection

The unified repository containing the Neos core packages, used for Neos development.
https://www.neos.io/
GNU General Public License v3.0
260 stars 221 forks source link

TASK: `ChangeProjection` should clean up orphaned change records #4998

Open grebaldi opened 5 months ago

grebaldi commented 5 months ago

Problem description

The ChangeProjection in Neos.Neos currently misses a lot of cases in which it should clean up after itself. This leads to many orphaned cr_*_p_neos_change records that do not serve any purpose after the workspace they're associated with has been published or discarded.

To reproduce this problem, @bwaidelich has offered a SQL query for counting changes grouped by contentstream state via cr_*_p_neos_change (I'll assume cr_default_p_neos_change from hereon out):

SELECT
  COUNT(*), cs.state
FROM
  cr_default_p_neos_change c
  JOIN cr_default_p_contentstream cs ON cs.contentStreamId = c.contentStreamId
GROUP BY cs.state;

[!NOTE] Any Neos 9 instance that is in use will already have orphaned change records. To start fresh, you will need to make sure that every workspace is up-to-date and no workspace contains any changes.

Then it'll be safe to run DELETE FROM cr_default_p_neos_change on your database, you start at a change count of 0. NEVER do this in a mission-critical environment. This is only safe in a development environment that tolerates data loss.

Now, here are the results of @bwaidelich's query for several subsequent operations (all issued by the same user):

  1. Creating a document node

    +----------+---------------------+
    | COUNT(*) | state               |
    +----------+---------------------+
    |        2 | IN_USE_BY_WORKSPACE |
    +----------+---------------------+

    We see 2 changes, one for the document node and one for its content collection. (This may vary obviously depending on the node type)

  2. Creating a content node in the content collection of that document node

    +----------+---------------------+
    | COUNT(*) | state               |
    +----------+---------------------+
    |        3 | IN_USE_BY_WORKSPACE |
    +----------+---------------------+

    We see 1 additional change for the creation of the content node. (Again, this may vary depending on the node type of the content node)

  3. Changing a property of that content node

    +----------+---------------------+
    | COUNT(*) | state               |
    +----------+---------------------+
    |        3 | IN_USE_BY_WORKSPACE |
    +----------+---------------------+

    Nothing has changed. The change record for the content node has been flagged with changed - that's all.

4a. Discarding all changes via the UI

+----------+------------------+
| COUNT(*) | state            |
+----------+------------------+
|        6 | NO_LONGER_IN_USE |
+----------+------------------+

Here it gets interesting: The change count has doubled, but all changes are now associated with content streams that are NO_LONGER_IN_USE.

4b. Discarding all changes via ./flow workspace:discard [workspace-name]

+----------+------------------+
| COUNT(*) | state            |
+----------+------------------+
|        3 | NO_LONGER_IN_USE |
+----------+------------------+

Something different happens here: Again all changes are associated with a content stream that is NO_LONGER_IN_USE. But the count remained at 3. This is because the two publishing mechanism use different commands under the hood. DiscardIndividualNodesFromWorkspace (which happens in 4a) creates an additional intermediate contentstream, which leads to the doubling of the change count.

4c. Publishing all changes via the UI

+----------+------------------+
| COUNT(*) | state            |
+----------+------------------+
|        3 | FORKED           |
|        6 | NO_LONGER_IN_USE |
+----------+------------------+

The same phenomenon as in 4a, but there are also 3 changes in a content stream that is FORKED. I'm at a loss as to why that happens, but I suppose it's a detail in the handling of PublishIndividualNodesFromWorkspace.

4d. Publishing all changes via ./flow workspace:publish [workspace-name]

+----------+------------------+
| COUNT(*) | state            |
+----------+------------------+
|        3 | NO_LONGER_IN_USE |
+----------+------------------+

Same as in 4b.

Expected behavior

Notes on solution

It will be easier to achieve the desired behavior, when the ChangeProjection is better able to track changes per Workspace rather than per ContentSteam.

This will require the changes proposed in https://github.com/neos/neos-development-collection/issues/4996.

grebaldi commented 5 months ago

This is probably a duplicate of https://github.com/neos/neos-development-collection/issues/4813