lolli42 / dbdoctor

TYPO3 CLI extension to find and fix DB inconsistencies
GNU General Public License v2.0


TYPO3 DB doctor

Mission

The mission of this extension is to find database inconsistencies that may have been introduced in a living TYPO3 instance over time, and to fix them.

For example, when a page tree is deleted by an editor, it sometimes happens that most pages are properly set to deleted, but some pages are missed, or a content element on one page is not deleted. This leads to orphan pages or content elements in the database.

There can be many reasons to end up with an invalid database state like the above: TYPO3 in general has no referential integrity constraints on database tables, and inconsistencies can be triggered by a dying PHP process, a lost DB connection, a core bug, a buggy extension, a broken deployment, and more. Long-lived, active instances that were upgraded through multiple major core versions tend to end up with something that isn't quite right anymore.

Such inconsistencies can lead to further issues. For instance, if a page that has an orphaned localized record is copied, the system tends to mess up localizations of the copied page, too. Editors then stumble, and TYPO3 agencies have to do time-consuming debugging sessions to find out what went wrong.

This extension provides a CLI command that tries to find various such inconsistencies and gives admins options to fix them.

Alternatives

We're not aware of other publicly available extensions that try to achieve the same in a similarly systematic way. The core lowlevel extension comes with a couple of commands that try to clean up various DB state, but its codebase is rather rotten and hard to maintain.

This extension is not a substitute for the lowlevel commands (yet?); it is more of an incubator to see whether a certain strategy for dealing with inconsistencies actually works out in projects. It will grow over time. Maybe it ends up in the core, or the core refers to this extension as a "maintenance" extension in the future. We'll see.

Strategy

The strategy of this command is to check for single things one at a time and to fix them before going on to the next check. Updates and deletes of records that are not OK are done directly with low-level database queries, not through the DataHandler.

Single checks are carefully crafted and functionally tested, and the order in which they are executed is important. A single check may be run multiple times in the chain.

Single checks try to avoid memory consumption and assumed state, at the cost of executing more queries. Queries are often performed as prepared statements so they can be re-used many times within a single check. Statements are properly closed when a single check has finished, which lets PHP garbage collection reclaim them. All in all, this command should be relatively quick even for large instances, but it will hammer the database a lot.

Impact on Frontend rendering

When a health check finds something fishy, dbdoctor offers only one hard-coded solution to deal with it. The user is not asked for a solution: they either accept the proposed UPDATE or DELETE database changes, or they need to abort, take care of the issue manually, and then restart.

Implementing a per-record question/answer feature in dbdoctor is not feasible, since this would add an orthogonal vector of complexity that would quickly render the system unmaintainable: single checks are designed to work on top of each other, and dbdoctor needs to establish a "chain of correctness" to do its job.

There are usually three options for a specific "fix":

The general strategy is to cause as little damage as possible from a TYPO3 Frontend rendering point of view.

For example, when there are two localizations of a default language record in a specific language, dbdoctor detects this as invalid and suggests setting one of them to deleted=1. Of the two records, it tries to mark the one as deleted that is typically not rendered in Frontend.

This general strategy isn't always as simple as in the above example, though: since TYPO3 Frontend rendering is so flexible, the actually rendered record sometimes depends on specific Frontend rendering details dbdoctor can't know. In those cases, dbdoctor tries to guess the option that causes the least damage. This may not always fit real-life cases. The only way to deal with this is to look at single record change suggestions individually. The interactive options p, d and s hopefully help to classify single suggested changes.

Limits

Even though this low-level tool tries to be very careful and checks lots of details before suggesting a change, there are still some limits and assumptions. For example, the "delete" column of soft-delete aware TCA tables is assumed to be an integer column, not a text, varchar or similar column. The core usually creates the correct schema for this column, as long as there is no explicit definition of it in an ext_tables.sql file. However, if an extension gets this wrong and defines such a field in some broken way, dbdoctor may cause damage by suggesting deletes or updates of all rows.
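Since a broken schema of this column can turn into mass deletes or updates, it may be worth checking it before a run. The following is a minimal bash sketch that is not part of dbdoctor. It assumes MySQL/MariaDB and assumes the soft-delete column is literally named deleted (dbdoctor itself works with whatever field TCA ['ctrl']['delete'] names). The hypothetical function find_non_integer_delete_columns reads tab-separated table, column and data type lines and prints every column whose type is not an integer type.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of dbdoctor.
# Input:  lines of "table<TAB>column<TAB>data_type" on stdin.
# Output: one line per column whose data type is not an integer type.
# ASSUMPTION: MySQL/MariaDB base type names; bash 4+ for ${var,,}.
find_non_integer_delete_columns() {
  local table column type
  # "|| [ -n "$table" ]" also handles a last line without trailing newline.
  while IFS=$'\t' read -r table column type || [ -n "$table" ]; do
    # Skip empty lines.
    [ -z "$table" ] && continue
    case "${type,,}" in
      tinyint|smallint|mediumint|int|integer|bigint)
        ;; # integer type: fine for a soft-delete flag
      *)
        printf '%s.%s is %s\n' "$table" "$column" "$type"
        ;;
    esac
  done
}
```

The input can come from the mysql client in batch mode (-B gives tab-separated output, -N drops the header line), for example: mysql -N -B your_database -e "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND COLUMN_NAME = 'deleted'" | find_non_integer_delete_columns, where your_database is a placeholder. Any output line is a table to fix in ext_tables.sql before running dbdoctor.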

There are further assumptions. For instance, dbdoctor assumes that extensions do not change some TCA settings the core provides for standard tables (especially pages, tt_content and sys_file_reference). For example, those tables are assumed to be both soft-delete aware and workspace aware, dbdoctor queries the corresponding fields on these tables, and it will fail if an extension tampered with the corresponding TCA ctrl settings.

There are further scenarios dbdoctor cannot deal with. For example, let's say some extension declares a table soft-delete aware with a TCA entry ['ctrl']['delete'] = 'deleted', and you have some rows with deleted=1. Later, that TCA table is made no longer soft-delete aware by removing the ['ctrl']['delete'] declaration. The core database analyzer will then first suggest renaming the deleted column to zzz_deleted_deleted, and then allow removing the column. Doing this effectively pushes all previously deleted records "live" if you did not remove all affected deleted=1 records beforehand. There are similar scenarios when TCA tables are changed to be no longer workspace aware while the table still contains workspace related records, or when TCA tables are no longer "starttime" / "endtime" aware while the table still contains timed records.

dbdoctor always works on the current TCA state. It never knows whether some TCA table was defined "soft-delete aware" before and whether this has changed later. When you push records live by removing the "deleted" column, by removing the "workspaces" extension or workspace related columns, or by removing timing related fields, this can end up in a non-repairable state dbdoctor will not be able to fix. Instead, it will tend to find additional broken database relations, and will suggest changes that make the situation worse than before. Also, dbdoctor never looks at potentially existing zzz_deleted columns: from dbdoctor's point of view those do not exist, since they depend on some "before" TCA state that cannot be reconstructed. States created by scenarios like the above are not repairable and need manual reconstruction. Good luck.

All in all, the TCA and ext_tables.sql of extensions should be in good shape before working with dbdoctor, and changes suggested by health checks should always be checked manually before committing them to the database. Also, never forget to back up the database so you are prepared for a potentially needed disaster recovery. Do not accept dbdoctor suggestions blindly!

Current status

First releases have been done, but we're not confident enough for a 1.0.0 yet. By its nature, this extension performs potentially destructive queries, so use it with care. We are, however, already using this extension successfully for some of our customers.

Installation

Composer

The extension currently supports TYPO3 v11 and TYPO3 v12. It can be installed as a non-dev dependency (without adding --dev to composer require): it has no impact on a live instance (except for dependency injection definitions) as long as it is not actively executed via CLI.

$ composer require lolli/dbdoctor

TYPO3 Extension Repository

For non-composer projects, the extension is available in TER as extension key dbdoctor and can be installed using the extension manager.

Preparation

The nature of the CLI command is to perform destructive database operations on your instance. As such, a few things should be kept in mind:

Postprocessing

Usage

$ bin/typo3 dbdoctor:health

Note: dbdoctor is "runtime static" with TCA: while dbdoctor is running, TCA is not expected to change. If you look at single changes and decide to change TCA, clear all caches and abort dbdoctor (press "a" in interactive mode) to start again. Failing to do so may lead to dbdoctor committing harmful changes to the database, depending on what you did with TCA.

The interface looks like this:

Note: the above image is notoriously outdated, and the interface of the current version may look slightly different. We're too lazy to update the image often, but it should give a solid idea of what the interface looks like.

The main command is a chain of single checks. They are done one by one. Affected record details can be shown on a per-page and a per-record basis to give a quick overview. The interface allows deleting or updating of affected records, depending on the type of the check.

The default interactive mode never performs updates automatically and always asks the user for actions. Pressing 's' (simulate/show) shows the queries that would be performed; pressing 'e' (execute) actually executes them.

Interactive mode

When dbdoctor finds something to fix in (default) interactive mode, execution stops and waits for user input:

Exit values

Exit values are bit masks: Integer 3 means: "Changes needed or done" AND "User abort"
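Because 3 combines "Changes needed or done" and "User abort", those two flags must occupy the bit values 1 and 2. Which flag maps to which bit is an assumption in the following bash sketch, so check the exit value table of your dbdoctor version before relying on it. The hypothetical function describe_exit_code tests each bit separately, so combined values decode correctly in wrapper scripts.

```shell
#!/usr/bin/env bash
# Sketch: turn a dbdoctor exit value into readable flags.
# ASSUMPTION: bit value 1 = "Changes needed or done",
#             bit value 2 = "User abort".
describe_exit_code() {
  local rc="$1"
  local -a parts=()
  local part out=""

  if (( rc == 0 )); then
    echo "No flags set"
    return 0
  fi

  # Test each known bit independently; several may be set at once.
  if (( rc & 1 )); then parts+=("Changes needed or done"); fi
  if (( rc & 2 )); then parts+=("User abort"); fi
  # Report bits this sketch does not know about instead of hiding them.
  if (( rc & ~3 )); then parts+=("Other bits: $(( rc & ~3 ))"); fi

  # Join the collected flag names with " AND ".
  for part in "${parts[@]}"; do
    out="${out:+$out AND }$part"
  done
  echo "$out"
}
```

For example, bin/typo3 dbdoctor:health; describe_exit_code "$?" prints "Changes needed or done AND User abort" for exit value 3, given the bit assignment assumed above.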

Options

The CLI command can be executed with a couple of options. The default mode is "interactive", prompting for user input after each failed check.

Current health checks

Single tests are described in detail when running the CLI command. Rough overview:

Further hints

We highly encourage admins to back up databases when working with dbdoctor. Some basic rules regarding SQL dumps must not be forgotten when doing this:

FAQ

Tagging and releasing

packagist.org is updated via the usual GitHub hook. TER releases are created with tailor by the "publish.yml" GitHub workflow when a version is tagged. The commit message of the tagged commit is used as the TER upload comment.

Example:

Build/Scripts/runTests.sh -s clean
Build/Scripts/runTests.sh -s composerUpdate
composer req --dev typo3/tailor
.Build/bin/tailor set-version 0.3.2
composer rem --dev typo3/tailor
git commit -am "[RELEASE] 0.3.2 Added some basic inline foreign field related checks"
git tag 0.3.2
git push
git push --tags