govCMS / lagoon

GovCMS Lagoon project
16 stars 28 forks source link

Config must validate, or the build/deploy should fail #39

Open simesy opened 5 years ago

simesy commented 5 years ago

As a developer I don't want to deploy my code if configuration fails to validate.

This is because content (which I can't control in my deployment) can cause config sync to fail. In the case below I can't delete the content type because there exists "Speech" content.

 [error]  Drupal\Core\Config\ConfigImporterException: There were errors validating the config synchronization.
Entities exist of type <em class="placeholder">Content</em> and <em class="placeholder">Content type</em> <em class="placeholder">Speech</em>. These entities need to be deleted before importing. in Drupal\Core\Config\ConfigImporter->validate() (line 737 of /app/web/core/lib/Drupal/Core/Config/ConfigImporter.php). 

In ConfigImportCommands.php line 261:

  The import failed due to the following reasons:                              
  Entities exist of type <em class="placeholder">Content</em> and <em class="  
  placeholder">Content type</em> <em class="placeholder">Speech</em>. These e  
  ntities need to be deleted before importing.   

The main reason this is a problem is when config is being deleted. Say in my deployment I'm creating a field or content type, and I have code in my theme that is expecting the field or content type to be there. My code fails and potentially causes a white screen for the whole site.

tobybellwood commented 5 years ago

the same error was picked up at the CI Build stage, and still passed. It'd be safest to fail (if failing was desirable) in CI.

I can see both sides - it could be beneficial to ignore some cim errors, but what is the expectation on the user's config mgmt knowledge here?

simesy commented 5 years ago

Any config validation errors that rise from drush cim means that you can't sync any config, full stop - it's all or nothing.

So the best-case scenario, deploying such code will have zero effect - a few hours later you'll get a service desk question asking why the config change didn't deploy. At worst, the site goes down.

The question is probably "what error can we show that would be useful". If you can get the drush error in front of the user that's probably the most useful.

The import failed due to the following reason: Entities exist of type Content and Content type Speech. These entities need to be deleted before importing.

That's pretty informative.

simesy commented 5 years ago

Yes I think build is the right place to fail. But it should fail on the deploy too because of race-conditions with content changes happening on prod between the build and deploy steps.

tobybellwood commented 5 years ago

the process is

  1. Gitlab CI: Build (runs govcms-deploy against recent db backup)
  2. Gitlab CI: Deploy
  3. Lagoon Build (runs govcms-deploy against db)
  4. Lagoon Deploy.

A "failure" at any step halts the process. Stopping at 1) would be the safest, and the easiest to view errors (the actual drush error would be visible in GitLab CI). We may even be able to provide a config export on failure as a build artefact to download to help diagnose?

simesy commented 5 years ago

aha. agreed.

Number 1. for visibility and step most likely to fail

Number 4. for "just in time" race conditions to be super safe....

Edit - the editor turned 4. into 2.