
Improving 5.39.6 Meta Fields Data Migration Performance #4154

Closed adrians5j closed 1 month ago

adrians5j commented 1 month ago

Changes

The main goal of this PR is to improve the performance of the existing 5.39.6 meta fields data migration by splitting the work that needs to be done into multiple parallel tasks.

Note that the new data migration does not run within our existing data migrations framework, nor does it run inside AWS Lambda functions.

How To Run?

There are two ways to run the new data migration, but in both cases, the first step is always running the yarn add @webiny/migrations reflect-metadata command in your project root.
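For example, run the following from the root of your Webiny project:

yarn add @webiny/migrations reflect-metadata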

Also, whichever approach you end up using, note that the following four environment variables have to be set when running the data migration (see the example right after the list):

AWS_REGION
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
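
If you're not relying on a tool like aws-vault (used in the example below) to inject these, a minimal sketch of setting them manually in the shell could look like this. All values are placeholders:

export AWS_REGION=eu-central-1
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_SESSION_TOKEN=<your-session-token>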

1. Terminal Command

The migration can be run from the terminal like any other command, for example:

AWS_REGION=eu-central-1 aws-vault exec default -- node node_modules/@webiny/migrations/migrations/5.39.6/001/ddb-es/bin.js \
--ddbTable wby-webiny-abc123 \
--ddbEsTable wby-webiny-es-abc123 \
--esEndpoint search-wby-webiny-js-abc123-xyz123abc.eu-central-1.es.amazonaws.com \
--segments 10

Note that in the example above, we're using the aws-vault tool to easily inject temporary credentials, i.e. all of the required AWS_ environment variables, before running the data migration.

2. Post-Deploy Webiny CLI Hook

The migration can also be run via the post-deploy CLI plugin that the @webiny/migrations package also exports:

// webiny.project.ts
import cliWorkspaces from "@webiny/cli-plugin-workspaces";
import cliPulumiDeploy from "@webiny/cli-plugin-deploy-pulumi";
import cliAwsTemplate from "@webiny/cwp-template-aws/cli";

// Scaffolds.
import cliScaffold from "@webiny/cli-plugin-scaffold";
import cliScaffoldExtendGraphQlApi from "@webiny/cli-plugin-scaffold-graphql-service";
import cliScaffoldAdminModule from "@webiny/cli-plugin-scaffold-admin-app-module";
import cliScaffoldCiCd from "@webiny/cli-plugin-scaffold-ci";
import { createMetaFieldsDataMigrationDeploymentHook } from "@webiny/migrations/migrations/5.39.6/001/ddb-es/createMetaFieldsDataMigrationDeploymentHook";
import { ElasticsearchCatClusterHealthStatus } from "@webiny/api-elasticsearch";

export default {
  appAliases: {
    core: "apps/core",
    api: "apps/api",
    admin: "apps/admin",
    website: "apps/website"
  },
  featureFlags: {
    allowCmsLegacyRichTextInput: true
  },
  template: "@webiny/cwp-template-aws@5.33.2",
  name: "webiny-cms",
  cli: {
    plugins: [
      cliWorkspaces(),
      cliPulumiDeploy(),
      cliAwsTemplate(),

      // Scaffolds.
      cliScaffold(),
      cliScaffoldExtendGraphQlApi(),
      cliScaffoldAdminModule(),
      cliScaffoldCiCd(),

      createMetaFieldsDataMigrationDeploymentHook({
        totalSegments: 15,

        // Using default values here. This is just to show you how you can customize these values.
        esHealthChecks: {
          minClusterHealthStatus: ElasticsearchCatClusterHealthStatus.Yellow,
          maxProcessorPercent: 90,
          maxRamPercent: 100,
          maxWaitingTime: 90,
          waitingTimeStep: 2
        }
      })
    ]
  }
};
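
With the plugin registered, the migration runs as part of the API application's deployment. Assuming a standard Webiny project setup, a deployment like the following would trigger the hook once the deploy completes (the environment name here is illustrative):

yarn webiny deploy api --env prod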

Which Approach To Use?

Using the deployment hook is a bit easier because not only does the Webiny CLI ensure that the migration is run, but the user also does not have to manually pass the DDB table names and the ES endpoint.

But this does assume that the data migration will be run in the same workflow / on the same machine from which the deployment of the API project application was initiated. If that machine is not powerful enough and it's not possible to increase its specs, then running the migration on a separate EC2 instance might be a better idea, which brings us back to the first approach.

What Kind of Machine Should I Be Using?

We recommend a machine with at least 16 CPUs; for larger data sets, 32 CPUs would be preferable.

Benchmark

We've tested the new data migration with ~1 GB of data, on a 32 CPU / 128 GB EC2 instance, with the number of segments set to 15. The migration finished in ~100 seconds, with all of the data fully migrated.

The main factor that affects the duration of the data migration is the number of segments, or in other words, the number of parallel jobs we want the data migration to be split into. The more CPUs available, the higher the segment count we can set, which ultimately means a shorter migration duration.
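For example, on a 32 CPU machine you might raise the segment count accordingly. This is just a sketch (with the required AWS_ environment variables already set); the table and endpoint values are placeholders you'd replace with your own:

node node_modules/@webiny/migrations/migrations/5.39.6/001/ddb-es/bin.js \
  --ddbTable <your-ddb-table> \
  --ddbEsTable <your-ddb-es-table> \
  --esEndpoint <your-es-endpoint> \
  --segments 32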

But of course, increasing the number of segments can also make you hit downstream limits, like the number of writes you can perform against DynamoDB. Elasticsearch can also enter an unhealthy state, in which case the data migration will wait until it becomes healthy again.

After performing some testing, the segment counts used above produced good enough results. If needed, further testing could be performed and perhaps even better results could be achieved, but we felt that wasn't required.

What About the Existing 5.39.6-001 Migration? Will It Still Be Run?

The existing 5.39.6-001 data migration can still be run, and it will remain the default migration for users who do not choose to go with the new migration introduced with this PR. But do note that if a user does choose the new migration, once it completes, neither the existing 5.39.6-001 data migration nor the new improved one will be run again.

Once the Data Migration Has Been Run, What Next?

If you were running the data migration via the terminal command, then no further actions are needed.

If you were running the data migration via the post-deploy Webiny CLI hook, it's best to remove the plugin once the migration has concluded. Leaving it in place will not cause any issues, but removing it keeps your project configuration a bit cleaner.

How Has This Been Tested?

Manually.

Documentation

This PR body, plus changelog.