planetscale / cli

The CLI for PlanetScale Database
https://planetscale.com/cli
Apache License 2.0
604 stars 51 forks source link

Add optional progress information and other enhancements for restore-dump command #910

Open orware opened 1 month ago

orware commented 1 month ago

These changes add a few potentially useful capabilities to the restore-dump command.

The first is the --show-details flag, which helps provide a middle ground between the minimal default "Restoring database ..." output and the potentially too noisy --debug output, since that ends up outputting the large INSERT queries being executed as well to the terminal. This flag will output the names of the files that will be included within the restore folder and will also show progress processing the queries within the various data files.

The --start-from flag allows you to provide a table name which will then be used to skip earlier tables so that the restore will begin from the table name that is provided which can be helpful to avoid having to start a restore completely from the very first table again. (Replaced by the --starting-table and --ending-table options described in the later comment down below.)

The --allow-different-destination flag addresses https://github.com/planetscale/cli/issues/540 and adds extra flexibility since users would not be required to adjust the database name prefix embedded into the files within a dump folder for a different database and can simply enable this option and it will allow those files to be utilized for a database with a different name.

The --schema-only and --data-only flags can help address situations where you may have a folder containing both sets of files but only need one type. While a new folder containing only the files needed is an option this mainly helps add extra flexibility without needing to take that extra step.

Example output:

The following two examples use an example database name of metrics into its main branch.

The first example shows using the --schema-only flag:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
  |- Table file: metrics.connections-schema.sql
    |- Data file: metrics.connections.00001.sql
    |- Data file: metrics.connections.00002.sql
    |- Data file: metrics.connections.00003.sql
    |- Data file: metrics.connections.00004.sql
    |- Data file: metrics.connections.00005.sql
  |- Table file: metrics.placeholder-schema.sql
Dropping Existing Table (if it exists): `metrics`.`connections`
Creating Table: `metrics`.`connections` (Table 1 of 2)
Dropping Existing Table (if it exists): `metrics`.`placeholder`
Creating Table: `metrics`.`placeholder` (Table 2 of 2)
Skipping restoring data files...
Restore is finished! (elapsed time: 1.189019208s)

The second example demonstrates how using --start-from prevents files from the earlier connections table from being included at all and allows things to start from the provided table name which happens to be called placeholder:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --start-from="placeholder"

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the connections table...
  |- Table file: metrics.placeholder-schema.sql
Starting from placeholder table...
Dropping Existing Table (if it exists): `metrics`.`placeholder`
Creating Table: `metrics`.`placeholder` (Table 1 of 1)
Skipping restoring data files...
Restore is finished! (elapsed time: 562.755959ms)

To demonstrate the usage of --allow-different-destination first let's see the default behavior when attempting to restore a dump folder, created for the metrics database, into a database named not-metrics:

pscale database restore-dump not-metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --start-from="placeholder"

Starting to restore database not-metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the connections table...
  |- Table file: metrics.placeholder-schema.sql
Starting from placeholder table...
Error: failed to restore database: VT05003: unknown database 'metrics' in vschema (errno 1105) (sqlstate HY000)

As expected, the restore is unable to complete since the filenames within the dump folder have the embedded metrics name which doesn't match with the not-metrics name being passed into the command.

But if we add the new --allow-different-destination flag in, this allows the restore to proceed into the differently named destination database.

pscale database restore-dump not-metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --start-from="placeholder" --allow-different-destination

Starting to restore database not-metrics from folder <DIRECTORY_NAME>
Restoring database ...
The allow different destination option is enabled for this restore.
Files that do not begin with the provided database name of not-metrics will still be processed without having to rename them first.
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the connections table...
  |- Table file: metrics.placeholder-schema.sql
Starting from placeholder table...
Dropping Existing Table (if it exists): `not-metrics`.`placeholder`
Creating Table: `not-metrics`.`placeholder` (Table 1 of 1)
Skipping restoring data files...
Restore is finished! (elapsed time: 543.931875ms)

I didn't want to include a lot of lines showing data files being processed but below is a snippet showing the extra details that would be included towards the end of processing a file:

  Processing Query 126 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 127 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 128 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 129 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 130 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 131 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 132 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 133 out of 134 within metrics.connections.00005.sql in thread 1
  Processing Query 134 out of 134 within metrics.connections.00005.sql in thread 1
Finished Processing Data File: metrics.connections.00005.sql in 2m45.760835334s with 6m20.652604042s elapsed so far (File 5 of 5)
orware commented 1 month ago

I was thinking about this some more and it makes sense to allow for this option to be more comprehensive and allow not just those tables >= to a starting table to be included but also the tables <= to an ending table too as well as being able to provide a more specific starting/ending inclusive table range so I made some adjustments.

The last commit removes the --start-from option and replaces it with the --starting-table and --ending-table options which can be used separately or together.

Examples:

If not using the options at all then all tables should be included as they normally would. If the ending table provided comes before the starting table alphabetically then the restore will short circuit until that is corrected.

This example skips the last two tables since the table range being provided will only include the first two:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --starting-table="abc" --ending-table="bcd"

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
  |- Table file: metrics.abc-schema.sql
  |- Table file: metrics.bcd-schema.sql
Skipping files associated with the connections table...
Skipping files associated with the placeholder table...
Restore will be starting from the abc table...
Restore will be ending at the bcd table...
Dropping Existing Table (if it exists): `metrics`.`abc`
Creating Table: `metrics`.`abc` (Table 1 of 2)
Dropping Existing Table (if it exists): `metrics`.`bcd`
Creating Table: `metrics`.`bcd` (Table 2 of 2)
Skipping restoring data files...
Restore is finished! (elapsed time: 1.005391166s)

For the case above, running the same command without the use of --starting-table would result in the same output since the abc table is the only other one before bcd:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --ending-table="bcd"

The opposite however, starting from the bcd table and continuing to the end will include the remaining tables automatically:

pscale database restore-dump metrics main --dir="<DIRECTORY_NAME>" --org=<ORG> --overwrite-tables --threads=2 --schema-only --show-details --starting-table="bcd"

Starting to restore database metrics from folder <DIRECTORY_NAME>
Restoring database ...
The schema only option is enabled for this restore.
Collecting files from folder <DIRECTORY_NAME>
Skipping files associated with the abc table...
  |- Table file: metrics.bcd-schema.sql
  |- Table file: metrics.connections-schema.sql
    |- Data file: metrics.connections.00001.sql
    |- Data file: metrics.connections.00002.sql
    |- Data file: metrics.connections.00003.sql
    |- Data file: metrics.connections.00004.sql
    |- Data file: metrics.connections.00005.sql
  |- Table file: metrics.placeholder-schema.sql
Restore will be starting from the bcd table...
Dropping Existing Table (if it exists): `metrics`.`bcd`
Creating Table: `metrics`.`bcd` (Table 1 of 3)
Dropping Existing Table (if it exists): `metrics`.`connections`
Creating Table: `metrics`.`connections` (Table 2 of 3)
Dropping Existing Table (if it exists): `metrics`.`placeholder`
Creating Table: `metrics`.`placeholder` (Table 3 of 3)
Skipping restoring data files...
Restore is finished! (elapsed time: 1.321406s)
orware commented 2 weeks ago

I had worked with the user I originally created a lot of these options for on their Postgres migration a few weekends ago but I hadn't confirmed whether they had been able to utilize the custom build including these features during the migration itself.

They were able to confirm that earlier today and shared: "...we couldn’t have completed the migration without it. It was super helpful to be able to see progress as before it was a complete black box".