influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

In-place OSS 1.x to 2.x upgrade #19308

Closed by stuartcarnie 4 years ago

stuartcarnie commented 4 years ago

What

Provide tooling to upgrade a user running InfluxDB 1.7 / 1.8 to InfluxDB 2.0.

Requirements

Upgrade Steps

It is assumed that the user has progressed through the upgrade to the point where their org, user, and other required metadata have been created in the local Bolt database.

The tool expects to be pointed to an existing InfluxDB 1.x directory (the "source directory") where meta.db and the time-series data are stored.

Design

The layout described in the following sections was discussed at length with Paul Dix, Edd and others.

On Disk Structure

Every bucket will have its own database and retention policy. To state it another way, every database will have exactly one retention policy, which is the bucket.

Creating two buckets, bucket-a and bucket-b, would result in the following:

data/
  bucket-a/
    autogen/
  bucket-b/
    autogen/

NOTE: The bucket metadata will be separate from the TSM 1 metadata (database name, retention policy name, shards, etc.). As with 1.x, the database name and retention policy names are immutable. Other metadata, such as shard duration, may be exposed via subcommands of the influx CLI tool.

Migration

Given the above property, if the user has the following existing structure:

data/
    my-db-1/
        default/
        1year/
    my-db-2/
        autogen/
        1year/

The migration process will generate 4 buckets, one for each of the above database / retention policy pairs.

Imagine the migration process yields the following metadata for the 4 buckets:

Bucket ID         Bucket Name (derived)  DB       RP
7425a44ac4110001  my-db-1-default        my-db-1  default
7425a44ac4110002  my-db-1-1year          my-db-1  1year
7425a44ac4110003  my-db-2-autogen        my-db-2  autogen
7425a44ac4110004  my-db-2-1year          my-db-2  1year

The resulting layout on disk will be:

data/
    7425a44ac4110001/
        autogen/
    7425a44ac4110002/
        autogen/
    7425a44ac4110003/
        autogen/
    7425a44ac4110004/
        autogen/
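
The mapping from 1.x database / retention policy pairs to buckets and on-disk paths can be sketched in Python. This is only an illustration under stated assumptions: the names derive_bucket_name and plan_migration are hypothetical, and the sequential IDs are placeholders chosen to match the example table. A real tool would presumably take bucket IDs from the 2.x metadata store rather than generate them this way.

```python
from pathlib import PurePosixPath


def derive_bucket_name(db: str, rp: str) -> str:
    """Derive a bucket name from a 1.x database / retention policy pair."""
    return f"{db}-{rp}"


def plan_migration(pairs, first_id=0x7425A44AC4110001, root="data"):
    """Return one plan entry per (db, rp) pair.

    The IDs are sequential placeholders for illustration only.
    """
    plan = []
    for offset, (db, rp) in enumerate(pairs):
        bucket_id = format(first_id + offset, "016x")
        plan.append({
            "bucket_id": bucket_id,
            "bucket_name": derive_bucket_name(db, rp),
            "source": str(PurePosixPath(root, db, rp)),
            # Every migrated bucket gets exactly one retention policy directory.
            "target": str(PurePosixPath(root, bucket_id, "autogen")),
        })
    return plan


if __name__ == "__main__":
    pairs = [("my-db-1", "default"), ("my-db-1", "1year"),
             ("my-db-2", "autogen"), ("my-db-2", "1year")]
    for entry in plan_migration(pairs):
        print(entry["source"], "->", entry["target"], f"({entry['bucket_name']})")
```

Keeping the plan as plain data, separate from any file moves, would let the tool print or log what it intends to do before touching the source directory.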

stuartcarnie commented 4 years ago

I expect this will take more than a single sprint, as there will be iteration and testing to ensure the process is as smooth as possible.

russorat commented 4 years ago

This is the process for upgrading users to 2.0:

If there are 0 or 1 users in the 1.x instance:

If there are >= 2 users in the 1.x instance:

Once the upgrade is complete, the admin will log into the new 2.0 instance, and either manually set up new users in the org (with their own username/password and tokens) or distribute the newly created token to the appropriate user (in the case that the 1.x user was only used as an integration).

Examples:

I am a 1.x user with 5 users (named a, b, c, d, e) configured in my 1.x instance, all with read/write access to all databases. When I run the upgrade process, my 2.0 instance contains 1 org, 1 admin user, 1 operator token, and a token for each user with the descriptions a, b, c, d, e, with r/w permissions on the buckets in 2.0 that correspond to the db/rp combos from 1.x.

I am a 1.x user with 2 users (named a, b) configured in my 1.x instance, where a has read/write access to all databases, and b has write access to a single database. When I run the upgrade process, my 2.0 instance contains 1 org, 1 admin user, 1 operator token, and a token for each user with the descriptions a, b, where token a has r/w permissions to all buckets in 2.0 and token b has write access to a single bucket.
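
The second example can be sketched in Python as a mapping from 1.x per-database privileges to per-bucket token permissions. This is a rough illustration, not the upgrade tool's code: plan_tokens and its data shapes are hypothetical, and it assumes the 1.x privileges are READ, WRITE, and ALL. Note that under the db/rp-to-bucket design, a database with several retention policies would yield several buckets for the same grant.

```python
def plan_tokens(users, buckets):
    """Map 1.x per-database privileges to per-bucket token permissions.

    users:   {user_name: {database: privilege}}, privilege in READ/WRITE/ALL
    buckets: list of (bucket_name, database) tuples from the migration
    """
    privilege_map = {
        "READ": {"read"},
        "WRITE": {"write"},
        "ALL": {"read", "write"},
    }
    tokens = []
    for name, grants in users.items():
        permissions = {}
        for bucket_name, database in buckets:
            privilege = grants.get(database)
            if privilege is not None:
                permissions[bucket_name] = sorted(privilege_map[privilege])
        # The 1.x user name becomes the token description.
        tokens.append({"description": name, "permissions": permissions})
    return tokens


if __name__ == "__main__":
    buckets = [("db-1-autogen", "db-1"), ("db-2-autogen", "db-2")]
    users = {
        "a": {"db-1": "ALL", "db-2": "ALL"},
        "b": {"db-2": "WRITE"},
    }
    for token in plan_tokens(users, buckets):
        print(token["description"], token["permissions"])
```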

vlastahajek commented 4 years ago

@russorat, by "we will prompt the user", do you mean that upgrade should take interactive input for its parameters? And no CLI options? Or both? Current options list:

  -m, --bolt-path string         path for boltdb database (default "/home/ubuntu/.influxdbv2/influxd.bolt")
  -b, --bucket string            primary bucket name
      --config-file string       optional: Custom InfluxDB 1.x config file path, else the default config file
  -e, --engine-path string       path for persistent engine files (default "/home/ubuntu/.influxdbv2/engine")
  -h, --help                     help for upgrade
      --log-path string          optional: custom log file path (default "/home/ubuntu/upgrade.log")
  -o, --org string               primary organization name
  -p, --password string          password for username
  -r, --retention string         optional: duration bucket will retain data. 0 is infinite. Default is 0.
      --security-script string   optional: generated security upgrade script path (default "/home/ubuntu/influxd-upgrade-security.sh")
  -t, --token string             optional: token for username, else auto-generated
  -u, --username string          primary username
      --v1-dir string            path to source 1.x db directory containing meta,data and wal sub-folders (default "/home/ubuntu/.influxdb")
  -v, --verbose                  verbose output (default true)

russorat commented 4 years ago

@vlastahajek I would imagine it would work similarly to the influx setup command today. You can run influx setup with no parameters, and you are prompted to enter the required parameters. You can also provide them on the command line (influx setup -f -b telegraf -o influxdata -u russ -p something), which will continue in non-interactive mode with no confirmation prompts (-f).

 ✗ influx setup -h
Setup instance with initial user, org, bucket

Usage:
  influx setup [flags]
  influx setup [command]

Available Commands:
  user        Setup instance with user, org, bucket

Flags:
  -c, --active-config string   Config name to use for command; Maps to env var $INFLUX_ACTIVE_CONFIG
  -b, --bucket string          primary bucket name
      --configs-path string    Path to the influx CLI configurations; Maps to env var $INFLUX_CONFIGS_PATH (default "/Users/rsavage/.influxdbv2/configs")
  -f, --force                  skip confirmation prompt
  -h, --help                   Help for the setup command
      --hide-headers           Hide the table headers; defaults false; Maps to env var $INFLUX_HIDE_HEADERS
      --host string            HTTP address of InfluxDB; Maps to env var $INFLUX_HOST
      --json                   Output data as json; defaults false; Maps to env var $INFLUX_OUTPUT_JSON
  -n, --name string            config name, only required if you already have existing configs
  -o, --org string             primary organization name
  -p, --password string        password for username
  -r, --retention string       Duration bucket will retain data. 0 is infinite. Default is 0.
      --skip-verify            Skip TLS certificate chain and host name verification.
  -t, --token string           token for username, else auto-generated
  -u, --username string        primary username

Use "influx setup [command] --help" for more information about a command.
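
The "flags first, prompt for whatever is missing" behavior described above could look roughly like the following Python sketch. It is not how influx setup or the upgrade command is implemented. The program name upgrade-sketch, the function names, and the prompt wording are all hypothetical, and only a subset of the flags listed above is modeled.

```python
import argparse

REQUIRED = ("username", "password", "org", "bucket")


def build_parser():
    parser = argparse.ArgumentParser(prog="upgrade-sketch")
    parser.add_argument("-u", "--username", help="primary username")
    parser.add_argument("-p", "--password", help="password for username")
    parser.add_argument("-o", "--org", help="primary organization name")
    parser.add_argument("-b", "--bucket", help="primary bucket name")
    parser.add_argument("-f", "--force", action="store_true",
                        help="skip confirmation prompt")
    return parser


def resolve_params(argv, ask=input):
    """Take values from flags; prompt only for required ones that are missing."""
    args = build_parser().parse_args(argv)
    params = {}
    for name in REQUIRED:
        value = getattr(args, name)
        if value is None:
            value = ask(f"{name}: ")
        params[name] = value
    # Ask for confirmation unless -f/--force was given.
    if not args.force and ask("Confirm? (y/n): ").strip().lower() != "y":
        raise SystemExit("canceled")
    return params
```

With every required flag plus -f supplied, resolve_params never calls ask, which corresponds to the non-interactive mode. A real tool would likely read the password without echoing it (for example with getpass) instead of a plain prompt.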