pingcap / tidb-lightning

This repository has been moved to https://github.com/pingcap/br
Apache License 2.0

Require less configuration #131

Open morgo opened 5 years ago

morgo commented 5 years ago

Feature Request

Is your feature request related to a problem? Please describe:

Currently lightning requires a lot of configuration:

I would like to find a way for it to be used with minimal configuration. This would improve convenience and support novice users and experiments.

Describe the feature you'd like:

I am open to ideas on implementation:

Describe alternatives you've considered:

This is really only about improving convenience/usability for casual use cases, so there are many alternative implementations. Many users don't want to edit configuration files; they just want a tool set up and running with no effort.

Teachability, Documentation, Adoption, Optimization:

kennytm commented 5 years ago

Can lightning discover a tidb-host and tikv-importer host from PD? That might not work perfectly, since it won't know which one is closest. I think it could absolutely discover the pd host from TiDB though (making the configuration of PD optional).

Thanks, looks like we could find the PD address from http://tidb-ip:10080/settings

Importer isn't registered on PD though, so we can't use PD to discover Importer.
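
A minimal sketch of that PD lookup in Go, for illustration only. It assumes the `/settings` endpoint on the TiDB status port returns the server configuration as JSON, with a `store` key and a `path` key holding the comma-separated PD endpoints when `store` is `tikv`. Those field names, the helper names `parsePDAddrs` and `discoverPD`, and the sample body are assumptions of this sketch, not something confirmed in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// tidbSettings holds the only fields this sketch reads from /settings.
// Assumption: the response is the TiDB config serialized as JSON.
type tidbSettings struct {
	Store string `json:"store"`
	Path  string `json:"path"`
}

// parsePDAddrs extracts the PD endpoints from a /settings JSON body.
func parsePDAddrs(body []byte) ([]string, error) {
	var s tidbSettings
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, fmt.Errorf("decode settings: %w", err)
	}
	if s.Store != "tikv" {
		return nil, fmt.Errorf("store is %q, not tikv; no PD address available", s.Store)
	}
	// Defensively drop any query string, then split the endpoint list.
	hosts := strings.SplitN(s.Path, "?", 2)[0]
	var addrs []string
	for _, a := range strings.Split(hosts, ",") {
		if a = strings.TrimSpace(a); a != "" {
			addrs = append(addrs, a)
		}
	}
	if len(addrs) == 0 {
		return nil, fmt.Errorf("settings contain no PD address")
	}
	return addrs, nil
}

// discoverPD queries a TiDB status address (host:port) and returns the PD endpoints.
func discoverPD(statusAddr string) ([]string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://" + statusAddr + "/settings")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parsePDAddrs(body)
}

func main() {
	// Offline demo on a hand-written sample body; against a real cluster,
	// call discoverPD("tidb-ip:10080") instead.
	sample := []byte(`{"store": "tikv", "path": "10.0.0.1:2379,10.0.0.2:2379"}`)
	addrs, err := parsePDAddrs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(addrs, ","))
}
```

With something like this, the PD address in the Lightning config could become optional and be filled in from TiDB at startup.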

Could the mydumper data-source-dir just be moved from configuration, to argument $1 for tidb-lightning?

Yes
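
One way this could look, as a rough Go sketch rather than Lightning's actual command-line handling: the first positional argument is taken as the data source directory, and a config file stays optional. The `options` struct, the `parseArgs` helper, and the exact flag set here are hypothetical and only illustrate the idea.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// options is a hypothetical, trimmed-down stand-in for Lightning's configuration.
type options struct {
	ConfigFile    string
	DataSourceDir string
}

// parseArgs accepts an optional -config flag plus at most one positional
// argument, which is taken as the data source directory.
func parseArgs(args []string) (options, error) {
	var opt options
	fs := flag.NewFlagSet("tidb-lightning", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet; errors are returned instead
	fs.StringVar(&opt.ConfigFile, "config", "", "optional path to a config file")
	if err := fs.Parse(args); err != nil {
		return opt, err
	}
	switch fs.NArg() {
	case 0:
		// Without a positional argument, the directory must come from a config file.
		if opt.ConfigFile == "" {
			return opt, errors.New("need a data source directory or -config")
		}
	case 1:
		opt.DataSourceDir = fs.Arg(0)
	default:
		return opt, fmt.Errorf("expected one data source directory, got %d arguments", fs.NArg())
	}
	return opt, nil
}

func main() {
	// Demo invocations; a real binary would pass os.Args[1:] instead.
	for _, args := range [][]string{
		{"/data/export"},
		{"-config", "lightning.toml"},
		{},
	} {
		opt, err := parseArgs(args)
		if err != nil {
			fmt.Printf("%q: error: %v\n", args, err)
			continue
		}
		fmt.Printf("%q: config=%q dir=%q\n", args, opt.ConfigFile, opt.DataSourceDir)
	}
}
```

In this sketch a positional directory would take precedence over whatever the config file says, which keeps the casual one-argument invocation working without any file at all.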

Could lightning be embedded/bundled as a TiDB plugin?

Can tikv-importer somehow become embedded too?

cc @jackysp?

A plugin is basically a *.so library, we could place the Lightning/Importer code inside the plugin, or make the plugin a front-end which controls Lightning/Importer on another machine.

Placing the code directly inside the plugin is the same as the "mixed deployment" strategy which we no longer recommend. Lightning/Importer are resource intensive programs and doing so may bring down the cluster due to using up all CPU and network bandwidth (making the cluster unresponsive).

+---------------+
| TiDB          |
| +-----------+ |
| | Lightning | |
| +-----------+ |  +------+
| | Importer  +----+ TiKV |
| +-----------+ |  +------+
+---------------+

Placing the code outside the plugin means the user still needs to deploy the two programs. The only difference is being able to execute the command as SQL vs command line. IMO this doesn't improve much usability 🙃.

+--------------+
| TiDB         |
| +----------+ | +-----------+
| | (plugin) +---+ Lightning |
| +----------+ | +-----+-----+
+--------------+       |
                 +-----+-----+  +------+
                 | Importer  +--+ TiKV |
                 +-----------+  +------+
morgo commented 5 years ago

Importer isn't registered on PD though, so we can't use PD to discover Importer.

.. but it could be? :-) So start tikv-importer with the address of a pd server. This is a similar request to https://github.com/pingcap/tidb/issues/6435

Placing the code directly inside the plugin is the same as the "mixed deployment" strategy which we no longer recommend. Lightning/Importer are resource intensive programs and doing so may bring down the cluster due to using up all CPU and network bandwidth (making the cluster unresponsive).

This is true in the case of a multi-tenant TiDB cluster, but in the common case of lightning, I think I would not be using the cluster until after the data has been restored. So resource saturation is not a problem.

jackysp commented 5 years ago

I think making Lightning a plugin of TiDB is a good idea. But if tikv-importer is also embedded, we may run into some CGO issues?

kennytm commented 5 years ago

@morgo

.. but it could be? :-) So start tikv-importer with the address of a pd server.

This means we also need to supply the PD address to tikv-importer which is also a configuration ;)

morgo commented 5 years ago

Placing the code outside the plugin means the user still needs to deploy the two programs. The only difference is being able to execute the command as SQL vs command line. IMO this doesn't improve much usability 🙃.

I think there are actually a few differences here:

IANTHEREAL commented 5 years ago

Regarding embedding: making Lightning easy to use has great benefits for the product. In addition, Lightning may support online import later, so we need to consider hybrid deployment carefully. Another idea: if Lightning could exist as a special form of TiDB (serving imports only), maybe that would solve this problem.

However, that cannot be achieved immediately, so we can first focus on optimizing the configuration.