Open morgo opened 5 years ago
Can lightning discover a tidb-host and tikv-importer host from PD? That might not work perfectly, since it won't know which one is closest. I think it could absolutely discover the pd host from TiDB though (making the configuration of PD optional).
Thanks, looks like we could find the PD address from http://tidb-ip:10080/settings
Importer isn't registered on PD though, so we can't use PD to discover Importer.
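A minimal sketch of that lookup, assuming the settings endpoint returns a JSON document whose "store" field names the storage engine and whose "path" field holds a comma-separated list of PD endpoints. Those field names are assumptions here and should be checked against the TiDB version in use; `fetch_settings` and `pd_endpoints` are hypothetical helper names.

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_settings(tidb_host: str, status_port: int = 10080) -> dict:
    """Fetch the settings document from TiDB's status port as a dict."""
    url = f"http://{tidb_host}:{status_port}/settings"
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def pd_endpoints(settings: dict) -> List[str]:
    """Extract PD endpoints, assuming a comma-separated "path" field."""
    if settings.get("store") != "tikv":
        return []  # not backed by TiKV, so there is no PD to report
    path = settings.get("path", "")
    path = path.split("?", 1)[0]  # drop any trailing "?key=value" options
    return [addr.strip() for addr in path.split(",") if addr.strip()]
```

With a reachable TiDB instance, `pd_endpoints(fetch_settings("tidb-ip"))` would give the list of PD addresses. Importer would still need to be configured separately, since it is not registered anywhere.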
Could the mydumper data-source-dir just be moved from configuration to argument $1 for tidb-lightning?
Yes
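As a rough illustration of what moving data-source-dir to $1 could look like, here is a sketch of a wrapper that takes the directory as its only positional argument and renders the configuration fragment it replaces. The wrapper name `lightning-wrapper` and both function names are hypothetical; this is not how tidb-lightning itself parses its arguments.

```python
import argparse
import json
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a command line whose first positional argument is the dump directory."""
    parser = argparse.ArgumentParser(prog="lightning-wrapper")  # hypothetical name
    parser.add_argument("data_source_dir", help="directory holding the mydumper output")
    return parser.parse_args(argv)


def mydumper_section(data_source_dir: str) -> str:
    """Render the config fragment that the positional argument would replace."""
    # json.dumps is used only to quote and escape the path as a basic string.
    return "[mydumper]\ndata-source-dir = " + json.dumps(data_source_dir) + "\n"
```

Calling `mydumper_section(parse_args(["/data/dump"]).data_source_dir)` yields a two-line `[mydumper]` section, so the user never has to write that part of the file by hand.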
Could lightning be embedded/bundled as a TiDB plugin?
Can tikv-importer somehow become embedded too?
cc @jackysp?
A plugin is basically a *.so library, we could place the Lightning/Importer code inside the plugin, or make the plugin a front-end which controls Lightning/Importer on another machine.
Placing the code directly inside the plugin is the same as the "mixed deployment" strategy which we no longer recommend. Lightning/Importer are resource intensive programs and doing so may bring down the cluster due to using up all CPU and network bandwidth (making the cluster unresponsive).
+---------------+
|     TiDB      |
| +-----------+ |
| | Lightning | |
| +-----------+ |  +------+
| | Importer  +----+ TiKV |
| +-----------+ |  +------+
+---------------+
Placing the code outside the plugin means the user still needs to deploy the two programs. The only difference is being able to execute the command as SQL vs command line. IMO this doesn't improve much usability 🙃.
+--------------+
|     TiDB     |
| +----------+ | +-----------+
| | (plugin) +---+ Lightning |
| +----------+ | +-----+-----+
+--------------+       |
                 +-----+-----+  +------+
                 | Importer  +--+ TiKV |
                 +-----------+  +------+
> Importer isn't registered on PD though, so we can't use PD to discover Importer.
.. but it could be? :-) So start tikv-importer with the address of a pd server. This is a similar request to https://github.com/pingcap/tidb/issues/6435
> Placing the code directly inside the plugin is the same as the "mixed deployment" strategy which we no longer recommend. Lightning/Importer are resource intensive programs and doing so may bring down the cluster due to using up all CPU and network bandwidth (making the cluster unresponsive).
This is true in the case of a multi-tenant TiDB cluster, but in the common case of lightning, I think I would not be using the cluster until after the data has been restored. So resource saturation is not a problem.
I think making Lightning a TiDB plugin is a good idea. But if tikv-importer is also embedded, we may run into some CGO issues?
@morgo
> .. but it could be? :-) So start tikv-importer with the address of a pd server.
This means we also need to supply the PD address to tikv-importer, which is also a configuration ;)
> Placing the code outside the plugin means the user still needs to deploy the two programs. The only difference is being able to execute the command as SQL vs command line. IMO this doesn't improve much usability 🙃.
I think there are actually a few differences here:
Regarding the embedding problem, ease of use has great benefits for the product.
In addition, Lightning may support online import later, so we need to consider hybrid deployment carefully.
Another idea: if Lightning could exist as a special form of TiDB (only for import services), maybe that would solve this problem.
However, that can't be achieved immediately, so we can first focus on optimizing the configuration.
Feature Request
Is your feature request related to a problem? Please describe:
Currently lightning requires a lot of configuration:
tikv-importer
I would like to find a way where it can be used with minimal configuration. This helps improve convenience and supports novice users and experiments.
Describe the feature you'd like:
I am open to ideas on implementation:
LIGHTNING LOAD 's3://path/to/mydumper' (using the local tidb and learning pd from it).
Could the mydumper data-source-dir just be moved from configuration to argument $1 for tidb-lightning?
Describe alternatives you've considered:
This is really only about improving convenience/usability for casual use cases, so there are many alternative implementations. There are a lot of users that don't want to edit configuration files, but would rather just have a tool set up and running with no effort.
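One possible reading of "minimal configuration" is that every setting stays overridable but none is mandatory when it can be learned elsewhere. The sketch below shows that precedence for the PD address only: an explicit value wins, otherwise a discovery callback is consulted. The `pd-addr` key and the function name `resolve_pd_addr` are illustrative assumptions, not Lightning's actual behaviour.

```python
from typing import Callable, Dict, List, Optional


def resolve_pd_addr(
    explicit: Dict[str, str],
    discover: Callable[[], List[str]],
) -> Optional[str]:
    """Return the PD address to use: explicit setting first, discovery second."""
    configured = explicit.get("pd-addr")
    if configured:
        return configured  # the user's own setting always takes priority
    discovered = discover()  # e.g. a lookup against a known TiDB server
    return discovered[0] if discovered else None
```

A casual user would pass an empty mapping and rely on discovery, while an operator who already knows the topology could still pin the address.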
Teachability, Documentation, Adoption, Optimization: