influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.58k stars 3.54k forks source link

Influxdb starts slowly, about 3 hours #19742

Open guideyong opened 3 years ago

guideyong commented 3 years ago

Today, the production environment found that the request timed out, so it tried to restart influxdb. However, influxdb always failed to start. After observing the log, it was found that nearly two hours were spent in the opened shard 309554. This is unacceptable. Please help to analyze the cause. Thank you

ts=2020-10-14T03:02:48.760541Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/84 duration=7225.312ms ts=2020-10-14T03:02:48.778614Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/228 duration=6725.338ms ts=2020-10-14T03:02:48.944111Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/723 duration=28685.096ms ts=2020-10-14T03:02:49.811106Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see_bak/autogen/309497 duration=6053.201ms ts=2020-10-14T03:02:50.276503Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/8 duration=12965.821ms ts=2020-10-14T04:49:24.972762Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/309554 duration=6956055.645ms ts=2020-10-14T04:49:25.074498Z lvl=info msg="Open store (end)" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open op_event=end op_elapsed=6984263.269ms ts=2020-10-14T04:49:25.075083Z lvl=info msg="Opened service" log_id=0PqGFLd0000 service=subscriber ts=2020-10-14T04:49:25.076074Z lvl=info msg="Starting monitor service" log_id=0PqGFLd0000 service=monitor ts=2020-10-14T04:49:25.076147Z lvl=info msg="Registered diagnostics client" log_id=0PqGFLd0000 service=monitor name=build ts=2020-10-14T04:49:25.076160Z lvl=info msg="Registered diagnostics client" log_id=0PqGFLd0000 service=monitor name=runtime ts=2020-10-14T04:49:25.076800Z lvl=info msg="Registered diagnostics client" log_id=0PqGFLd0000 service=monitor name=network ts=2020-10-14T04:49:25.076821Z lvl=info msg="Registered diagnostics client" log_id=0PqGFLd0000 service=monitor name=system ts=2020-10-14T04:49:25.078125Z lvl=info msg="Starting precreation service" log_id=0PqGFLd0000 service=shard-precreation check_interval=10m advance_period=30m ts=2020-10-14T04:49:25.078143Z lvl=info msg="Starting snapshot service" log_id=0PqGFLd0000 service=snapshot ts=2020-10-14T04:49:25.082080Z lvl=info msg="Starting continuous query service" log_id=0PqGFLd0000 service=continuous_querier ts=2020-10-14T04:49:25.078212Z lvl=info msg="Storing statistics" log_id=0PqGFLd0000 service=monitor db_instance=_internal db_rp=monitor interval=10s ts=2020-10-14T04:49:25.083597Z lvl=info msg="Starting HTTP service" log_id=0PqGFLd0000 service=httpd authentication=true ts=2020-10-14T04:49:25.084534Z lvl=info msg="Auth is enabled but shared-secret is blank. BearerAuthentication is disabled." log_id=0PqGFLd0000 service=httpd ts=2020-10-14T04:49:25.087334Z lvl=info msg="Listening on HTTP" log_id=0PqGFLd0000 service=httpd addr=[::]:18086 https=false

russorat commented 3 years ago

@guideyong thanks! could you add the relevant information from the bug report template?

guideyong commented 3 years ago

Steps to reproduce: List the minimal actions needed to reproduce the behavior.

  1. Execute the influxdb start command

Expected behavior: I'd like to start infixdb quickly

Actual behavior: Startup of infixdb is slow and takes 3 hours

Environment info:

Config:

influxdb.txt

Logs: ts=2020-10-14T03:02:50.276503Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/8 duration=12965.821ms ts=2020-10-14T04:49:24.972762Z lvl=info msg="Opened shard" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open index_version=inmem path=/home/see/influxdb/datadir/data/see/autogen/309554 duration=6956055.645ms ts=2020-10-14T04:49:25.074498Z lvl=info msg="Open store (end)" log_id=0PqGFLd0000 service=store trace_id=0PqGFM2l000 op_name=tsdb_open op_event=end op_elapsed=6984263.269ms

like-inspur commented 3 years ago

@russorat I also met this question when I install influxdb on kubernetes with config readiness health check, when there are large mounts of data under influxdb storage, influxdb start will first open too may shard file, and httpd service port 8086 start very slowly

danxmoran commented 3 years ago

This is a problem with 2.x as well, since the storage engine is shared.

19201 is a potential fix for the slow startups. It hasn't been accepted yet because we don't have testing/benchmarking infrastructure in place to see how it'd impact write & query performance after the database has started. We're working on adding that infrastructure now, so hopefully we'll be able to tackle significant changes like this with confidence in the near future.

fenjen commented 3 years ago

I had the same problem here. Yesterday, I started the service (v2.0.3). It took more than 4h between start and socket listen. I found out the service was reading from 3 tombstone files. Their sizes was about 150M each. Maybe someone could check the proposed fix?

greatawesome commented 2 years ago

I'm experiencing a similar problem. Influx start up is spending between 30-60 seconds to open each shard, and we have hundreds that it needs to process. Start-up takes well over 4 hours.

Our data directory is approx 453 MB in total.