influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Failed to start nats streaming server #20510

Open felipecanhedo opened 3 years ago

felipecanhedo commented 3 years ago

Steps to reproduce: After a server reboot I'm getting the following error on InfluxDB startup:

2021-01-13T16:38:31.182037Z error Failed to start nats streaming server {"log_id": "0Rg9e0yl000", "error": "nats: no servers available for connection"}
Error: nats: no servers available for connection

The command I'm using to start is:

/usr/local/bin/influxd --engine-path=/ifdata/engine --bolt-path=/ifdata/influxd.bolt

The error does not happen if I don't pass my persistent engine-path as a parameter

Environment info:

Config: Only non-default variables are the engine path and bolt path:

--engine-path=/ifdata/engine --bolt-path=/ifdata/influxd.bolt

Logs:

2021-01-13T16:38:31.138603Z info Starting retention policy enforcement service {"log_id": "0Rg9e0yl000", "service": "retention", "check_interval": "30m"}
2021-01-13T16:38:31.138640Z info Starting precreation service {"log_id": "0Rg9e0yl000", "service": "shard-precreation", "check_interval": "10m", "advance_period": "30m"}
2021-01-13T16:38:31.138861Z info Starting query controller {"log_id": "0Rg9e0yl000", "service": "storage-reads", "concurrency_quota": 10, "initial_memory_bytes_quota_per_query": 9223372036854775807, "memory_bytes_quota_per_query": 9223372036854775807, "max_memory_bytes": 0, "queue_size": 10}
2021-01-13T16:38:31.139609Z info Configuring InfluxQL statement executor (zeros indicate unlimited). {"log_id": "0Rg9e0yl000", "max_select_point": 0, "max_select_series": 0, "max_select_buckets": 0}
2021-01-13T16:38:31.182037Z error Failed to start nats streaming server {"log_id": "0Rg9e0yl000", "error": "nats: no servers available for connection"}
Error: nats: no servers available for connection
See 'influxd -h' for help

russorat commented 3 years ago

@felipecanhedo thanks for the issue. Can you try the latest release, 2.0.4, and see if this is still a problem?

luizmendesalmeida commented 3 years ago

Hi, I'm facing the same issue.

Mar 4 18:35:28 myHOST influxd[2102]: ts=2021-03-04T18:35:28.265180Z lvl=error msg="Failed to start nats streaming server" log_id=0SgdDP~W000 error="nats: no servers available for connection"
Mar 4 18:35:28 myHOST influxd[2102]: Error: nats: no servers available for connection
Mar 4 18:35:28 myHOST influxd[2102]: See 'influxd -h' for help

I'm running on a Raspberry Pi, and the version is 2.0.4 (influxdb2-2.0.4-arm64.deb).

danxmoran commented 3 years ago

@luizmendesalmeida does the problem also only affect you if you change the --engine-path? Or does it always happen?

luizmendesalmeida commented 3 years ago

Hi @danxmoran,

I didn't change anything. It was a straightforward install.

felipecanhedo commented 3 years ago

Hi @russorat ,

In my case it was "fixed" by starting over with a fresh --engine-path. It was seemingly corrupted after a server reboot with the service running.

luizmendesalmeida commented 3 years ago

Hi,

I forgot to mention I have one 2.0.4 instance running properly in Debian 10.

danxmoran commented 3 years ago

A customer has reported that this error can happen if you hit the max-open-file limit on your system.

At minimum, we should attach a logger to the NATS server to allow for better debugging. If possible, the error message included in the log should be improved.
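To check whether a running process is close to its open-file limit, one option on Linux is to compare the entries in /proc/<pid>/fd against the "Max open files" row of /proc/<pid>/limits. The following is a minimal sketch, not part of InfluxDB; the function names are made up for illustration, and reading another user's process may require root.

```python
from pathlib import Path


def _to_int(value):
    # /proc reports "unlimited" when no limit is set
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits, or None if the row is absent."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return _to_int(fields[3]), _to_int(fields[4])
    return None


def open_fd_count(pid):
    """Count file descriptors currently open in the given process (Linux only)."""
    return sum(1 for _ in Path(f"/proc/{pid}/fd").iterdir())


def report(pid):
    """Print the open descriptor count and the (soft, hard) limit for a process."""
    limits = parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
    print(f"pid {pid}: open fds = {open_fd_count(pid)}, (soft, hard) limit = {limits}")


# Example (hypothetical): report("self"), or report(<pid of influxd>)
```

If the open descriptor count is near the soft limit, the limit is a likely suspect for the startup failure described above.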

dgnorton commented 3 years ago

@danxmoran let's look into removing NATS.

russorat commented 3 years ago

If we implemented this: https://github.com/influxdata/influxdb/issues/15445 we could remove it

Ing-Med commented 3 years ago

> A customer has reported that this error can happen if you hit the max-open-file limit on your system.
>
> At minimum, we should attach a logger to the NATS server to allow for better debugging. If possible, the error message included in the log should be improved.

I have the same problem. Yesterday InfluxDB stopped working and spewed out errors about too many open HTTP connections. The interwebz said this was due to the limit on open files that Linux allows (1024). I increased the value for both my user and the docker user, and now it throws this error. Reverting the ulimits doesn't help either; the issue with the NATS streaming server persists. Welp!

fitch commented 3 years ago

I'm getting the same error. First, there was an error about too many open files. Then I increased the limit from 1024 to 65536 with ulimit -n 65536, and now I'm getting this error, just like @Ing-Med.

fitch commented 3 years ago

Well, it seems that adding LimitNOFILE=65536 to the influxdb.service file did the trick.
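One likely reason the ulimit -n change had no effect while the unit-file setting did: ulimit only affects the current shell and its children, whereas a service started by systemd takes its limits from the unit configuration. A minimal sketch of the same fix as a drop-in override, assuming the unit is named influxdb.service as above; the drop-in file name override.conf is an arbitrary choice:

```ini
# /etc/systemd/system/influxdb.service.d/override.conf
# (file name is an assumption; any *.conf in this directory is read)
[Service]
LimitNOFILE=65536
```

After adding the file, running systemctl daemon-reload and then systemctl restart influxdb should apply the new limit.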

Ing-Med commented 3 years ago

> Well, it seems that adding LimitNOFILE=65536 to the influxdb.service file did the trick.

Thanks for telling us your solution! Unfortunately I am running InfluxDB on Docker, so I wouldn't know where to add that parameter. And unfortunately this happened to me again, so if no one is able to pull that magic rabbit out of their hat, I'm switching back to v1.

Edit: I saw that the container itself still has a file limit of 1024, even though Docker should allow more at the system level. I added

ulimits:
  nproc: 65535
  nofile:
    soft: 20000
    hard: 40000

to the influx container in my Docker Compose file, and it suddenly started again.

It is beyond me how this has not been documented yet. wtf