influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

'ERR: no data received' after 'SELECT * INTO ...' #15433

Open kkruzich opened 4 years ago

kkruzich commented 4 years ago

__Steps to reproduce:__
List the minimal actions needed to reproduce the behavior.

As suggested here: https://docs.influxdata.com/influxdb/v1.7/administration/backup_and_restore/#restore-examples

  1. influxd backup -portable -database telegraf -host analytics-dev:8088 -start 2019-10-12T00:00:00Z -end 2019-10-12T12:00:00Z /backup/influx_backup_telegraf_$(date +%Y%m%d-%H%M-%S)
  2. influxd restore -portable -db telegraf -newdb telegraf_bak influx_backup_telegraf_20191015-1539-53
  3. 
    > use telegraf_bak
    Using database telegraf_bak
    > SELECT * INTO telegraf..:MEASUREMENT FROM /.*/ GROUP BY *
    ERR: no data received
    > show databases
    ERR: Post http://localhost:8086/query?chunked=true&db=telegraf_bak&epoch=ns&q=show+databases: dial tcp 127.0.0.1:8086: connect: connection refused

__Expected behavior:__
I expected the restored data to be ingested into the database.

__Actual behavior:__
The database crashes temporarily (too busy to accept connections) and the restored data is not ingested.

__Environment info:__

* System info: Linux 4.14.138-114.102.amzn2.x86_64 x86_64
* InfluxDB version: InfluxDB v1.7.8 (git: 1.7 ff383cdc0420217e3460dabe17db54f8557d95b6)
* Other relevant environment details: config out of box

__Config:__
config out of box

<!-- The following sections are only required if relevant. -->

__Logs:__
Include snippet of errors in log.

__Performance:__
Generate profiles with the following commands for bugs related to performance, locking, out of memory (OOM), etc.

```sh
# Commands should be run when the bug is actively happening.
# Note: This command will run for at least 30 seconds.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
curl -o vars.txt "http://localhost:8086/debug/vars"
iostat -xd 1 30 > iostat.txt
# Attach the `profiles.tar.gz`, `vars.txt`, and `iostat.txt` output files.
```

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

russorat commented 4 years ago

@kkruzich sorry, it looks like this issue fell through the cracks. Are you able to test and reproduce with influx 1.7.9?

emr-arvig commented 4 years ago

@kkruzich

I have this error on X-Influxdb-Version: 1.7.10

I have tried to run these two commands:

> SELECT * INTO cm_phy_backup..:MEASUREMENT FROM /.*/ GROUP BY *
ERR: no data received

> SELECT * INTO "cm_phy_backup"."autogen".:MEASUREMENT FROM "cm_data_cm_phy"."autogen"./.*/ GROUP BY * 
ERR: no data received

russorat commented 4 years ago

@kkruzich thanks for confirming. We will take a look.

emr-arvig commented 4 years ago

It seems I get a corresponding out-of-memory error in /var/log/syslog when this happens:

Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: fatal error: runtime: out of memory
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime stack:
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.throw(0x153b878, 0x16)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/panic.go:617 +0x72
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.sysMap(0xc4a0000000, 0x4000000, 0x3168998)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/mem_linux.go:170 +0xc7
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.(*mheap).sysAlloc(0x314fa40, 0xe000, 0x314fa50, 0x7)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/malloc.go:633 +0x1cd
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.(*mheap).grow(0x314fa40, 0x7, 0x0)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/mheap.go:1222 +0x42
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.(*mheap).allocSpanLocked(0x314fa40, 0x7, 0x31689a8, 0xc001123701)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/mheap.go:1150 +0x37f
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.(*mheap).alloc_m(0x314fa40, 0x7, 0x3150067, 0xc000b1a401)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/mheap.go:977 +0xc2
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.(*mheap).alloc.func1()
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/mheap.go:1048 +0x4c
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.systemstack(0x7efe9a1e5b20)
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]:  /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
Mar  2 13:40:49 OSS-SERVICES-03 influxd[3079]: runtime.mstart()

ipaqmaster commented 4 years ago

Late to the party but I also had this problem tonight. On a VM with 8GB of memory I tried to select 10 million values of a measurement into a new database from an imported database backup for migration purposes.

I watched in htop as the machine visibly ran out of memory and "ERR: no data received" appeared in the influx terminal on CentOS 8.

After bringing the virtual machine back online with 16G of memory instead of 8G, I was able to import my data and delete the backup database just fine.

matthewdowney commented 3 years ago

I'm trying the same with a large (~1TB) backup on Influx 1.8. I've given the machine 72G of RAM and am still not having any luck.

Does anybody know if there's a way to instruct influx to uh, take it easy, and treat the process of side loading the backup as if it were a series of normal writes, even if that means that it takes a bit longer? Any help is greatly appreciated!
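
One possible way to make the copy "take it easy" is to split the single `SELECT * INTO ...` into many time-bounded statements, so each one touches only a slice of the restored data. The sketch below only generates those statements. The function name `gen_into_queries`, the six-hour window, and the use of GNU `date` are assumptions for illustration, and nobody in this thread has confirmed that windowing avoids the out-of-memory crash. As far as I recall, the InfluxDB 1.x documentation for the INTO clause recommends time boundaries in the WHERE clause for large copies, but check that against the docs for your version.

```sh
#!/usr/bin/env bash
# Sketch: print one time-bounded SELECT ... INTO statement per window.
# Assumes GNU date (for -d); the function name is hypothetical.

# gen_into_queries DST_DB START END [WINDOW_SECONDS]
# START and END are UTC timestamps such as 2019-10-12T00:00:00Z; END is exclusive.
gen_into_queries() {
  local dst_db=$1 start=$2 end=$3 step=${4:-86400}
  local from to end_s from_ts to_ts
  from=$(date -u -d "$start" +%s)
  end_s=$(date -u -d "$end" +%s)
  while [ "$from" -lt "$end_s" ]; do
    to=$((from + step))
    # Clamp the last window so it never runs past END.
    if [ "$to" -gt "$end_s" ]; then
      to=$end_s
    fi
    from_ts=$(date -u -d "@$from" +%Y-%m-%dT%H:%M:%SZ)
    to_ts=$(date -u -d "@$to" +%Y-%m-%dT%H:%M:%SZ)
    printf "SELECT * INTO \"%s\"..:MEASUREMENT FROM /.*/ WHERE time >= '%s' AND time < '%s' GROUP BY *\n" \
      "$dst_db" "$from_ts" "$to_ts"
    from=$to
  done
}

# Usage (not executed here): run each statement against the restored database,
# one at a time, e.g. with the database names from the original report:
#   gen_into_queries telegraf 2019-10-12T00:00:00Z 2019-10-12T12:00:00Z 21600 |
#     while IFS= read -r q; do influx -database telegraf_bak -execute "$q"; done
```

Running the statements sequentially means only one window is in flight at a time; a smaller window trades total run time for a smaller working set per query.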