*Closed: cloudfly closed this issue 6 years ago.*
@cloudfly What's the output of

```
show retention policies on "com.client"
$ ls -R /data/influxdb/data/com.client
```

and

```
show retention policies on "_internal"
$ ls -R /data/influxdb/data/_internal
```
The retention policies:

```
> show retention policies on "com.client"
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 2160h0m0s 168h0m0s           1        true

> show retention policies on "_internal"
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
monitor 168h0m0s 24h0m0s            1        true
```
`ls -R` result of `_internal`:

```
# ls -R /data/influxdb/data/_internal
/data/influxdb/data/_internal:
monitor
/data/influxdb/data/_internal/monitor:
418 419 423 437 439 576 582 586 593 597 604 620 625
/data/influxdb/data/_internal/monitor/418:
000000010-000000003.tsm
/data/influxdb/data/_internal/monitor/419:
000000012-000000004.tsm
/data/influxdb/data/_internal/monitor/423:
000000016-000000004.tsm
/data/influxdb/data/_internal/monitor/437:
000000017-000000002.tsm
/data/influxdb/data/_internal/monitor/439:
000007783-000000005.tsm 000007811-000000005.tsm 000007815-000000003.tsm 000007816-000000001.tsm
/data/influxdb/data/_internal/monitor/576:
000000032-000000005.tsm
/data/influxdb/data/_internal/monitor/582:
000000032-000000005.tsm
/data/influxdb/data/_internal/monitor/586:
000000033-000000002.tsm
/data/influxdb/data/_internal/monitor/593:
000000033-000000002.tsm
/data/influxdb/data/_internal/monitor/597:
000000034-000000003.tsm
/data/influxdb/data/_internal/monitor/604:
000000036-000000004.tsm
/data/influxdb/data/_internal/monitor/620:
000000039-000000002.tsm
/data/influxdb/data/_internal/monitor/625:
000000034-000000005.tsm 000000036-000000002.tsm
```
`ls -R` result of `com.client`:

```
# ls -R /data/influxdb/data/com.client
/data/influxdb/data/com.client:
autogen
/data/influxdb/data/com.client/autogen:
276 324 352 377 382 397 399 407 418 432 439 450 467 475 484 490 505 530 545 566 572 574 580 592 602 619
295 329 368 378 396 398 400 416 419 434 440 464 469 483 485 491 525 531 565 571 573 575 591 601 603 628
/data/influxdb/data/com.client/autogen/276:
000007538-000000004.tsm
/data/influxdb/data/com.client/autogen/295:
000007493-000000005.tsm
/data/influxdb/data/com.client/autogen/324:
000007865-000000004.tsm
/data/influxdb/data/com.client/autogen/329:
000007738-000000005.tsm
/data/influxdb/data/com.client/autogen/352:
000000003-000000003.tsm
/data/influxdb/data/com.client/autogen/368:
000007607-000000003.tsm
/data/influxdb/data/com.client/autogen/377:
000007709-000000002.tsm
/data/influxdb/data/com.client/autogen/378:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/382:
000007908-000000002.tsm
/data/influxdb/data/com.client/autogen/396:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/397:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/398:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/399:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/400:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/407:
000000002-000000003.tsm
/data/influxdb/data/com.client/autogen/416:
000008050-000000004.tsm 000008055-000000001.tsm
/data/influxdb/data/com.client/autogen/418:
000000002-000000002.tsm
/data/influxdb/data/com.client/autogen/419:
000000007-000000002.tsm
/data/influxdb/data/com.client/autogen/432:
000000002-000000003.tsm
/data/influxdb/data/com.client/autogen/434:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/439:
000000001-000000001.tsm
/data/influxdb/data/com.client/autogen/440:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/450:
000000002-000000003.tsm
/data/influxdb/data/com.client/autogen/464:
000008079-000000004.tsm
/data/influxdb/data/com.client/autogen/467:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/469:
000000005-000000003.tsm
/data/influxdb/data/com.client/autogen/475:
000000033-000000003.tsm
/data/influxdb/data/com.client/autogen/483:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/484:
000007727-000000003.tsm
/data/influxdb/data/com.client/autogen/485:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/490:
000000002-000000003.tsm
/data/influxdb/data/com.client/autogen/491:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/505:
000000002-000000003.tsm
/data/influxdb/data/com.client/autogen/525:
000000045-000000003.tsm
/data/influxdb/data/com.client/autogen/530:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/531:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/545:
000000014-000000003.tsm
/data/influxdb/data/com.client/autogen/565:
000000214-000000006.tsm 000000266-000000005.tsm 000000273-000000003.tsm 000000279-000000003.tsm 000000280-000000001.tsm
/data/influxdb/data/com.client/autogen/566:
000000004-000000003.tsm
/data/influxdb/data/com.client/autogen/571:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/572:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/573:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/574:
000000002-000000003.tsm
/data/influxdb/data/com.client/autogen/575:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/580:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/591:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/592:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/601:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/602:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/603:
000000001-000000002.tsm
/data/influxdb/data/com.client/autogen/619:
000000005-000000003.tsm
/data/influxdb/data/com.client/autogen/628:
000000001-000000001.tsm
```
The `show stats` result is:

```
name: shard
tags: database=_internal, engine=tsm1, id=625, path=/data/influxdb/data/_internal/monitor/625, retentionPolicy=monitor, walPath=/data/influxdb/wal/_internal/monitor/625
diskBytes fieldsCreate seriesCreate writeBytes writePointsDropped writePointsErr writePointsOk writeReq writeReqErr writeReqOk
--------- ------------ ------------ ---------- ------------------ -------------- ------------- -------- ----------- ----------
57706200 0 1569 0 0 0 1809108 2393 0 2393

name: tsm1_engine
tags: database=_internal, engine=tsm1, id=625, path=/data/influxdb/data/_internal/monitor/625, retentionPolicy=monitor, walPath=/data/influxdb/wal/_internal/monitor/625
cacheCompactionDuration cacheCompactionErr cacheCompactions cacheCompactionsActive tsmFullCompactionDuration tsmFullCompactionErr tsmFullCompactions tsmFullCompactionsActive tsmLevel1CompactionDuration tsmLevel1CompactionErr tsmLevel1Compactions tsmLevel1CompactionsActive tsmLevel2CompactionDuration tsmLevel2CompactionErr tsmLevel2Compactions tsmLevel2CompactionsActive tsmLevel3CompactionDuration tsmLevel3CompactionErr tsmLevel3Compactions tsmLevel3CompactionsActive tsmOptimizeCompactionDuration tsmOptimizeCompactionErr tsmOptimizeCompactions tsmOptimizeCompactionsActive
----------------------- ------------------ ---------------- ---------------------- ------------------------- -------------------- ------------------ ------------------------ --------------------------- ---------------------- -------------------- -------------------------- --------------------------- ---------------------- -------------------- -------------------------- --------------------------- ---------------------- -------------------- -------------------------- ----------------------------- ------------------------ ---------------------- ----------------------------
1681034550 0 10 0 1150810070 0 1 0276237740 0 5 0 137206553 0 20 2760412001 0 1 0 0 00 0

name: tsm1_cache
tags: database=_internal, engine=tsm1, id=625, path=/data/influxdb/data/_internal/monitor/625, retentionPolicy=monitor, walPath=/data/influxdb/wal/_internal/monitor/625
WALCompactionTimeMs cacheAgeMs cachedBytes diskBytes memBytes snapshotCount writeDropped writeErr writeOk
------------------- ---------- ----------- --------- -------- ------------- ------------ -------- -------
1676 1862469 262973120 0 20992704 0 0 0 2516

name: tsm1_filestore
tags: database=_internal, engine=tsm1, id=625, path=/data/influxdb/data/_internal/monitor/625, retentionPolicy=monitor, walPath=/data/influxdb/wal/_internal/monitor/625
diskBytes numFiles
--------- --------
12999269 2

name: tsm1_wal
tags: database=_internal, engine=tsm1, id=625, path=/data/influxdb/data/_internal/monitor/625, retentionPolicy=monitor, walPath=/data/influxdb/wal/_internal/monitor/625
currentSegmentDiskBytes oldSegmentsDiskBytes writeErr writeOk
----------------------- -------------------- -------- -------
3418036 10505511 0 2393

name: shard
tags: database=com.client, engine=tsm1, id=619, path=/data/influxdb/data/com.client/autogen/619, retentionPolicy=autogen, walPath=/data/influxdb/wal/com.client/autogen/619
diskBytes fieldsCreate seriesCreate writeBytes writePointsDropped writePointsErr writePointsOk writeReq writeReqErr writeReqOk
--------- ------------ ------------ ---------- ------------------ -------------- ------------- -------- ----------- ----------
1010 0 427 0 0 0 0 0 0 0

name: tsm1_engine
tags: database=com.client, engine=tsm1, id=619, path=/data/influxdb/data/com.client/autogen/619, retentionPolicy=autogen, walPath=/data/influxdb/wal/com.client/autogen/619
cacheCompactionDuration cacheCompactionErr cacheCompactions cacheCompactionsActive tsmFullCompactionDuration tsmFullCompactionErr tsmFullCompactions tsmFullCompactionsActive tsmLevel1CompactionDuration tsmLevel1CompactionErr tsmLevel1Compactions tsmLevel1CompactionsActive tsmLevel2CompactionDuration tsmLevel2CompactionErr tsmLevel2Compactions tsmLevel2CompactionsActive tsmLevel3CompactionDuration tsmLevel3CompactionErr tsmLevel3Compactions tsmLevel3CompactionsActive tsmOptimizeCompactionDuration tsmOptimizeCompactionErr tsmOptimizeCompactions tsmOptimizeCompactionsActive
----------------------- ------------------ ---------------- ---------------------- ------------------------- -------------------- ------------------ ------------------------ --------------------------- ---------------------- -------------------- -------------------------- --------------------------- ---------------------- -------------------- -------------------------- --------------------------- ---------------------- -------------------- -------------------------- ----------------------------- ------------------------ ---------------------- ----------------------------
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

name: tsm1_cache
tags: database=com.client, engine=tsm1, id=619, path=/data/influxdb/data/com.client/autogen/619, retentionPolicy=autogen, walPath=/data/influxdb/wal/com.client/autogen/619
WALCompactionTimeMs cacheAgeMs cachedBytes diskBytes memBytes snapshotCount writeDropped writeErr writeOk
------------------- ---------- ----------- --------- -------- ------------- ------------ -------- -------
22 2501490 0 0 0 0 0 0 0

name: tsm1_filestore
tags: database=com.client, engine=tsm1, id=619, path=/data/influxdb/data/com.client/autogen/619, retentionPolicy=autogen, walPath=/data/influxdb/wal/com.client/autogen/619
diskBytes numFiles
--------- --------
1010 1

name: tsm1_wal
tags: database=com.client, engine=tsm1, id=619, path=/data/influxdb/data/com.client/autogen/619, retentionPolicy=autogen, walPath=/data/influxdb/wal/com.client/autogen/619
currentSegmentDiskBytes oldSegmentsDiskBytes writeErr writeOk
----------------------- -------------------- -------- -------
0 0 0 0
```
Thanks!
One more piece of information: InfluxDB panicked. The following is the head of the long panic output:

```
unexpected fault address 0x7f231c4599d8
fatal error: fault
unexpected fault address 0x7f231c458ee5
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f231c4599d8 pc=0x461336]

goroutine 2928297 [running]:
runtime.throw(0xa7c76c, 0x5)
    /usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc434c2ad20 sp=0xc434c2ad00
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:27 +0x288 fp=0xc434c2ad78 sp=0xc434c2ad20
runtime.memmove(0xc423d195dc, 0x7f231c4599d8, 0x3)
    /usr/local/go/src/runtime/memmove_amd64.s:147 +0x396 fp=0xc434c2ad80 sp=0xc434c2ad78
runtime.slicebytetostring(0x0, 0x7f231c4599d8, 0x3, 0x3fc, 0xc433581230, 0x18)
    /usr/local/go/src/runtime/string.go:94 +0x7e fp=0xc434c2add8 sp=0xc434c2ad80
github.com/influxdata/influxdb/models.Tags.Map(0xc427164cc0, 0x4, 0x4, 0xc427164cc0)
    /root/go/src/github.com/influxdata/influxdb/models/points.go:1624 +0xf8 fp=0xc434c2aeb0 sp=0xc434c2add8
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createVarRefSeriesIterator(0xc421e4c500, 0xc426bd3580, 0xc424a2e1e0, 0xc424a2e5a0, 0x57, 0xc42b872c30, 0x0, 0x0, 0xc426bd3700, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1413 +0xc1 fp=0xc434c2bbf0 sp=0xc434c2aeb0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetGroupIterators(0xc421e4c500, 0xc426bd3580, 0xc424a2e1e0, 0xc426bd35b0, 0x1, 0x1, 0xc42b872c30, 0xc426bd35d0, 0x1, 0x1, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1400 +0x20d fp=0xc434c2bdd8 sp=0xc434c2bbf0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators.func1(0xc423d196e0, 0xc42745def0, 0x2, 0x2, 0xc421e4c500, 0xc426bd3580, 0xc424a2e1e0, 0xc42b872c30, 0xc422496380, 0x1)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1355 +0x12d fp=0xc434c2bf50 sp=0xc434c2bdd8
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc434c2bf58 sp=0xc434c2bf50
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1356 +0x3c3

goroutine 1 [chan receive, 625 minutes]:
main.(*Main).Run(0xc4205a8f10, 0xc42000c100, 0x4, 0x4, 0x4073db, 0xc420092058)
    /root/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:94 +0x4ee
main.main()
    /root/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:45 +0x1be

goroutine 17 [syscall, 625 minutes, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

goroutine 5 [syscall, 625 minutes]:
os/signal.signal_recv(0x0)
    /usr/local/go/src/runtime/sigqueue.go:116 +0x157
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
    /usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 261693 [sleep]:
time.Sleep(0x3b9aca00)
    /usr/local/go/src/runtime/time.go:59 +0xe1
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMLevel(0xc42042e200, 0xb74600, 0x3, 0xc42b8efc80)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:979 +0x90
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableLevelCompactions.func4(0xc42042e200, 0xc42b8efc80)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:217 +0x73
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).enableLevelCompactions
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:217 +0x197

goroutine 7 [IO wait]:
net.runtime_pollWait(0x7f23a4acc500, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc420279870, 0x72, 0xc426f64dd8, 0xc420012098)
    /usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc420279870, 0xddd400, 0xc420012098)
    /usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).accept(0xc420279810, 0x0, 0xddbb80, 0xc42bbc50c0)
    /usr/local/go/src/net/fd_unix.go:419 +0x238
net.(*TCPListener).accept(0xc42004f138, 0x4366ce, 0xc426f64e88, 0x513839)
    /usr/local/go/src/net/tcpsock_posix.go:132 +0x2e
net.(*TCPListener).Accept(0xc42004f138, 0xb73900, 0xc420295d40, 0xde8060, 0xc4286f9190)
    /usr/local/go/src/net/tcpsock.go:222 +0x49
github.com/influxdata/influxdb/tcp.(*Mux).Serve(0xc420295d40, 0xde3300, 0xc42004f138, 0xc42004f138, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tcp/mux.go:74 +0x93
created by github.com/influxdata/influxdb/cmd/influxd/run.(*Server).Open
    /root/go/src/github.com/influxdata/influxdb/cmd/influxd/run/server.go:364 +0x286
...
```
I see that `com.client` has a non-default duration. Do you know what command you ran to modify the duration? Was it `CREATE DATABASE ... WITH DURATION ...` or `ALTER RETENTION POLICY ... DURATION ...`?
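For reference, the two forms I'm asking about look like this (the `90d` value here is only an illustration, not something I know you ran):

```sql
CREATE DATABASE "com.client" WITH DURATION 90d
-- or, on an already-existing database:
ALTER RETENTION POLICY "autogen" ON "com.client" DURATION 90d
```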
```
> show retention policies on "com.client"
name    duration  shardGroupDuration replicaN default
----    --------  ------------------ -------- -------
autogen 2160h0m0s 168h0m0s           1        true
```
We ran `ALTER RETENTION POLICY ...` a few weeks ago to set the duration to 90 days, but this problem only started happening a few days ago.
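For what it's worth, 90 days is exactly the `2160h0m0s` duration shown in the retention policy output above:

```shell
# 90 days expressed in hours (matches the RP's 2160h0m0s duration)
hours=$((90 * 24))
echo "${hours}h"   # prints "2160h"
```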
@cloudfly Can you try changing it back to the original RP settings and see if the issue persists? There used to be an issue like this, but I thought that it had been fixed.
@cloudfly -- is this still an issue for you or shall we close this out?
I found this issue because I'm seeing the exact same problem, but with 1.4.3, 1.5.2, and 1.5.3. With a database called "samm" and the following command:

```shell
influx -username admin -password ... -database samm -execute 'INSERT test1 field1=1.0'
```

the "test1" measurement gets created in `_internal`, with no data. I tried altering the retention policy as mentioned above, but no change. If I deliberately misspell the database name, I correctly get an error:

```
ERR: {"error":"database not found: \"sammm\""}
```

I also tried creating a second database and writing the same data to that, with a measurement name of "test2", but this measurement ended up in `_internal` just like test1.

Of possible note is that I'm deploying this using Docker containers on AWS.
@rockmnew What's the output of `SHOW RETENTION POLICIES ON "samm"`?
I did get it working by removing the existing database files (`data/` and `wal/`), but I kept the old files around to look at more. On both the old and new instances, the output of `SHOW RETENTION POLICIES ON "samm"` is:

```
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
```

Also of note: on the new instance, "samm" has the measurements I created, plus all of the measurements from `_internal` (cq, database, http, etc.). It's still consistent that I can write to the new instance, but not to the old one.
I'm not totally sure I understand how to reproduce this. Have you been manually modifying the files in `data` and `wal`?
No, I haven't touched any of those files. All I've done is rename the directories (while InfluxDB was not running) to remove or replace the database.
@timhallinflux, this is not an issue for me any more; I'm now using version 1.5.1 and cannot reproduce the problem.
@rockmnew

> All I've done is renamed the directories (while influxdb was not running), to remove or replace the database

This is likely the cause of the issue. Please do not rename or modify any files.

As @cloudfly has noted, this issue is likely solved.
I think you misunderstand. The renaming of the directories was done to put InfluxDB in a state where no databases exist, so that it would recreate them the next time it started up, while keeping the old databases for future troubleshooting. It was `/var/lib/influxdb/data` and `/var/lib/influxdb/wal` that were renamed to do this. No files within these directories were renamed or modified.

While the above process of making InfluxDB create new databases got it into a working state, I'm worried that future upgrades will bring this problem back.
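To be concrete, the reset amounted to roughly the following. (This sketch runs against a scratch directory so it is safe to execute; on the real host the base directory was `/var/lib/influxdb` and the service was stopped first.)

```shell
# Simulate the reset: set the old database directories aside so influxd
# recreates empty ones on next start; no files inside them are touched.
base=$(mktemp -d)                    # stand-in for /var/lib/influxdb
mkdir -p "$base/data" "$base/wal"

mv "$base/data" "$base/data.old"     # keep old databases for troubleshooting
mv "$base/wal"  "$base/wal.old"

mkdir -p "$base/data" "$base/wal"    # influxd would recreate these on startup
```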
Also, can you confirm whether a new database with only a single measurement should list all of the measurements from `_internal` when you run `SHOW MEASUREMENTS`?
I have a database where I cannot query the latest data even though the write succeeds; the measurement was created in the `_internal` database instead. Other databases are OK. I tried to reproduce this problem on my local machine, but failed. The following are my observations on the production environment:

1. The measurement was not created in `com.client`, but in `_internal`.
2. The measurement `cpu` appears in `_internal` with no points. The other measurements `monitor-service/*` should also belong to the `com.client` database.
3. The measurement `cpu` can be dropped, but when I drop one of `monitor-service/*`, InfluxDB uses up all my memory (128G) within minutes (CPU usage is also very high, 3000% on 32 cores), and then I have to restart it. I think the reason may be that `monitor-service/*` have many points while `cpu` has only the one point I just inserted.

The InfluxDB version is:

system info:

only 18M of data in this database: