Open opsxcq opened 3 years ago
Same issue in my company. We were considering migrating from Timescale to Influx, but due to this we can't proceed with testing. Apparently TimescaleDB is much better at memory management.
@danxmoran let me know if you need any additional information regarding this bug
I'm still having the same problem. I simulated the same scenario on TimescaleDB with 1 GB of RAM, and it could keep up with the metrics pretty well. I'm looking forward to getting at least the same performance with InfluxDB 2 on 8 GB of RAM.
@opsxcq would you be able to tell us anything more about your historical dataset? You mention the number of series and tags in the original issue, which is great information - I'm also wondering how the data is distributed over time, etc. If you have a sample of the data that you could provide, that would be great as well. Also, how are you trying to import the data - using the influx CLI or some other way?
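For context on the kind of import in question, here is a minimal sketch of what a line-protocol point for a hypothetical "daily" measurement (symbol tag, close field, seconds-precision timestamp) looks like before it is handed to the influx CLI; the measurement, tag, bucket, and org names are placeholders, not from this thread:

```shell
# Build one line-protocol point (hypothetical measurement/tag names)
symbol="AAPL"; close=189.95; ts=1704067200
lp="daily,symbol=${symbol} close=${close} ${ts}"
echo "$lp"
# It could then be written in batches, e.g.:
#   influx write --bucket market --org my-org --precision s "$lp"
# (bucket/org names are placeholders)
```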
@angelademarco similar questions for you, regarding your description in #21766. I've been running some tests on an EC2 instance with 4 GB of RAM, with influxd running in Docker, and I am able to OOM it with certain kinds of data (particularly if the data points are spread out over a large / randomized timeframe) but not with others. I'm synthetically generating my data, of course; if you have more information you could share about the data you are experiencing these crashes with, that would be very useful! It sounds like you are able to simulate the data/crashes, so anything you can share about how you are doing that would be helpful too!
@wbaker85 yes, it is historical market data since 1950, on daily intervals (the range goes from 100 to around 8000 symbols; it changes over time), plus recent data with more granularity (hourly and 1/5/15-minute intervals) but only for the last few months. I still haven't tried to import all my ticker data.
I split them in two measurements, one is the tickers and the other is "daily" which contains the data for daily prices.
For reference, here's a related recent issue in the forums: https://community.influxdata.com/t/execution-of-heavy-queries-result-in-a-crash/22637/3
I have also encountered crashes on bigger queries and suppose it's a similar issue, but couldn't investigate further yet.
Just wanted to mention this is still an issue in v2.2. I can run the same query multiple times, but there appears to be some sort of memory leak: the influx instance eventually uses all available system memory, which causes the entire server to freeze until a hard restart. Granted, I'm not running in Docker like OP, but I'm still hitting that out-of-memory error:
Sometimes the server can recover after an hour or so; sometimes it just stays frozen until I notice and restart it. (Also, the gap in the memory graph is Google's ops agent being killed on the server due to lack of resources, which stops it from reporting metrics.)
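One way to capture the memory growth leading up to such a freeze, even if the process is later killed, is a simple RSS logger (a rough sketch; assumes influxd runs directly on the host, as described above):

```shell
# Append influxd's resident memory (in MiB) to a file once a minute,
# so the growth curve survives even if the server later freezes or OOMs.
while rss_kb=$(ps -o rss= -p "$(pidof influxd)"); do
  echo "$(date -Is) rss_mib=$((rss_kb / 1024))" >> influxd-rss.log
  sleep 60
done
```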
I think this is happening to everyone using InfluxDB 2.x. It doesn't happen with version 1.8; the memory management on Influx 1.8 looks like this.
But Influx 2.3 just dies after a while because it runs out of memory.
@riosje I can confirm that the same issue happens with the influxdb:2.6.0 docker image.
Hello! I can confirm that the same issue happens with the influxdb:2.7.6 docker image.
I have a big database with many shards.
The result of the simple command:

```
docker run --user=influxdb -d -p 8086:8086 --name influxdb --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine
```

is an OOM, because the server allocates memory for every file (WAL or indexes) without freeing it. The server allocates 20 GiB of memory and then stops with an error; the instance limit is 20 GiB.
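To see how much of the on-disk data is TSI index and series files (the directories the workaround below deletes), something like this can be used; the path assumes the `/home/influxdb` bind mount from the command above:

```shell
# Sum the sizes of the TSI index and series-file directories under the
# engine data path (bind-mounted at /home/influxdb in the docker run above).
ENGINE=/home/influxdb/engine/data
find "$ENGINE" -type d \( -name index -o -name _series \) -exec du -sh {} +
```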
The workaround:
```
docker rm influxdb
# remove indexes
find /home/influxdb/engine/data/ -type d -name _series -exec rm -r {} +
find /home/influxdb/engine/data/ -type d -name index -exec rm -r {} +
# start
docker run --user=influxdb -d -p 8086:8086 --name influxdb --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine
```
The server will recreate the indexes and allocate only 7 GiB of memory, and it will work well.
The full workaround solution:

```
[Unit]
Description=InfluxDB Service
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
Restart=always
ExecStartPre=-/usr/bin/docker stop %n
ExecStartPre=-/usr/bin/docker rm %n
ExecStartPre=/usr/bin/docker pull influxdb:2.7.6-alpine
# drop the TSI indexes so the server rebuilds them on start
ExecStartPre=/usr/bin/find /home/influxdb/engine/data/ -type d -name _series -exec rm -r {} +
ExecStartPre=/usr/bin/find /home/influxdb/engine/data/ -type d -name index -exec rm -r {} +
# no -d here: the container must stay in the foreground for systemd to supervise it
ExecStart=/usr/bin/docker run --rm --user=influxdb -p 8086:8086 -m 16g --name %n --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine
ExecStop=/usr/bin/docker stop %n

[Install]
WantedBy=default.target
```
Cons:
Update: I'm thinking about a new entrypoint:

```
docker run --user=influxdb --restart unless-stopped --entrypoint '/bin/bash' -d -p 8086:8086 -m 12g --name influx --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine --verbose -c "find /var/lib/influxdb2/engine/data/ -type d -name _series -exec rm -r {} + && find /var/lib/influxdb2/engine/data/ -type d -name index -exec rm -r {} + && /entrypoint.sh influxd"
```
It includes:

```
find /var/lib/influxdb2/engine/data/ -type d -name _series -exec rm -r {} + && \
find /var/lib/influxdb2/engine/data/ -type d -name index -exec rm -r {} + && \
/entrypoint.sh influxd
```
This is wild: Influx requires way too much RAM to barely operate, and with the release of Influx v3 they will not fix this, because they need people to buy their new license. I just moved to VictoriaMetrics; the performance of that solution is really good, and the architecture is way better than Influx's.
offtop_mode_on
> I just moved to VictoriaMetrics
My teammates would like to have storage without strict cardinality limits, and they would like to use some complex scripts for getting metrics. They are familiar with Flux, but not with MetricsQL.
I used NGinx-proxy for data replication operations: https://gist.github.com/polarnik/cb6f22751e8d1590342198609243c529
And my teammates had similar data in VictoriaMetrics and in InfluxDB. This is an old solution.
We are using InfluxDB for raw data and complex Flux queries, and VictoriaMetrics for aggregates and alerts only. VictoriaMetrics has some limits too; we use it for clean data only.
offtop_mode_off
My current workaround is:

```
docker run --user=influxdb --restart=on-failure --restart unless-stopped --entrypoint '/bin/bash' -d -p 8086:8086 --log-driver=syslog --name influx --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine --verbose -c "find /var/lib/influxdb2/engine/data/ -type d -name _series -exec rm -r {} + && find /var/lib/influxdb2/engine/data/ -type d -name index -exec rm -r {} + && /entrypoint.sh influxd"
```
The log (short version):
The cost of reindexing is: 139 seconds
For databases with a total size of about 6 GiB:

```
du -d 1 -b /home/influxdb/engine/
33619605    /home/influxdb/engine/wal
6299686756  /home/influxdb/engine/data
4096        /home/influxdb/engine/replicationq
6333314553  /home/influxdb/engine/
```
The initial memory allocation (RAM) is 8.6 GiB.
The databases contain metrics from sitespeed.io tests: versions, browsers, etc. There are low-cardinality tags and a lot of metrics.
I still have the OOM problem, but now I have logs. The root cause of the OOM problem is the "TSI log compaction" operation.
The first message was "TSI log compaction (start)":

```
May 14 11:09:23 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:23.317877Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=8 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
```
The "TSI log compaction" operation got only about two minutes:

```
May 14 11:11:53 influxdb f7d421060a18[742]: ts=2024-05-14T11:11:00.371732Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=8 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
```

because then the OOM happened:
```
May 14 11:11:54 influxdb f7d421060a18[742]: memory allocation of 1056 bytes failed
May 14 11:11:54 influxdb f7d421060a18[742]: SIGABRT: abort
May 14 11:11:54 influxdb f7d421060a18[742]: PC=0x7f443269c792 m=149 sigcode=18446744073709551610
May 14 11:11:54 influxdb f7d421060a18[742]: signal arrived during cgo execution
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 677421 [syscall]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.cgocall(0x7f443235d3e0, 0xc218f518a8)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/cgocall.go:157 +0x4b fp=0xc218f51880 sp=0xc218f51848 pc=0x7f44305bab8b
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/libflux/go/libflux._Cfunc_flux_analyze(0x7f42973f04d0, 0x7f380360bb20, 0xc0189b1310)
May 14 11:11:54 influxdb f7d421060a18[742]: #011_cgo_gotypes.go:122 +0x50 fp=0xc218f518a8 sp=0xc218f51880 pc=0x7f4430b661f0
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/libflux/go/libflux.AnalyzeWithOptions.func3(0xc218f51948?, 0x2?, 0x2?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/influxdata/flux@v0.194.5/libflux/go/libflux/analyze.go:142 +0x7d fp=0xc218f518f0 sp=0xc218f518a8 pc=0x7f4430b680bd
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/libflux/go/libflux.AnalyzeWithOptions(0xc027fecc88, {{0x0?, 0x7f44339026c0?, 0x7f44335f69a0?}})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/influxdata/flux@v0.194.5/libflux/go/libflux/analyze.go:142 +0x169 fp=0xc218f519f8 sp=0xc218f518f0 pc=0x7f4430b67d09
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/runtime.AnalyzePackage({0x7f4433a0a9e8?, 0xc1b0121620?}, {0x7f4433a05c30?, 0xc027fecc88})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/influxdata/flux@v0.194.5/runtime/analyze_libflux.go:23 +0xb2 fp=0xc218f51a78 sp=0xc218f519f8 pc=0x7f4430b723f2
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/runtime.(*runtime).Eval(0x7f44363628a0, {0x7f4433a0a9e8, 0xc1b0121620}, {0x7f4433a05c30?, 0xc027fecc88?}, {0x7f44339fc378, 0x7f443639ff20}, {0xc140df0580, 0x2, 0x2})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/influxdata/flux@v0.194.5/runtime/runtime.go:102 +0x85 fp=0xc218f51af8 sp=0xc218f51a78 pc=0x7f4430b742a5
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/lang.(*AstProgram).getSpec(0xc1732f34a0, {0x7f4433a0a9e8, 0xc1b01215f0}, {0x7f44339e35f0?, 0x7f4433475460?})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/influxdata/flux@v0.194.5/lang/compiler.go:446 +0x2e3 fp=0xc218f51c88 sp=0xc218f51af8 pc=0x7f44314b3123
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/flux/lang.(*AstProgram).Start(0xc1732f34a0, {0x7f4433a0a9e8, 0xc1b01213e0}, {0x7f4433a0c310, 0xc16ff75f90})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/influxdata/flux@v0.194.5/lang/compiler.go:484 +0x1c9 fp=0xc218f51e98 sp=0xc218f51c88 pc=0x7f44314b3ca9
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/influxdb/v2/query/control.(*Controller).executeQuery(0xc218f51fa8?, 0xc1d4e4a1a0)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/query/control/controller.go:489 +0x219 fp=0xc218f51f48 sp=0xc218f51e98 pc=0x7f4432188df9
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/influxdb/v2/query/control.(*Controller).processQueryQueue(...)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/query/control/controller.go:447
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/influxdb/v2/query/control.New.func1()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/query/control/controller.go:232 +0x76 fp=0xc218f51fe0 sp=0xc218f51f48 pc=0x7f44321871b6
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc218f51fe8 sp=0xc218f51fe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by github.com/influxdata/influxdb/v2/query/control.New in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/query/control/controller.go:230 +0x9ec
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 1 [chan receive, 120 minutes]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x4b939adcbb1a5?, 0x40000000?, 0x0?, 0x0?, 0x8bb2c97000?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc19a77bbc8 sp=0xc19a77bba8 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.chanrecv(0xc000637440, 0x0, 0x1)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/chan.go:583 +0x3cd fp=0xc19a77bc40 sp=0xc19a77bbc8 pc=0x7f44305bd1ad
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.chanrecv1(0xc000e942a0?, 0x7f4433a0aa20?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/chan.go:442 +0x12 fp=0xc19a77bc68 sp=0xc19a77bc40 pc=0x7f44305bcdb2
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/influxdb/v2/cmd/influxd/launcher.NewInfluxdCommand.cmdRunE.func1()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/cmd/influxd/launcher/cmd.go:127 +0x156 fp=0xc19a77bd00 sp=0xc19a77bc68 pc=0x7f4432256416
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/influxdata/influxdb/v2/kit/cli.NewCommand.func1(0xc000c58900?, {0x7f443639ff20?, 0x4?, 0x7f44326a9e5b?})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/kit/cli/viper.go:54 +0x16 fp=0xc19a77bd10 sp=0xc19a77bd00 pc=0x7f4431059a96
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/spf13/cobra.(*Command).execute(0xc000c65b80, {0xc00011e0b0, 0x0, 0x0})
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x694 fp=0xc19a77bdf8 sp=0xc19a77bd10 pc=0x7f4430fdc654
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000c65b80)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x389 fp=0xc19a77beb0 sp=0xc19a77bdf8 pc=0x7f4430fdcc09
May 14 11:11:54 influxdb f7d421060a18[742]: github.com/spf13/cobra.(*Command).Execute(...)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
May 14 11:11:54 influxdb f7d421060a18[742]: main.main()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/root/project/cmd/influxd/main.go:61 +0x50a fp=0xc19a77bf40 sp=0xc19a77beb0 pc=0x7f443228c84a
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.main()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:267 +0x2d2 fp=0xc19a77bfe0 sp=0xc19a77bf40 pc=0x7f44305f1892
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc19a77bfe8 sp=0xc19a77bfe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 2 [force gc (idle), 122 minutes]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goparkunlock(...)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:404
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.forcegchelper()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:322 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x7f44305f1b78
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.init.6 in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:310 +0x1a
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 3 [runnable]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goschedIfBusy()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:361 +0x28 fp=0xc000085778 sp=0xc000085760 pc=0x7f44305f1c28
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.bgsweep(0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgcsweep.go:305 +0x151 fp=0xc0000857c8 sp=0xc000085778 pc=0x7f44305dbdd1
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcenable.func1()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:200 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x7f44305d0e65
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcenable in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:200 +0x66
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 4 [GC scavenge wait]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x975898e?, 0x8286c0?, 0x0?, 0x0?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000085f70 sp=0xc000085f50 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goparkunlock(...)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:404
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.(*scavengerState).park(0x7f4436368800)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa0 sp=0xc000085f70 pc=0x7f44305d95c9
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.bgscavenge(0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x7f44305d9b79
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcenable.func2()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:201 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x7f44305d0e05
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcenable in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:201 +0xa5
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 18 [finalizer wait]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x0?, 0x7f44339dd3d0?, 0x40?, 0xf?, 0x1000000010?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.runfinq()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x7f44305cfe87
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.createfing in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mfinal.go:163 +0x3d
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 19 [GC worker (idle), 2 minutes]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x7f44363a2c40?, 0x3?, 0xcc?, 0x31?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000080750 sp=0xc000080730 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0000807e0 sp=0xc000080750 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcBgMarkStartWorkers in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1219 +0x1c
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 34 [GC worker (idle), 2 minutes]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x7f44363a2c40?, 0x3?, 0x2a?, 0xf?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000486750 sp=0xc000486730 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0004867e0 sp=0xc000486750 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0004867e8 sp=0xc0004867e0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcBgMarkStartWorkers in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1219 +0x1c
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 5 [GC worker (idle)]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x22c370d133d90?, 0x1?, 0x7?, 0x79?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000086750 sp=0xc000086730 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0000867e0 sp=0xc000086750 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcBgMarkStartWorkers in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1219 +0x1c
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 20 [GC worker (idle), 2 minutes]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x22c1e446c85de?, 0x3?, 0x84?, 0xcb?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000080f50 sp=0xc000080f30 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000080fe0 sp=0xc000080f50 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcBgMarkStartWorkers in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1219 +0x1c
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 35 [GC worker (idle)]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x22c3de1809b1a?, 0x1?, 0x65?, 0x7b?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000486f50 sp=0xc000486f30 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000486fe0 sp=0xc000486f50 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000486fe8 sp=0xc000486fe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcBgMarkStartWorkers in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1219 +0x1c
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 6 [GC worker (idle)]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x7f44339d89b0?, 0xc0001540a0?, 0x1a?, 0x14?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000086f50 sp=0xc000086f30 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc000086fe0 sp=0xc000086f50 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x7f4430625421
May 14 11:11:54 influxdb f7d421060a18[742]: created by runtime.gcBgMarkStartWorkers in goroutine 1
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1219 +0x1c
May 14 11:11:54 influxdb f7d421060a18[742]: goroutine 21 [GC worker (idle)]:
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gopark(0x22c3de1807bf7?, 0x3?, 0x2c?, 0xa1?, 0x0?)
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/proc.go:398 +0xce fp=0xc000081750 sp=0xc000081730 pc=0x7f44305f1d0e
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.gcBgMarkWorker()
May 14 11:11:54 influxdb f7d421060a18[742]: #011/go/src/runtime/mgc.go:1295 +0xe5 fp=0xc0000817e0 sp=0xc000081750 pc=0x7f44305d2a25
May 14 11:11:54 influxdb f7d421060a18[742]: runtime.goexit()
```
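A quick way to check whether any of these compactions actually completed is to count start vs. end events in the captured log. The log path and the assumption that completion is logged as `op_event=end` are mine; the message format matches the lines above:

```shell
# Count TSI log-compaction start/end events (assumes the container logs
# end up in syslog, as with --log-driver=syslog; op_event=end is assumed
# to mark completion).
LOG=/var/log/syslog
starts=$(grep -c 'op_name=tsi1_compact_log_file.*op_event=start' "$LOG")
ends=$(grep -c 'op_name=tsi1_compact_log_file.*op_event=end' "$LOG")
echo "tsi1 log compactions: starts=$starts ends=$ends"
```

If starts greatly outnumber ends right before the crash, the compactions never finished.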
```
"Env": [
    "INFLUXD_REPORTING_DISABLED=true",
    "INFLUXD_STORAGE_CACHE_SNAPSHOT_WRITE_COLD_DURATION=10m0s",
    "INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=1h0m0s",
    "INFLUXD_STORAGE_COMPACT_THROUGHPUT_BURST=80388608",
    "INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=2",
    "INFLUXD_STORAGE_SERIES_FILE_MAX_CONCURRENT_SNAPSHOT_COMPACTIONS=2",
    "INFLUXDB_DATA_INDEX_VERSION=\"tsi1\"",
    "INFLUXDB_DATA_CACHE_SNAPSHOT_MEMORY_SIZE=\"200m\"",
    "INFLUXDB_DATA_MAX_INDEX_LOG_FILE_SIZE=10485760",
    "INFLUXDB_DATA_SERIES_ID_SET_CACHE_SIZE=100",
    "INFLUXD_QUERY_MEMORY_BYTES=304857600",
    "INFLUXD_QUERY_INITIAL_MEMORY_BYTES=10485760",
    "INFLUXD_QUERY_CONCURRENCY=5",
    "INFLUXD_STORAGE_CACHE_MAX_MEMORY_SIZE=1073741824",
    "INFLUXD_STORAGE_CACHE_SNAPSHOT_MEMORY_SIZE=262144000",
    "INFLUXD_QUERY_QUEUE_SIZE=100",
    "INFLUXD_FLUX_LOG_ENABLED=true",
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "INFLUXDB_VERSION=2.7.6",
    "INFLUX_CLI_VERSION=2.7.3",
    "INFLUX_CONFIGS_PATH=/etc/influxdb2/influx-configs",
    "INFLUXD_INIT_PORT=9999",
    "INFLUXD_INIT_PING_ATTEMPTS=600",
    "DOCKER_INFLUXDB_INIT_CLI_CONFIG_NAME=default"
],
```
I used lower limits for the compaction settings:

> Duration at which the storage engine will compact all TSM files in a shard if it hasn’t received writes or deletes.

default: `export INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=4h0m0s`, mine: `export INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=1h0m0s`

> Rate limit (in bytes per second) that TSM compactions can write to disk.

default: `export INFLUXD_STORAGE_COMPACT_THROUGHPUT_BURST=50331648`, mine: `export INFLUXD_STORAGE_COMPACT_THROUGHPUT_BURST=80388608`

> Maximum number of full and level compactions that can run concurrently. A value of 0 results in 50% of runtime.GOMAXPROCS(0) used at runtime. Any number greater than zero limits compactions to that value. This setting does not apply to cache snapshotting.

default: `export INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=0`, mine: `export INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=2`

> Maximum number of snapshot compactions that can run concurrently across all series partitions in a database.

default: `export INFLUXD_STORAGE_SERIES_FILE_MAX_CONCURRENT_SNAPSHOT_COMPACTIONS=0`, mine: `export INFLUXD_STORAGE_SERIES_FILE_MAX_CONCURRENT_SNAPSHOT_COMPACTIONS=2`
I have some duplicate lines in the logs:
```
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.257996Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.260277Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.260525Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=5 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.262446Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=7 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.264009Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=3 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.265600Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.267097Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=7 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.269065Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=1 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.260268Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=8 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.269662Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=8 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.270772Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=2 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.274354Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.283160Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=1 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.289673Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=5 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.291732Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=3 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.291732Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=2 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.298355Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
May 14 11:09:25 influxdb f7d421060a18[742]: ts=2024-05-14T11:09:25.305572Z lvl=info msg="TSI log compaction (start)" log_id=0p9aevYl000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
```
Maybe the server has a race condition, because two threads work on the same file in parallel (note the duplicate entries for tsi1_partition=8, 2, and 4 in the log above). I'm going to try
INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=1
INFLUXD_STORAGE_SERIES_FILE_MAX_CONCURRENT_SNAPSHOT_COMPACTIONS=1
I'm also going to effectively disable compaction:
INFLUXD_STORAGE_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1000d
INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=1000d
@polarnik - Setting the env var GOMEMLIMIT might help you. It sets a soft memory limit that makes the GC more aggressive as memory use nears the limit. I don't expect it to be a cure-all, though. It became available in Go 1.19, and InfluxDB 2.7+ is built with at least Go 1.20.
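As a sketch of how this could be wired into the Docker setup discussed in this thread, the variables can go into the `env.list` file the container already loads. The specific values below are illustrative placeholders, not a recommendation; `GOMEMLIMIT` should sit comfortably below the container's hard memory cap so the GC reacts before the OOM killer does.

```shell
# Illustrative env.list fragment (values are placeholders):
GOMEMLIMIT=20GiB   # soft limit for the Go runtime (requires Go >= 1.19)
GOGC=10            # run GC much more often, trading CPU for lower peak RSS
```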
@philjb
I will use settings with a memory limit and GC tuning, and I have effectively disabled full compaction via INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION:
```
GOMEMLIMIT=20GiB
GOGC=10
INFLUXD_REPORTING_DISABLED=true
INFLUXD_STORAGE_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1000d
INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=1000d
INFLUXD_STORAGE_COMPACT_THROUGHPUT_BURST=80388608
INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=1
INFLUXD_STORAGE_SERIES_FILE_MAX_CONCURRENT_SNAPSHOT_COMPACTIONS=1
INFLUXDB_DATA_INDEX_VERSION="tsi1"
INFLUXDB_DATA_CACHE_SNAPSHOT_MEMORY_SIZE="200m"
INFLUXDB_DATA_MAX_INDEX_LOG_FILE_SIZE=10485760
INFLUXDB_DATA_SERIES_ID_SET_CACHE_SIZE=100
INFLUXD_QUERY_MEMORY_BYTES=304857600
INFLUXD_QUERY_INITIAL_MEMORY_BYTES=10485760
INFLUXD_QUERY_CONCURRENCY=5
INFLUXD_STORAGE_CACHE_MAX_MEMORY_SIZE=1073741824
INFLUXD_STORAGE_CACHE_SNAPSHOT_MEMORY_SIZE=262144000
INFLUXD_QUERY_QUEUE_SIZE=100
INFLUXD_FLUX_LOG_ENABLED=false
```
I still use the reindexing hack instead of compaction, removing the series and index directories so they are rebuilt on startup:
find /var/lib/influxdb2/engine/data/ -type d -name _series -exec rm -r {} + &&\
find /var/lib/influxdb2/engine/data/ -type d -name index -exec rm -r {} +
`docker run --shm-size 2g -m 25GiB --user=influxdb --restart=on-failure --restart unless-stopped --entrypoint '/bin/bash' -d -p 8086:8086 --log-driver=syslog --name influx --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine --verbose -c "find /var/lib/influxdb2/engine/data/ -type d -name _series -exec rm -r {} + && find /var/lib/influxdb2/engine/data/ -type d -name index -exec rm -r {} + && /entrypoint.sh influxd"`
It works well.
I saw the memory allocation error before reaching any of the memory limits. My limits are:
- 30 GiB on the host machine
- 25 GiB on the Docker container
- 20 GiB in [GOMEMLIMIT](https://pkg.go.dev/runtime)

The current memory allocation is 5-6 GiB. I never saw an allocation near 20 GiB; the most I saw was around 10 GiB, yet the container still went into a restart state.
I have too many influxdb threads (71):
I have calculated how many files they use:
lsof > /tmp/lsof.info2.txt
grep influx /tmp/lsof.info2.txt | awk '{ print $11 }' > /tmp/files.txt
sort /tmp/files.txt > /tmp/files.sorted.txt
uniq -c /tmp/files.sorted.txt > /tmp/counts.txt
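For reference, the same counting can be done in one pass, ranking the files held open by the most threads. This assumes the same dump file path and the same lsof column layout as above; `$11` is the NAME column here, but the field position varies between lsof versions, so adjust if needed.

```shell
# Rank files by how many influx threads hold them open, in one pass.
# $11 is the NAME column in this lsof output; the index may differ
# on other lsof versions.
grep influx /tmp/lsof.info2.txt | awk '{ print $11 }' | sort | uniq -c | sort -rn | head
```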
The 71 threads use about 73,000 distinct files; summed across all threads, that is about 5,183,000 open file descriptors in total.
The influxdb process and all its (green) threads could be hitting some limit, but it might not be the memory limit; it could be the virtual memory limit or the file descriptor limit.
What do you think? Is there an environment variable controlling the number of threads? The default seems to be about 70. Is it possible to reduce it?
I only skimmed through your response - you can set GOMAXPROCS to limit the number of OS threads, but I believe the processes showing in htop are Go's green threads (goroutines), which you can't limit. See https://pkg.go.dev/runtime. I don't think the number of green threads should be an issue; I'm not aware of a Linux limit on them.
InfluxDB can use a lot of file descriptors - you can raise the limit with ulimit (as you probably know).
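For reference, here is one way to inspect those per-process limits before launching influxd; for the Docker setups in this thread the equivalent knob is `docker run`'s `--ulimit nofile=soft:hard` flag (the 262144 in the comment is an illustrative placeholder, not a recommendation).

```shell
# Inspect the open-file limits the launching shell will pass on.
ulimit -Sn   # soft limit: what the influxd process actually gets
ulimit -Hn   # hard limit: the ceiling the soft limit may be raised to
# For a container, pass e.g.: docker run --ulimit nofile=262144:262144 ...
```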
My current recipe
The custom command
docker run --shm-size 2g --user=influxdb --restart=on-failure --restart unless-stopped --entrypoint '/bin/bash' -d -p 8086:8086 --log-driver=syslog --name influx --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine --verbose -c "find /var/lib/influxdb2/engine/data/ -type d -name index -exec rm -r {} + && /entrypoint.sh influxd"
and the custom config
INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=48h
INFLUXD_STORAGE_SERIES_ID_SET_CACHE_SIZE=0
work well. The Docker container now restarts only every 48 hours. I started it at night, so it restarts at night every 48 hours, which is a convenient time for restarts. Each restart takes 1-2 minutes.
The option INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=1000d didn't work well; it behaved as if the value were 3h.
```
GOMEMLIMIT=25GiB
GOGC=10
INFLUXD_REPORTING_DISABLED=true
INFLUXD_STORAGE_CACHE_SNAPSHOT_WRITE_COLD_DURATION=10m
INFLUXD_STORAGE_COMPACT_FULL_WRITE_COLD_DURATION=48h
INFLUXD_STORAGE_COMPACT_THROUGHPUT_BURST=80388608
INFLUXD_STORAGE_MAX_CONCURRENT_COMPACTIONS=1
INFLUXD_STORAGE_SERIES_FILE_MAX_CONCURRENT_SNAPSHOT_COMPACTIONS=1
INFLUXD_QUERY_MEMORY_BYTES=304857600
INFLUXD_QUERY_INITIAL_MEMORY_BYTES=10485760
INFLUXD_QUERY_CONCURRENCY=5
INFLUXD_STORAGE_CACHE_MAX_MEMORY_SIZE=1073741824
INFLUXD_STORAGE_CACHE_SNAPSHOT_MEMORY_SIZE=26214400
INFLUXD_STORAGE_WAL_MAX_WRITE_DELAY=10m
INFLUXD_STORAGE_WRITE_TIMEOUT=10s
INFLUXD_STORAGE_WAL_MAX_CONCURRENT_WRITES=6
INFLUXD_STORAGE_SERIES_ID_SET_CACHE_SIZE=0
INFLUXD_QUERY_QUEUE_SIZE=100
INFLUXD_FLUX_LOG_ENABLED=false
```
docker run --shm-size 2g --user=influxdb --restart=on-failure --restart unless-stopped -d -p 8086:8086 --log-driver=syslog --name influx --env-file env.list -v /home/influxdb:/var/lib/influxdb2 influxdb:2.7.6-alpine
The simple command (without removing the index directories) fails with a memory allocation error. The error occurs when only ~25% of memory is allocated (6 GiB out of 25-30 GiB). The server and the Docker container still have available RAM, but the "Open TSI" operation gets the allocation error.
There is a solution: https://github.com/influxdata/influxdb/issues/23246
sysctl -w vm.max_map_count=262144
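Since `sysctl -w` only lasts until the next reboot, the setting can also be persisted with a sysctl.d drop-in. This is a sketch; the file name `99-influxdb.conf` is an arbitrary choice, and both commands require root.

```shell
# Persist the higher mmap-count limit across reboots (run as root).
echo 'vm.max_map_count=262144' > /etc/sysctl.d/99-influxdb.conf
sysctl --system   # reload all sysctl configuration files
```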
I'm facing an issue with influx 2.0.7 running in a Docker container with memory constraints. I'm aware of the memory tradeoffs for databases, and I'm OK with it being slower due to limited resources. I tried to limit all the memory/buffer parameters that I could to keep it under control, but influx keeps dying from OOM while importing big historical datasets (1.5 million data points) into 5 measurements with about 5 tags.
Below is my deployment code (Ansible).
Steps to reproduce: List the minimal actions needed to reproduce the behavior.
Expected behavior: Guidance on how to run influxdb with limited memory constraints without suffering such problems.
Actual behavior: Influx dies and gets stuck in a restart loop due to OOM.
Environment info:
Linux 4.19.0-16-amd64 x86_64
InfluxDB 2.0.7 (git: 2a45f0c037) build_date: 2021-06-04T19:17:40Z
Config: Configuration is passed as command-line arguments in the snippet above.
Logs: Logs are quite big, with many very similar entries like the one below.
Performance: Because influxdb gets stuck in a restart loop and never starts the HTTP interface, it is hard to run the command to get profiler information. I'm awaiting further instructions on how to do it.