influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Every 2-3 days "Aborted compaction" #16136

Open somera opened 4 years ago

somera commented 4 years ago

Steps to reproduce: I don't have any idea how to reproduce it.

Actual behavior: My InfluxDB is running on a Raspberry Pi, with 2-10 other Raspberry Pis writing data into it. After 2-4 days I run into this problem:

Aborted compaction

I switched from TSM to TSI, but the problem still exists. And now I have lost new data (a whole week's worth).

Why do I have so many problems? How can I solve this? I can't run a SQL database on the Raspberry Pi.

Environment info:

Logs:

```
Dec 2 00:21:03 pi-server influxd[1321]: ts=2019-12-01T23:21:03.628706Z lvl=info msg="TSM compaction (end)" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7Pdl000 op_name=tsm1_compact_group op_event=end op_elapsed=1001.551ms
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627120Z lvl=info msg="TSM compaction (start)" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group op_event=start
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627250Z lvl=info msg="Beginning compaction" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_files_n=10
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627310Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000032-000000003.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627374Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=1 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000064-000000003.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627436Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=2 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000072-000000002.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627497Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=3 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000081-000000002.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627559Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=4 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000089-000000002.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627620Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=5 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000090-000000001.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627681Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=6 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000091-000000001.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627742Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=7 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000092-000000001.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627803Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=8 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000093-000000001.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.627864Z lvl=info msg="Compacting file" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group tsm1_index=9 tsm1_file=/var/lib/influxdb/data/telegraf/autogen/93/000000094-000000001.tsm
Dec 2 00:21:04 pi-server influxd[1321]: ts=2019-12-01T23:21:04.628324Z lvl=info msg="Aborted compaction" log_id=0JSvpwlG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0JSx7XSl000 op_name=tsm1_compact_group error="compaction in progress: open /var/lib/influxdb/data/telegraf/autogen/93/000000094-000000002.tsm.tmp: file exists"
```
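The final `file exists` error means an earlier compaction attempt died (here, apparently from memory pressure on the Pi) and left its temporary output file behind, so every later full compaction of that shard aborts on the same leftover. A common workaround reported in threads like this one — a sketch, not an official fix; stop `influxd` first, and `DATA_DIR` is an assumption that should match your config — is to delete the stale `.tsm.tmp` files:

```shell
# Assumption: influxd is stopped first (e.g. `sudo systemctl stop influxdb`),
# otherwise a live compaction may legitimately own one of these .tmp files.
DATA_DIR=${DATA_DIR:-/var/lib/influxdb/data}

# Print each leftover compaction temp file, then delete it.
# Real .tsm files are untouched; only *.tsm.tmp leftovers match.
find "$DATA_DIR" -name '*.tsm.tmp' -type f -print -delete 2>/dev/null

# Then restart, e.g. `sudo systemctl start influxdb`.
```

On restart the engine redoes the compaction from the surviving `.tsm` files, so nothing already persisted is lost — though on a memory-starved 32-bit Pi the compaction may simply fail again.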

somera commented 4 years ago

This is the current status:

```
root@pi-server:/var/log# ls -al /var/lib/influxdb/data/telegraf/autogen/102/
total 280460
drwxr-xr-x  3 influxdb influxdb     4096 Dec  5 11:41 .
drwx------ 19 influxdb influxdb     4096 Dec  2 01:00 ..
-rw-r--r--  1 influxdb influxdb 27148801 Nov 27 01:03 000000032-000000003.tsm
-rw-r--r--  1 influxdb influxdb 24628384 Dec  1 23:40 000000064-000000003.tsm
-rw-r--r--  1 influxdb influxdb 27987875 Dec  1 23:59 000000096-000000003.tsm
-rw-r--r--  1 influxdb influxdb  7086476 Dec  1 22:51 000000104-000000002.tsm
-rw-r--r--  1 influxdb influxdb  9520037 Dec  3 20:11 000000116-000000002.tsm
-rw-r--r--  1 influxdb influxdb   972046 Dec  3 20:21 000000118-000000001.tsm
-rw-r--r--  1 influxdb influxdb 94412901 Dec  5 07:31 000000118-000000002.tsm.tmp
-rw-r--r--  1 influxdb influxdb   972046 Dec  5 07:40 000000120-000000001.tsm
-rw-r--r--  1 influxdb influxdb 94412901 Dec  5 11:41 000000120-000000002.tsm.tmp
-rw-r--r--  1 influxdb influxdb     7972 Dec  1 23:58 fields.idx
drwxr-xr-x 10 influxdb influxdb 4096 Dec  2 01:05 index
```
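In that listing the two `.tsm.tmp` files are roughly 180 MB of the shard — orphaned compaction output, not live data. A quick way to see how much space such leftovers occupy across the whole data directory (a sketch; `DATA_DIR` is an assumption matching the paths above):

```shell
DATA_DIR=${DATA_DIR:-/var/lib/influxdb/data}

# Total the size of all leftover compaction temp files; the last
# line of `du -ch` is the grand total. Prints nothing if none exist.
find "$DATA_DIR" -name '*.tsm.tmp' -type f -exec du -ch {} + 2>/dev/null | tail -n 1
```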

aemondis commented 4 years ago

@somera it will never work properly on an RPi 3, because compaction will keep failing with out-of-memory errors. Many of us here have hit the same issue, and the only solutions are: get an RPi with more memory (I use an RPi 4 with 4GB RAM) and enable the experimental 64-bit Raspbian kernel, move to a bigger device such as an Intel NUC, or switch entirely to a different database platform.

The InfluxDB team has previously indicated there is no intention of supporting the RPi platform or adding support for 32-bit systems, which the RPi runs by default.

The good news is that with the 64-bit kernel on my RPi 4, InfluxDB has been running for about 4-5 months now with a 10GB database — whereas on the RPi 3 with 1GB of RAM and a 32-bit kernel it would fail multiple times per day.
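The 32-bit vs 64-bit distinction above is easy to verify on the Pi itself (a generic check, nothing InfluxDB-specific): `uname -m` reports the kernel's machine architecture, and `getconf LONG_BIT` the word size of the running userland.

```shell
# armv7l  => 32-bit kernel (stock Raspbian on RPi 3)
# aarch64 => 64-bit kernel (the experimental 64-bit Raspbian)
uname -m

# Prints 32 or 64 for the current userland word size
getconf LONG_BIT
```

Note that both can disagree: a 64-bit kernel can still run a 32-bit userland, which is why checking only `uname -m` can be misleading.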