henningWoehr closed this issue 3 months ago
I just tested the whole upgrade process with all the data in a test VM and I get the same errors, but I also noticed some log entries about memory allocation. When restarting and watching memory in htop, I can see usage rise to about 8 GB and then drop, because influxdb has crashed:
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.901514Z lvl=error msg="Cannot read corrupt tsm file, renaming" log_id=0nebSz9l000 service=storage-engine engine=tsm1 service=filestore path=/datadrive/influxdb2/engine/data/993a17731a4724e8/autogen/1887/000000008-000000002.tsm id=0 error="cannot allocate memory"
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.901570Z lvl=error msg="Failed to open shard" log_id=0nebSz9l000 service=storage-engine service=store op_name=tsdb_open db_shard_id=1887 error="[shard 1887] cannot read corrupt file /datadrive/influxdb2/engine/data/993a17731a4724e8/autogen/1887/000000008-000000002.tsm: cannot allocate memory"
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.904493Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=134.631ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.904528Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=5 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=138.782ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.904552Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=117.670ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.905670Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=3 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=132.550ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.906022Z lvl=info msg="Log file compacted" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=7 op_name=tsi1_compact_log_file tsi1_log_file_id=1 elapsed=119ms bytes=4320 kb_per_sec=35
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907656Z lvl=error msg="Cannot open compacted index file" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=1 op_name=tsi1_compact_log_file tsi1_log_file_id=1 error="cannot allocate memory" path=/datadrive/influxdb2/engine/data/993a17731a4724e8/autogen/1999/index/0/L1-00000001.tsi
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907668Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=1 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=41.329ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907693Z lvl=info msg="TSI log compaction (start)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=1 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907724Z lvl=error msg="Cannot open compacted index file" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 error="cannot allocate memory" path=/datadrive/influxdb2/engine/data/993a17731a4724e8/autogen/1999/index/5/L1-00000001.tsi
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907732Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=41.310ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907750Z lvl=info msg="TSI log compaction (start)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=6 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.912702Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=7 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=126.318ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.912729Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=5 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=140.578ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907536Z lvl=error msg="Cannot open compacted index file" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 error="cannot allocate memory" path=/datadrive/influxdb2/engine/data/993a17731a4724e8/autogen/16570/index/3/L1-00000001.tsi
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.912748Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=47.034ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.912767Z lvl=info msg="TSI log compaction (start)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=4 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=start
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.907535Z lvl=error msg="Cannot open compacted index file" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=5 op_name=tsi1_compact_log_file tsi1_log_file_id=1 error="cannot allocate memory" path=/datadrive/influxdb2/engine/data/993a17731a4724e8/autogen/1999/index/4/L1-00000001.tsi
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.914721Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=3 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=144.798ms
Feb 29 19:53:32 biogasboard1eu influxd-systemd-start.sh[4864]: ts=2024-02-29T19:53:32.914726Z lvl=info msg="TSI log compaction (end)" log_id=0nebSz9l000 service=storage-engine index=tsi tsi1_partition=5 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=48.395ms
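Editor's note: the "cannot allocate memory" errors in the log above come from opening TSM and TSI files, which InfluxDB memory-maps. On Linux, mmap can fail with that error when a process reaches its maximum number of mappings (vm.max_map_count), even with free RAM. Whether that is what happened here is an assumption, not something the thread confirms. A minimal sketch for checking it follows; `map_headroom` is a hypothetical helper name, not part of InfluxDB.

```shell
#!/bin/sh
# Sketch: compare a process's memory-mapping count with a limit.
# map_headroom is a hypothetical helper, not an InfluxDB tool.
# Usage: map_headroom <maps-file> <limit>
map_headroom() {
    maps_file=$1
    limit=$2
    # Each line of /proc/<pid>/maps describes one mapping.
    used=$(wc -l < "$maps_file" | tr -d ' ')
    echo "mappings=$used limit=$limit remaining=$((limit - used))"
}

# Example against a running influxd (Linux only; needs permission
# to read the process's /proc entry):
#   pid=$(pidof influxd)
#   map_headroom "/proc/$pid/maps" "$(cat /proc/sys/vm/max_map_count)"
```

If `remaining` is close to zero shortly before the crash, the mapping limit would be a plausible suspect. If it stays large, the cause is likely elsewhere.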
@henningWoehr I'm facing the same issue. I think the databases are getting corrupted during migration. I tried adding more RAM, but that did not work either. Were you able to solve the issue?
@weshouck32 No, sadly not. I want to try the manual upgrade, where you import each database one by one, when I have the time for that, but the upgrade is not the highest priority at the moment.
I was testing the manual method, and the export seems to be a CSV of the database. When I exported a database, the export file was 10x larger than the database itself. I'm not sure that is a good option.
The necessary clue to what is going on has been suppressed by your logging software:
Feb 29 19:48:32 biogasboard1eu influxd-systemd-start.sh[19059]: ts=2024-02-29T19:48:32.707772Z lvl=info msg="TSI log compaction (end)" log_id=0neLCLPW000 service=storage-engine index=tsi tsi1_partition=3 op_name=tsi1_compact_log_file tsi1_log_file_id=1 op_event=end op_elapsed=12298.524ms
Feb 29 19:49:02 biogasboard1eu systemd-journald[487]: Suppressed 1174405 messages from influxdb.service
Feb 29 19:49:02 biogasboard1eu influxd-systemd-start.sh[19059]: /root/project/tsdb/index/tsi1/partition.go:960 +0x119 fp=0xc215c01fc8 sp=0xc215c01f30 pc=0x7fe5fa25b059
Feb 29 19:49:02 biogasboard1eu influxd-systemd-start.sh[19059]: github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).Open.func3()
Feb 29 19:49:02 biogasboard1eu influxd-systemd-start.sh[19059]: /root/project/tsdb/index/tsi1/partition.go:254 +0x26 fp=0xc215c01fe0 sp=0xc215c01fc8 pc=0x7fe5fa2553c6
Feb 29 19:49:02 biogasboard1eu influxd-systemd-start.sh[19059]: runtime.goexit()
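Editor's note: the "Suppressed 1174405 messages" line above is journald's rate limiting, which is why the start of the Go stack trace is missing. journald's limits are controlled by `RateLimitIntervalSec` and `RateLimitBurst` in journald.conf. Whether relaxing them is acceptable on a given host is a local decision. As a rough sketch, the following sums how many messages were dropped in a saved log; `count_suppressed` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: total the messages journald reports as suppressed.
# count_suppressed is a hypothetical helper; it reads journal text on stdin.
count_suppressed() {
    awk '/systemd-journald\[[0-9]+\]: Suppressed [0-9]+ messages from/ {
             # The count is the field right after the word "Suppressed".
             for (i = 1; i <= NF; i++)
                 if ($i == "Suppressed") total += $(i + 1)
         }
         END { printf "%d\n", total }'
}

# Example:
#   journalctl -u influxdb.service | count_suppressed
```

A non-zero total means part of the log, possibly including the actual panic message, never reached the journal.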
Have you considered using data export to compressed files and re-importation? That often works better than standard backups for large data sets.
@davidby-influx That's the same as described in the manual upgrade, and that's what I want to try next.
I cloned the production server again and ran the upgrade. This time it worked and the influxdb service started successfully. I'm still not sure why the other upgrade attempts failed.
Hello again. After I found the issue https://github.com/influxdata/influxdb/issues/10939, I noticed that we had the same problem: we had about 90 databases which contained, I think, about 150 shards or so. The shards were also very small, only a few MB each. After consolidating the databases into far fewer, we now run influx v2 without any problems.
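Editor's note: to see how many shards each database (bucket) holds before an upgrade, a rough sketch like the one below can help. It assumes the on-disk layout visible in the log paths above, `<data-dir>/<bucket-id>/<retention-policy>/<shard-id>/`, and skips the `_series` directory, which is assumed to sit next to the retention policy directories. `shard_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count shard directories per bucket under an InfluxDB data directory.
# shard_summary is a hypothetical helper; the directory layout is an assumption
# based on the paths shown in the logs.
shard_summary() {
    data_dir=$1
    for bucket in "$data_dir"/*/; do
        [ -d "$bucket" ] || continue
        # Shards sit two levels below the bucket directory:
        # <bucket>/<retention-policy>/<shard-id>. The _series directory
        # holds series-file partitions, not shards, so it is excluded.
        count=$(find "$bucket" -mindepth 2 -maxdepth 2 -type d \
                    ! -path '*/_series/*' | wc -l | tr -d ' ')
        echo "$(basename "$bucket") $count"
    done
}

# Example:
#   shard_summary /datadrive/influxdb2/engine/data
```

Each output line is a bucket directory name followed by its shard count, which makes it easy to spot many databases that each hold only a handful of small shards.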
Hi, we are currently upgrading a production influxdb from 1.8.10 to 2.7.5. Before upgrading, I tested the whole process in a local test VM, where everything worked fine. The main difference from the production VM is that I didn't use all the databases, because copying them would have taken too long. I then used the following commands to upgrade influx:
sudo systemctl stop influxdb
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install influxdb2
sudo nano /etc/default/influxdb2
(Changed config env to '/var/lib/influxdb/.influxdbv2/config.toml')
sudo mkdir /datadrive/influxdb2
sudo chown influxdb:influxdb /datadrive/influxdb2
sudo nano /etc/systemd/system/influxd.service
(Change 'LimitNOFILE' to higher number)
sudo -u influxdb influxd upgrade -e /datadrive/influxdb2/engine -m /datadrive/influxdb2/influxd.bolt
sudo systemctl start influxdb
After upgrading, influxdb started up and all data was working as expected. After some time, influxdb ran into a strange error with no real error message.
This is just a sample, the full log is here as a gist, with logs about unauthorized access and queries removed
Steps to reproduce: Can't think of a way to reproduce. Currently trying to test the upgrade with full data in my test VM.
Expected behaviour: Run as normal
Actual behaviour: Error described above
Environment info:
Config: The only change in the config is that the data is stored in a path other than the default.