vitalif / vitastor

Simplified distributed block and file storage with strong consistency, like in Ceph (repository mirror)
https://vitastor.io

object(s) degraded #51

Closed rayjanoka closed 1 year ago

rayjanoka commented 1 year ago

Hey, I have a clean v0.8.6 setup with 5 nodes, but I'm having some issues.

I'm running this EC pool:

{
  "2": {
    "name": "ecpool",
    "scheme": "ec",
    "pg_size": 4,
    "parity_chunks": 2,
    "pg_minsize": 2,
    "pg_count": 256,
    "failure_domain": "host"
  }
}

I have some images set up:

# vitastor-cli ls
NAME                                      POOL    SIZE  FLAGS  PARENT
testimg                                   ecpool  10 G      -
testimg2                                  ecpool  10 G      -
pvc-ab034644-084d-44db-8c7f-91ad0c20da9a  ecpool  10 G      -
pvc-7e2921cc-9c4b-4120-89d2-12355cc4ce22  ecpool  10 G      -

I was running fio in a container against one of the PVC images when things stopped working. None of the servers crashed (all 5 are still up), but now the cluster won't recover.
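
For context, the load was roughly like the sketch below (illustrative only; the PVC is attached over vitastor-nbd and mounted inside the pod, and these fio parameters are examples, not the exact job I ran):

# hypothetical fio job against a file on the mounted PVC filesystem
fio --name=test --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
    --iodepth=32 --size=5G --runtime=300 --time_based --filename=/data/fio.test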

Different OSDs show this error:

[OSD 2] 20 object(s) degraded
[OSD 2] 20 object(s) degraded

[OSD 3] 3 object(s) degraded
[OSD 3] 3 object(s) degraded
root@bmhv4:/home/ubuntu# vitastor-cli status
  cluster:
    etcd: 5 / 6 up, 1.2 M database size
    mon:  5 up, master bmhv4
    osd:  5 / 5 up

  data:
    raw:   31.3 G used, 8.4 T / 8.4 T available
    state: 7.8 G clean, 2.9 M degraded
    pools: 0 / 1 active
    pgs:   196 active
           21 repeering+degraded+has_degraded+has_unclean
           37 repeering+degraded+has_unclean
           2 stopping+degraded+has_degraded+has_unclean

  io:
    client: 0 B/s rd, 0 op/s rd, 0 B/s wr, 0 op/s wr

OSD 2 crash?

[PG 2/202] is offline (248 objects)
[PG 2/202] is peering (0 objects)
[PG 2/202] is peering (0 objects)
[PG 2/204] is offline (249 objects)
[PG 2/204] is peering (0 objects)
[PG 2/204] is peering (0 objects)
[PG 2/186] Got object list from OSD 2 (local): 249 object versions (248 of them stable)
[PG 2/187] Got object list from OSD 2 (local): 250 object versions (249 of them stable)
[PG 2/189] Got object list from OSD 2 (local): 250 object versions (249 of them stable)
[PG 2/191] Got object list from OSD 2 (local): 250 object versions (249 of them stable)
[PG 2/192] Got object list from OSD 2 (local): 249 object versions (249 of them stable)
[PG 2/194] Got object list from OSD 2 (local): 248 object versions (248 of them stable)
[PG 2/202] Got object list from OSD 2 (local): 248 object versions (248 of them stable)
[PG 2/204] Got object list from OSD 2 (local): 249 object versions (249 of them stable)
[PG 2/186] Got object list from OSD 3: 249 object versions (248 of them stable)
[PG 2/186] Got object list from OSD 4: 249 object versions (248 of them stable)
[PG 2/186] Got object list from OSD 1: 249 object versions (248 of them stable)
[PG 2/186] 248 clean objects on target OSD set 1, 2, 3, 4
[PG 2/186] is active + has_unclean (248 objects)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error while doing local stabilize operation: Device or resource busy
No RDMA devices found
[OSD 2] Couldn't initialize RDMA, proceeding with TCP only
Reading blockstore metadata
[OSD 2] reporting to etcd at ["10.10.0.161:2379", "10.10.0.162:2379", "10.10.0.163:2379", "10.10.0.164:2379", "10.10.0.165:2379", "10.10.0.166:2379"] every 5 seconds
Key /vitastor/osd/state/2 already exists in etcd, OSD 2 is still up
  listening at: 10.10.0.161:37757
[OSD 2] Force stopping
No RDMA devices found
[OSD 2] Couldn't initialize RDMA, proceeding with TCP only
Reading blockstore metadata
[OSD 2] reporting to etcd at ["10.10.0.161:2379", "10.10.0.162:2379", "10.10.0.163:2379", "10.10.0.164:2379", "10.10.0.165:2379", "10.10.0.166:2379"] every 5 seconds
Successfully subscribed to etcd at 10.10.0.161:2379/v3
Metadata entries loaded: 31779, free blocks: 15592256 / 15624035
Reading blockstore journal
Journal entries loaded: 1610, free journal space: 30814208 bytes (00e73000..0110f000 is used), free blocks: 15592255 / 15624035
[PG 2/129] is starting (0 objects)
[PG 2/131] is starting (0 objects)
[PG 2/132] is starting (0 objects)

OSD 3 crash?

[PG 2/9] 1 objects on OSD set 3(1), 5(2), 4(3)
[PG 2/9] is active + has_degraded (252 objects)
[PG 2/9] is active (252 objects)
[PG 2/9] is offline (252 objects)
[PG 2/9] is peering (0 objects)
[PG 2/9] is peering (0 objects)
[PG 2/9] Got object list from OSD 3 (local): 253 object versions (252 of them stable)
[PG 2/9] Got object list from OSD 4: 253 object versions (252 of them stable)
[PG 2/9] Got object list from OSD 5: 253 object versions (252 of them stable)
[OSD 3] client 10 disconnected
[OSD 3] Stopping client 10 (regular client)
[PG 2/9] Got object list from OSD 1: 253 object versions (252 of them stable)
[PG 2/9] 252 clean objects on target OSD set 1, 3, 5, 4
[PG 2/9] is active + has_unclean (252 objects)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error while doing local stabilize operation: Device or resource busy
No RDMA devices found
[OSD 3] Couldn't initialize RDMA, proceeding with TCP only
Reading blockstore metadata
[OSD 3] reporting to etcd at ["10.10.0.161:2379", "10.10.0.162:2379", "10.10.0.163:2379", "10.10.0.164:2379", "10.10.0.165:2379", "10.10.0.166:2379"] every 5 seconds
Key /vitastor/osd/state/3 already exists in etcd, OSD 3 is still up
  listening at: 10.10.0.165:38025
[OSD 3] Force stopping
No RDMA devices found
[OSD 3] Couldn't initialize RDMA, proceeding with TCP only
Reading blockstore metadata
[OSD 3] reporting to etcd at ["10.10.0.161:2379", "10.10.0.162:2379", "10.10.0.163:2379", "10.10.0.164:2379", "10.10.0.165:2379", "10.10.0.166:2379"] every 5 seconds
Successfully subscribed to etcd at 10.10.0.165:2379/v3
Metadata entries loaded: 64134, free blocks: 15559901 / 15624035
Reading blockstore journal
Journal entries loaded: 1976, free journal space: 29929472 bytes (002fa000..0066e000 is used), free blocks: 15559901 / 15624035
[PG 2/2] is starting (0 objects)
[PG 2/3] is starting (0 objects)
[PG 2/6] is starting (0 objects)
[PG 2/8] is starting (0 objects)
[PG 2/9] is starting (0 objects)
[PG 2/10] is starting (0 objects)
[PG 2/14] is starting (0 objects)
[PG 2/19] is starting (0 objects)
[PG 2/21] is starting (0 objects)
[PG 2/22] is starting (0 objects)
[PG 2/26] is starting (0 objects)
[PG 2/31] is starting (0 objects)
[PG 2/32] is starting (0 objects)

I'm seeing a bit of this around:

Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -32 (Broken pipe)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
Error while doing flush on OSD 2: -16 (Device or resource busy)
vitalif commented 1 year ago

Hi. Oops...))

It seems that I introduced this bug in e950c024d3a2d52839f79d9f54ae3d38cf223fcb while trying to fix another problem %).

what(): Error while doing local stabilize operation: Device or resource busy means that an OSD tried to "stabilize" (i.e. perform the second phase of the two-phase commit on) an object which wasn't synced yet...

I.e. normally an EC write looks like: write -> sync -> stabilize. But in your case something probably crashed or timed out, and an OSD which itself probably didn't crash (!) was left with an unsynced write in its RAM. Then another OSD was restarted and requested a listing of current objects from the first one. It got this unstable write in the listing, tried to "stabilize" it, and got EBUSY, which in this case means "write is not synced yet".

Soooooo. The bug won't reproduce if you use SSDs with capacitors and immediate_commit=all (or if you use replication instead of EC).

And for non-immediate-commit modes the proper fix seems to be to forcibly sync this object when "stabilizing" instead of returning an error. I'll look into it.
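
For reference, enabling it would look roughly like the sketch below (parameter names are from the config reference; the fsync-disabling options are what's normally paired with immediate_commit=all on capacitor-protected drives, so double-check them for your hardware and version):

# /etc/vitastor/vitastor.conf (illustrative sketch, merged into your existing settings)
{
  "immediate_commit": "all",
  "disable_data_fsync": true,
  "disable_journal_fsync": true
}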

rayjanoka commented 1 year ago

Thanks for the detailed explanation @vitalif.

I will give it a try with EC + immediate_commit=all; we aren't worried about corruption for this particular data.

rayjanoka commented 1 year ago

I got better performance for a bit (9700 IOPS), but something eventually failed.

This machine locked up and restarted itself after this error:

bmhv9 kernel: watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [vitastor-nbd:2014263]

It seems like it can't recover:

ubuntu@bmhv9:~$ vitastor-cli status
  cluster:
    etcd: 5 / 6 up, 1 M database size
    mon:  5 up, master bmhv6
    osd:  5 / 5 up

  data:
    raw:   8.5 G used, 8.4 T / 8.4 T available
    state: 2.1 G clean, 1 M degraded
    pools: 0 / 1 active
    pgs:   248 active
           3 repeering+degraded+has_degraded+has_unclean
           5 stopping+degraded+has_degraded+has_unclean
vitalif commented 1 year ago

bmhv9 kernel: watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [vitastor-nbd:2014263]

Which kernel do you use? Seems like a kernel bug, maybe related to io_uring.

vitalif commented 1 year ago

(I don't know what it is exactly but I've seen io_uring misbehaviour on various kernels several times %))

rayjanoka commented 1 year ago

OK, this is on Ubuntu 20.04 with kernel 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64. I can move my lab to Ubuntu 22.04; we are planning on doing that anyway.

rayjanoka commented 1 year ago

I had to build from source using the latest liburing. Should I use a specific version of liburing, or just update to a newer kernel?

git clone https://github.com/axboe/liburing.git
cd liburing
./configure && make && make install
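
For example, something like this if pinning to a tagged release turns out to be preferable (liburing-2.3 is just an example tag, not a recommendation):

# hypothetical: build from a specific liburing tag instead of master
git clone -b liburing-2.3 https://github.com/axboe/liburing.git
cd liburing
./configure && make && make install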
vitalif commented 1 year ago

First of all, check the logs on that machine; maybe we can discover the cause... I mostly test everything on Debian and usually don't get kernel errors. The last problems I had were related to RDMA, on ~5.4... Some guys from the chat had issues with Hyper-V and Ubuntu 5.15+ kernels several months ago, though. So I'm not sure... can you look for errors in other logs?

rayjanoka commented 1 year ago

hmm

Ran out of journal space (used_start=01b89000, next_free=01b67000, dirty_start=01b20000)
root@bmhv7:/home/ubuntu# tail -f /var/log/vitastor/osd2.log
[OSD 2] avg latency for subop 4 (sync): 833 us
[OSD 2] avg latency for subop 15 (ping): 269 us
Ran out of journal space (used_start=01d9c000, next_free=01d79000, dirty_start=01d46000)
[OSD 2] avg latency for op 1 (read): 17 us, B/W: 0.00 KB/s
[OSD 2] avg latency for op 3 (write_stable): 2409 us, B/W: 14.07 MB/s
[OSD 2] avg latency for op 4 (sync): 11818 us
[OSD 2] avg latency for op 12 (primary_write): 10995 us, B/W: 12.06 MB/s
[OSD 2] avg latency for subop 3 (write_stable): 758 us
[OSD 2] avg latency for subop 4 (sync): 1161 us
Ran out of journal space (used_start=00a9b000, next_free=00a76000, dirty_start=00a63000)
Ran out of journal space (used_start=00745000, next_free=00723000, dirty_start=006d4000)
[OSD 2] avg latency for op 1 (read): 17 us, B/W: 0.00 KB/s
[OSD 2] avg latency for op 3 (write_stable): 2601 us, B/W: 13.70 MB/s
[OSD 2] avg latency for op 4 (sync): 13053 us
[OSD 2] avg latency for op 12 (primary_write): 11119 us, B/W: 11.97 MB/s
[OSD 2] avg latency for subop 3 (write_stable): 734 us
[OSD 2] avg latency for subop 4 (sync): 935 us
[OSD 2] avg latency for subop 15 (ping): 349 us
Ran out of journal space (used_start=01b89000, next_free=01b67000, dirty_start=01b20000)
Ran out of journal space (used_start=00ea1000, next_free=00e7f000, dirty_start=00dee000)
[OSD 2] avg latency for op 1 (read): 20 us, B/W: 0.00 KB/s
[OSD 2] avg latency for op 3 (write_stable): 926 us, B/W: 29.62 MB/s
[OSD 2] avg latency for op 4 (sync): 3241 us
[OSD 2] avg latency for op 12 (primary_write): 5347 us, B/W: 25.24 MB/s
[OSD 2] avg latency for subop 3 (write_stable): 886 us
[OSD 2] avg latency for subop 4 (sync): 1211 us
[OSD 2] avg latency for subop 15 (ping): 326 us

Message from syslogd@bmhv7 at Mar 21 03:06:19 ...
 kernel:[19704.166683] NMI watchdog: Watchdog detected hard LOCKUP on cpu 25

Message from syslogd@bmhv7 at Mar 21 03:06:19 ...
 kernel:[19708.518669] watchdog: BUG: soft lockup - CPU#16 stuck for 26s! [kworker/16:6:158772]

Message from syslogd@bmhv7 at Mar 21 03:06:19 ...
 kernel:[19708.538669] watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [runc:803276]

Message from syslogd@bmhv7 at Mar 21 03:06:23 ...
 kernel:[19712.478649] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:22]

Message from syslogd@bmhv7 at Mar 21 03:06:23 ...
 kernel:[19712.506649] watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [virt-handler:5247]

Message from syslogd@bmhv7 at Mar 21 03:06:23 ...
 kernel:[19712.510649] watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [vitastor-nbd:161338]

Message from syslogd@bmhv7 at Mar 21 03:06:23 ...
 kernel:[19712.518649] watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [mongo:803283]

Message from syslogd@bmhv7 at Mar 21 03:06:23 ...
 kernel:[19712.530648] watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [kworker/20:6:161420]
vitalif commented 1 year ago

"Ran out of journal space" is fine. Nothing more in OSD logs? dmesg? And something hangs?

rayjanoka commented 1 year ago

I collected the logs from all of the nodes. It seems like everything lost connection to OSD 2 first. Let me know what you think.

OSD: https://gist.github.com/rayjanoka/141e6617190e3c537a2600c6547b864d

journal: https://gist.github.com/rayjanoka/82a094110e1de07d75b5a961f1ee0d02

rayjanoka commented 1 year ago

I upgraded to Ubuntu 22.04, built from source the same way I had been, and it looks like the issue is resolved!

vitalif commented 1 year ago

Yeah, OSD 2 hung first. I'm not sure about "has_unclean": "unclean" objects should have been resolved even with 1 failed OSD, and the cluster should've continued operating in degraded mode... But I'm glad to hear that the lockup itself is resolved after updating; it seems 5.15.0-67-generic was just buggy :-)

Mar 21 15:47:00 bmhv9 kernel: NMI watchdog: Watchdog detected hard LOCKUP on cpu 26
Mar 21 15:47:00 bmhv9 kernel: Call Trace:
Mar 21 15:47:00 bmhv9 kernel:  <IRQ>
Mar 21 15:47:00 bmhv9 kernel:  _raw_spin_lock+0x22/0x30
Mar 21 15:47:00 bmhv9 kernel:  take_dentry_name_snapshot+0x20/0x80
Mar 21 15:47:00 bmhv9 kernel:  __fsnotify_parent+0x1d6/0x310
Mar 21 15:47:00 bmhv9 kernel:  ? native_smp_send_reschedule+0x2a/0x50
Mar 21 15:47:00 bmhv9 kernel:  ? mempool_free_slab+0x17/0x20
Mar 21 15:47:00 bmhv9 kernel:  __io_complete_rw_common+0x132/0x180
Mar 21 15:47:00 bmhv9 kernel:  ? __io_complete_rw_common+0x132/0x180
Mar 21 15:47:00 bmhv9 kernel:  io_complete_rw+0x17/0x50
Mar 21 15:47:00 bmhv9 kernel:  blkdev_bio_end_io+0x8c/0x110
Mar 21 15:47:00 bmhv9 kernel:  bio_endio+0x106/0x170
Mar 21 15:47:00 bmhv9 kernel:  blk_update_request+0x249/0x4d0
Mar 21 15:47:00 bmhv9 kernel:  blk_mq_end_request+0x21/0x140
Mar 21 15:47:00 bmhv9 kernel:  nvme_complete_rq+0x80/0x260 [nvme_core]
Mar 21 15:47:00 bmhv9 kernel:  nvme_pci_complete_rq+0x82/0xa0 [nvme]
Mar 21 15:47:00 bmhv9 kernel:  nvme_irq+0x15f/0x250 [nvme]
Mar 21 15:47:00 bmhv9 kernel:  __handle_irq_event_percpu+0x42/0x170
Mar 21 15:47:00 bmhv9 kernel:  handle_irq_event_percpu+0x33/0x80
Mar 21 15:47:00 bmhv9 kernel:  handle_irq_event+0x3b/0x60
Mar 21 15:47:00 bmhv9 kernel:  handle_edge_irq+0x93/0x1c0
Mar 21 15:47:00 bmhv9 kernel:  __common_interrupt+0x52/0xe0
Mar 21 15:47:00 bmhv9 kernel:  common_interrupt+0x89/0xa0
Mar 21 15:47:00 bmhv9 kernel:  </IRQ>
rayjanoka commented 1 year ago

Still having a similar issue... here is a new crash on a 6.0.1 kernel:

Mar 25 04:45:53 bmhv9 kernel: nbd0: detected capacity change from 0 to 20971520
Mar 25 04:45:53 bmhv9 kernel: EXT4-fs (nbd0): mounted filesystem with ordered data mode. Quota mode: none.
Mar 25 04:45:54 bmhv9 systemd-udevd[6086]: Using default interface naming scheme 'v249'.
Mar 25 04:45:54 bmhv9 networkd-dispatcher[1209]: WARNING:Unknown index 15 seen, reloading interface list
Mar 25 04:45:54 bmhv9 systemd-udevd[6131]: Using default interface naming scheme 'v249'.
Mar 25 04:45:54 bmhv9 systemd-networkd[1163]: lxc547e8e7a8b0e: Link UP
Mar 25 04:45:54 bmhv9 kernel: eth0: renamed from tmp49987
Mar 25 04:45:54 bmhv9 systemd-networkd[1163]: lxc547e8e7a8b0e: Gained carrier
Mar 25 04:45:54 bmhv9 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 25 04:45:54 bmhv9 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): lxc547e8e7a8b0e: link becomes ready
Mar 25 04:45:54 bmhv9 networkd-dispatcher[1209]: ERROR:Unknown interface index 15 seen even after reload
Mar 25 04:45:54 bmhv9 networkd-dispatcher[1209]: WARNING:Unknown index 15 seen, reloading interface list
Mar 25 04:45:54 bmhv9 networkd-dispatcher[1209]: ERROR:Unknown interface index 15 seen even after reload
Mar 25 04:45:54 bmhv9 networkd-dispatcher[1209]: WARNING:Unknown index 15 seen, reloading interface list
Mar 25 04:45:54 bmhv9 networkd-dispatcher[1209]: ERROR:Unknown interface index 15 seen even after reload
Mar 25 04:45:54 bmhv9 systemd[1]: Started libcontainer container 499874f395d0fcd0d450a9d8e00e4edd80e2c73e0fee338f7c123381ce45e388.
Mar 25 04:45:55 bmhv9 systemd-networkd[1163]: lxc547e8e7a8b0e: Gained IPv6LL
Mar 25 04:45:56 bmhv9 systemd[1]: Started libcontainer container d64c7126e869c0eef9435a46e28ec7538c6dead8250634ea0572f09e4f1cc9fd.
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 88 primary OSD from 3 to 4
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 90 primary OSD from 3 to 4
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 95 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 96 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 97 primary OSD from 3 to 4
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 98 primary OSD from 3 to 4
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 100 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 101 primary OSD from 3 to 2
...
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 234 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 238 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 239 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 248 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 251 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 252 primary OSD from 3 to 4
Mar 25 04:46:11 bmhv9 node[5759]: Moving pool 1 (testpool) PG 256 primary OSD from 3 to 2
Mar 25 04:46:11 bmhv9 node[5759]: PG configuration successfully changed
Mar 25 04:46:12 bmhv9 etcd[1343]: time="2023-03-25T04:46:12Z" level=warning msg="error reading websocket message: websocket: close 1006 (abnormal closure): unexpected EOF"
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (1/1 alive)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Receive control failed (result -32)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: shutting down sockets
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: I/O error, dev nbd0, sector 196872 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O error 10 writing to inode 12 starting block 24609)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: I/O error, dev nbd0, sector 196944 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
Mar 25 04:46:44 bmhv9 kernel: buffer_io_error: 27781 callbacks suppressed
Mar 25 04:46:44 bmhv9 kernel: Buffer I/O error on device nbd0, logical block 24609
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O error 10 writing to inode 12 starting block 24618)
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: I/O error, dev nbd0, sector 197144 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
Mar 25 04:46:44 bmhv9 kernel: Buffer I/O error on device nbd0, logical block 24618
Mar 25 04:46:44 bmhv9 kernel: block nbd0: Connection timed out, retrying (0/1 alive)
Mar 25 04:46:44 bmhv9 kernel: EXT4-fs warning (device nbd0): ext4_end_bio:343: I/O error 10 writing to inode 12 starting block 24643)
Mar 25 04:46:44 bmhv9 kernel: I/O error, dev nbd0, sector 41240 op 0x1:(WRITE) flags 0x0 phys_seg 512 prio class 2
Mar 25 04:46:44 bmhv9 kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b8
Mar 25 04:46:44 bmhv9 kernel: #PF: supervisor read access in kernel mode
Mar 25 04:46:45 bmhv9 kernel: #PF: error_code(0x0000) - not-present page
<reboot>
vitalif commented 1 year ago

Hm. 1) Can you show me the other logs? 2) Can you describe the whole test scenario so I can just reproduce it myself in VMs?

vitalif commented 1 year ago

OK, I fixed this bug, and I also fixed six other EC-related bugs %))). Now my OSD-killing test (tests/test_heal.sh) passes correctly with EC; it was often hanging previously, and it even masked a bug in the write path. I'll build a new release soon; for now you can try to build it yourself from the master branch, as sketched below.
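
A rough build sketch (see the repository README for the full dependency list; this assumes the CMake-based build in the repo):

# rough sketch; check the README for exact dependencies and steps
git clone --recurse-submodules https://github.com/vitalif/vitastor.git
cd vitastor
mkdir build && cd build
cmake .. && make -j$(nproc)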

vitalif commented 1 year ago

Fixed in https://github.com/vitalif/vitastor/releases/tag/v0.8.7