Open sv641 opened 2 years ago
While further debugging snapshot recovery, with lots of help from @paul, we found that when the socket is in
if (nt_sock->how == KM_FILE_HOW_ACCEPT) {
   return km_fs_recover_socket_accepted(nt_sock);
}
static int km_fs_recover_socket_accepted(km_nt_socket_t* nt_sock)
{
   return 0;
}
The in-memory filesys entry is not being made for the socket, and this causes eventfd creation to bail out.
fd state of the process being snapshotted:
lrwx------ 1 root root 64 Aug 18 18:35 0 -> /dev/null
l-wx------ 1 root root 64 Aug 18 18:35 1 -> pipe:[110166]
l-wx------ 1 root root 64 Aug 18 18:35 2 -> pipe:[110167]
lrwx------ 1 root root 64 Aug 18 18:35 3 -> socket:[109236]
lrwx------ 1 root root 64 Aug 18 18:35 5 -> socket:[109260]
lrwx------ 1 root root 64 Aug 18 18:35 6 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Aug 18 18:38 729 -> socket:[114797]
l-wx------ 1 root root 64 Aug 18 23:05 731 -> /tmp/km_1.log
lrwx------ 1 root root 64 Aug 18 23:05 732 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Aug 18 23:05 733 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Aug 18 23:05 734 -> /dev/kvm
lrwx------ 1 root root 64 Aug 18 23:05 735 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 Aug 18 23:05 736 -> anon_inode:kvm-vcpu:0
lrwx------ 1 root root 64 Aug 18 23:05 737 -> anon_inode:kvm-vcpu:1
lrwx------ 1 root root 64 Aug 18 23:05 738 -> anon_inode:kvm-vcpu:2
lrwx------ 1 root root 64 Aug 18 23:05 739 -> anon_inode:kvm-vcpu:3
lrwx------ 1 root root 64 Aug 18 23:05 740 -> anon_inode:kvm-vcpu:4
lrwx------ 1 root root 64 Aug 19 01:59 741 -> anon_inode:kvm-vcpu:5
lrwx------ 1 root root 64 Aug 19 01:59 742 -> anon_inode:kvm-vcpu:6
lrwx------ 1 root root 64 Aug 19 16:22 743 -> anon_inode:kvm-vcpu:7
lrwx------ 1 root root 64 Aug 19 16:22 744 -> anon_inode:kvm-vcpu:8
(gdb) p/x *$14
$17 = {nfdmap = 0x2d7, guest_files = 0x7ffff8000920}
(gdb) p/x $17->guest_files[3]
$18 = {inuse = 0x1, how = 0x4, flags = 0x0, error = 0x0, ops = 0x0, ofd = 0xffffffff, name = 0x7ffff801bd60, sockinfo = 0x7ffff8014c30, events = {tqh_first = 0x0, tqh_last = 0x7ffff8000a10}}
(gdb) p/x $17->guest_files[5]
$19 = {inuse = 0x0, how = 0x0, flags = 0x0, error = 0x0, ops = 0x0, ofd = 0x0, name = 0x0, sockinfo = 0x0, events = {tqh_first = 0x0, tqh_last = 0x0}}
(gdb) b 2963
Breakpoint 2 at 0x7ffff7e8d9f2: file km/km_filesys.c, line 2963.
(gdb) c
Continuing.
Breakpoint 2, km_fs_recover_eventfd (ptr=0x7ffff80222a0 "\024", length=
In function km_fs_recover_open_socket(), this block of code needs to be removed:
if (nt_sock->how == KM_FILE_HOW_ACCEPT) {
return km_fs_recover_socket_accepted(nt_sock);
}
I think the above change will allow an accepted socket to be recovered in such a way that the first recv or send operation on it returns ECONNRESET. That should cause the payload to abandon the connection and resume listening for a new connection.
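To illustrate the payload behavior this relies on, here is a minimal sketch of a server-side connection handler that drops a connection on ECONNRESET so its caller can go back to accept(). The names conn_next_action and serve_connection are hypothetical payload-side helpers, not km code, and the sketch assumes a blocking socket on Linux.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical payload-side helpers; nothing here is part of km. */
enum conn_action { CONN_KEEP, CONN_RETRY, CONN_DROP };

/* Decide what to do after recv() returned n with errno value err. */
enum conn_action conn_next_action(ssize_t n, int err)
{
   if (n > 0) {
      return CONN_KEEP;
   }
   if (n == 0) {
      return CONN_DROP; /* orderly shutdown by the peer */
   }
   if (err == EINTR) {
      return CONN_RETRY; /* interrupted by a signal, try again */
   }
   return CONN_DROP; /* ECONNRESET and other errors: abandon the connection */
}

/*
 * Echo data on one accepted connection until it ends. Always closes fd.
 * Returns 0 on orderly close, -1 if the connection was dropped on an error
 * such as ECONNRESET (errno is preserved). Either way the caller can return
 * to accept() on its listening socket.
 */
int serve_connection(int fd)
{
   char buf[512];

   for (;;) {
      ssize_t n = recv(fd, buf, sizeof(buf), 0);
      int err = errno;
      enum conn_action act = conn_next_action(n, err);

      if (act == CONN_RETRY) {
         continue;
      }
      if (act == CONN_DROP) {
         close(fd);
         if (n == 0) {
            return 0;
         }
         errno = err;
         return -1;
      }
      /* Echo the n bytes back, handling short writes. */
      ssize_t off = 0;
      while (off < n) {
         ssize_t w = send(fd, buf + off, (size_t)(n - off), MSG_NOSIGNAL);
         if (w < 0) {
            if (errno == EINTR) {
               continue;
            }
            err = errno;
            close(fd);
            errno = err;
            return -1;
         }
         off += w;
      }
   }
}
```

A payload written this way needs no snapshot-specific logic: a reset after recovery looks like any other peer failure.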
The bats snapshot_test.c needs to test listening and connected sockets to be sure they are recovered properly too.
guest is a client and we are expecting the server to keep the session alive?
guest is a client and we are expecting the server to keep the session alive?
The connection will be lost when the snapshot is recovered. When I/O is attempted on the fd for the lost connection, km will cause ECONNRESET to be returned to the payload. We assume the payload can handle this by cleaning up whatever it was doing on the connection, then reconnecting and retrying.
I took a snapshot of the Go server provided by the knative examples. The snapshot fails to start with error