palainp / mirage-sshfs

ISC License
21 stars 1 forks source link

Unix connexion failure #3

Closed palainp closed 3 years ago

palainp commented 3 years ago

When launching the unix binary we have some ERR [application] read error Lwt.Resolution_loop.Canceled then a single WRN [tcpip-stack-socket] error (Invalid_argument "decrypt: Bogus pkt len") in callback. The last message is from awa-ssh in awa-ssh/lib/packet.ml and pkt_len seems to be complete garbage.

palainp commented 3 years ago

It seems to be an issue with the connect mirage ip stack and not the direct one (not sure about the terminology, I named them after the log line: INF [tcpip-stack-socket] IPv4 socket stack: connect and the --net= option of mirage configure). The pb does not occurs with hvt and not with the following:

mirage configure -t unix --net=direct && make depends && make && \
sudo ./mirage_sshfs --ipv4=10.0.0.2/24 -l debug

Then in another terminal:

sudo ip addr add dev tap0 10.0.0.1/24 && \
sshfs mirage@10.0.0.2:/ ~/tmp/sshfs/ -p 18022 -o IdentityFile=/home/user/mirage-sshfs/mirage && \
echo ok > ~/tmp/sshfs/test # is ok

I'm not sure where it's better to investigate, but when we need root permissions to add a tap interface and configure it with an IP address it's not as usable as just running the binary :)

hannesm commented 3 years ago

could you briefly elaborate what happens on the unikernel side?

Why you are seeing the errors with [tcpip-stack-socket] prefixed is because that Log source is used -- the source of that code is in https://github.com/mirage/mirage-tcpip/blob/v6.2.0/src/stack-unix/tcpip_stack_socket.ml#L95-L108 -- where the listen_tcpv4 is implemented: after accept is called, a new task is created (Lwt.async) and the callback is executed in that with an exception handler --> this means that the callback (you provided) raised an exception.

It is not clear why you receive a read error and why this only happens with the socket stack.

palainp commented 3 years ago

The exception is raised by the guard call here https://github.com/mirage/awa-ssh/blob/b5026fe4afb65b8c363db828e2510ca9d6df3a16/lib/packet.ml#L55 .

The unikernel seems to run fine for a moment ant after a while the guard's condition is triggered, for example here is an execution, on the second terminal I just mount the sshfs and do some file/directory manipulations (I have not found a command that reliabily causes the stack corruption, EDIT: In fact I don't think that the stack is corrupted, but Awa is lost in the communication):

[user@localhost mirage-sshfs]$ ./mirage_sshfs
2021-10-05 13:56:57 +02:00: INF [tcpip-stack-socket] IPv4 socket stack: connect
2021-10-05 13:56:57 +02:00: INF [sshfs_server] SSHFS server waiting connections on port 18022

2021-10-05 14:04:25 +02:00: INF [sshfs_server] [127.0.0.1] connected

shared is 56 9b e3 4b 62 68 fb b9  27 c1 3f 09 ba 5b e6 f4
          85 bc 0d dd df 6b 14 57  60 ec 37 0e b4 4a 30 ea
          2a 49 25 b7 68 8d 8a 78  cb c5 52 bf 30 92 4f 04
          03 33 37 a8 94 91 50 a6  c5 2b 49 1d ca ae 61 e3
          b9 50 fb 45 34 6c 24 f3  5a 04 38 b0 53 a3 94 22
          74 f2 5a 66 0c 6b 35 27  a7 7b c0 01 71 68 9a b0
          d1 b6 43 57 cd 53 26 fd  fe 5f a4 d2 0d b2 cc 92
          64 83 87 6d a7 40 3f 6c  23 da e4 74 a2 8c 31 67
          f3 2e 2b c1 26 bb be 5e  49 fc 09 0c 41 8c c3 c8
          ed 92 82 2a 7a c8 7a 8b  d1 d3 8b 37 60 5b 02 64
          47 d9 80 bc 5b 9b f7 16  ca a9 91 ee 01 69 a0 03
          9a c0 97 7e 82 06 a5 6e  8e 58 b1 2d e2 83 db 27
          24 98 41 a8 96 c7 f5 67  78 c4 51 05 a8 9a 96 42
          79 56 54 7e 0c 3d 0f 3a  18 ae 75 70 29 79 93 9b
          a8 cc 7b 8f f9 41 e3 f7  da 51 5c 26 6a e2 ca 3e
          47 30 6b 1e c2 c1 46 ed  bf 42 38 6b 62 6b da 9e
           signature is 68 9b b9 4e d7 d6 3c 75  e0 f6 11 29 74 d5 da cb
                        49 64 07 24 34 75 b8 aa  6d 52 df 02 1a a8 ce 78
                        82 91 48 c3 1c 59 3e e7  68 a4 eb 2e 82 a0 8d 44
                        25 46 77 95 ef c4 dd 51  89 d8 ee 8b fa 2d d8 07
                         (hash dc 63 f8 71 c1 62 a4 dd  f9 f9 b6 21 79 ce 21 93
                               fe e8 27 2e 89 d2 4c 94  c2 5c 9c af 1f da 8a a6
                               )
2021-10-05 14:04:25 +02:00: INF [sshfs_server] [127.0.0.1] executing `sftp`

2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:36 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: WRN [tcpip-stack-socket] error (Invalid_argument "decrypt: Bogus pkt len") in callback

The loop being canceled and the callback error seems to be uncorrelated. I have none of these two messages with hvt.

palainp commented 3 years ago

I think that I found the reason behind the read error in https://github.com/mirage/awa-ssh/pull/35#issuecomment-955444805 When there is a solution for the loop cancellation problem, I'll try to find out if this is the same reason that corrupts the communication data.

palainp commented 3 years ago

The Bogus pkt len occurs when the network interface read is canceled in an awa mirage. It will be fixed there.