Closed palainp closed 3 years ago
It seems to be an issue with the connect
mirage ip stack and not the direct
one (not sure about the terminology, I named them after the log line: INF [tcpip-stack-socket] IPv4 socket stack: connect
and the --net=
option of mirage configure). The pb does not occurs with hvt and not with the following:
mirage configure -t unix --net=direct && make depends && make && \
sudo ./mirage_sshfs --ipv4=10.0.0.2/24 -l debug
Then in another terminal:
sudo ip addr add dev tap0 10.0.0.1/24 && \
sshfs mirage@10.0.0.2:/ ~/tmp/sshfs/ -p 18022 -o IdentityFile=/home/user/mirage-sshfs/mirage && \
echo ok > ~/tmp/sshfs/test # is ok
I'm not sure where it's better to investigate, but when we need root permissions to add a tap interface and configure it with an IP address it's not as usable as just running the binary :)
could you briefly elaborate what happens on the unikernel side?
Why you are seeing the errors with [tcpip-stack-socket]
prefixed is because that Log source is used -- the source of that code is in https://github.com/mirage/mirage-tcpip/blob/v6.2.0/src/stack-unix/tcpip_stack_socket.ml#L95-L108 -- where the listen_tcpv4
is implemented: after accept
is called, a new task is created (Lwt.async) and the callback is executed in that with an exception handler --> this means that the callback (you provided) raised an exception.
It is not clear why you receive a read error and why this only happens with the socket stack.
The exception is raised by the guard call here https://github.com/mirage/awa-ssh/blob/b5026fe4afb65b8c363db828e2510ca9d6df3a16/lib/packet.ml#L55 .
The unikernel seems to run fine for a moment ant after a while the guard's condition is triggered, for example here is an execution, on the second terminal I just mount the sshfs and do some file/directory manipulations (I have not found a command that reliabily causes the stack corruption, EDIT: In fact I don't think that the stack is corrupted, but Awa is lost in the communication):
[user@localhost mirage-sshfs]$ ./mirage_sshfs
2021-10-05 13:56:57 +02:00: INF [tcpip-stack-socket] IPv4 socket stack: connect
2021-10-05 13:56:57 +02:00: INF [sshfs_server] SSHFS server waiting connections on port 18022
2021-10-05 14:04:25 +02:00: INF [sshfs_server] [127.0.0.1] connected
shared is 56 9b e3 4b 62 68 fb b9 27 c1 3f 09 ba 5b e6 f4
85 bc 0d dd df 6b 14 57 60 ec 37 0e b4 4a 30 ea
2a 49 25 b7 68 8d 8a 78 cb c5 52 bf 30 92 4f 04
03 33 37 a8 94 91 50 a6 c5 2b 49 1d ca ae 61 e3
b9 50 fb 45 34 6c 24 f3 5a 04 38 b0 53 a3 94 22
74 f2 5a 66 0c 6b 35 27 a7 7b c0 01 71 68 9a b0
d1 b6 43 57 cd 53 26 fd fe 5f a4 d2 0d b2 cc 92
64 83 87 6d a7 40 3f 6c 23 da e4 74 a2 8c 31 67
f3 2e 2b c1 26 bb be 5e 49 fc 09 0c 41 8c c3 c8
ed 92 82 2a 7a c8 7a 8b d1 d3 8b 37 60 5b 02 64
47 d9 80 bc 5b 9b f7 16 ca a9 91 ee 01 69 a0 03
9a c0 97 7e 82 06 a5 6e 8e 58 b1 2d e2 83 db 27
24 98 41 a8 96 c7 f5 67 78 c4 51 05 a8 9a 96 42
79 56 54 7e 0c 3d 0f 3a 18 ae 75 70 29 79 93 9b
a8 cc 7b 8f f9 41 e3 f7 da 51 5c 26 6a e2 ca 3e
47 30 6b 1e c2 c1 46 ed bf 42 38 6b 62 6b da 9e
signature is 68 9b b9 4e d7 d6 3c 75 e0 f6 11 29 74 d5 da cb
49 64 07 24 34 75 b8 aa 6d 52 df 02 1a a8 ce 78
82 91 48 c3 1c 59 3e e7 68 a4 eb 2e 82 a0 8d 44
25 46 77 95 ef c4 dd 51 89 d8 ee 8b fa 2d d8 07
(hash dc 63 f8 71 c1 62 a4 dd f9 f9 b6 21 79 ce 21 93
fe e8 27 2e 89 d2 4c 94 c2 5c 9c af 1f da 8a a6
)
2021-10-05 14:04:25 +02:00: INF [sshfs_server] [127.0.0.1] executing `sftp`
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:04:25 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:36 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: ERR [application] read error Lwt.Resolution_loop.Canceled
2021-10-05 14:06:37 +02:00: WRN [tcpip-stack-socket] error (Invalid_argument "decrypt: Bogus pkt len") in callback
The loop being canceled and the callback error seems to be uncorrelated. I have none of these two messages with hvt.
I think that I found the reason behind the read error
in https://github.com/mirage/awa-ssh/pull/35#issuecomment-955444805
When there is a solution for the loop cancellation problem, I'll try to find out if this is the same reason that corrupts the communication data.
The Bogus pkt len
occurs when the network interface read is canceled in an awa mirage. It will be fixed there.
When launching the unix binary we have some
ERR [application] read error Lwt.Resolution_loop.Canceled
then a singleWRN [tcpip-stack-socket] error (Invalid_argument "decrypt: Bogus pkt len") in callback
. The last message is from awa-ssh inawa-ssh/lib/packet.ml
andpkt_len
seems to be complete garbage.