With #57 we see a bit more:
$ sock_stress -c hvsock://parent -e -v 1 -i 5 -B 8192 -b 8192
Client connecting to a42e7cda-d03f-480c-9cc2-a4de20abb878:3049197c-facb-11e6-bd58-64006a7986d3
2018/02/08 14:02:44 [00000] TX/RX: 53495 bytes in 4.6309 ms
2018/02/08 14:02:44 [00001] TX/RX: 47466 bytes in 3.8491 ms
2018/02/08 14:02:44 [00002] TX/RX: 29659 bytes in 1.6459 ms
2018/02/08 14:02:44 [00003] Failed to receive after 32768 of 64872 bytes: Error after reading 0 of 8192 bytes: EOF
So basically, halfway through receiving 64872 bytes from the host the read gets an EOF. This test run was done with 8192-byte buffers, but we see similar behaviour with the default 16kB buffers; there it is more likely to be the last batch which fails. The failing batch always seems to fail with 0 bytes read.
I get a similar error when using a Win 10 Pro (Fall Creators Update) guest connecting to a Win 10 Pro host where the server runs:
PS> .\sock_stress.exe -v 1 -c hvsock://parent -i 10 -e
Client connecting to a42e7cda-d03f-480c-9cc2-a4de20abb878:3049197c-facb-11e6-bd58-64006a7986d3
2018/02/08 15:32:40 [00000] TX/RX: 17568 bytes in 3.9851 ms
2018/02/08 15:32:40 [00001] Failed to receive after 32768 of 61753 bytes: Error after reading 0 of 16384 bytes: The network connection was aborted by the local system.
With the C implementation in c/hvecho.c we see the same issue. hvecho has a server and client mode. The server:
for (;;) {
    res = recv(fd, buf, sizeof(buf), 0);
    if (res == 0)            /* 0 means the peer shut down its send side */
        break;
    send(fd, buf, res, 0);   /* echo the data back */
}
send(fd, "Bye", strlen("Bye"), 0);  /* final message after the peer's shutdown */
close(fd);
So basically, the server echoes the received data back until recv() returns 0, indicating the peer called shutdown(), and then sends a last message before closing.
The client does:
send(fd, msg, strlen(msg), 0);   /* send the test message          */
recv(fd, buf, sizeof(buf), 0);   /* read the echoed message back   */
shutdown(fd, SD_SEND);           /* half-close (SHUT_WR on Linux)  */
recv(fd, buf, sizeof(buf), 0);   /* expect the "Bye" message       */
i.e. after it receives the echo, the client calls shutdown() and then expects a "Bye" message from the server.
With the server inside a Linux VM and the client on the Windows host, this works fine. Client output (Windows):
Connect to: 73cc71ad-491c-4c0a-af4e-042691ac1bf9:3049197c-facb-11e6-bd58-64006a7986d3
Bytes Sent: 14
Bytes received: 14
->this is a test
Shutdown
Wait for bye
Bytes received: 8
->Bye!
Server output (Linux):
Connect from: 0xffffffff.0x80000008
Bytes received: 14
Bytes sent: 14
Peer closed
Bye Bytes sent: 8
Running the server on Windows and the client on Linux we see the following. Client output (Linux):
Connect to: 0x00000002.0x3049197c
Bytes Sent: 14
Bytes received: 14
->this is a test
Shutdown
Wait for bye
Connection closed
and on the server (Windows):
Connect from: 599fed98-901f-4146-9467-6caf94c824a0:00000403-facb-11e6-bd58-64006a7986d3
Bytes received: 14
Bytes sent: 14
Peer closed
Bye Bytes sent: 8
Note that sometimes the Linux client does get the "Bye" message and sometimes it doesn't, so we see the same timing issue as in the Go test program.
I also tried putting a syscall.FlushFileBuffers(v.fd) before calling syscall.Close(), and it marginally improved things in that the client now sometimes seems to receive some more data, i.e. we now occasionally hit: Unexpected EOF after reading 4088 of 8192 bytes
...
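For reference, that change amounts to something like the sketch below. The type and field names (hvsockConn, fd) are placeholders rather than the actual virtsock code, and this only builds on Windows, where syscall.FlushFileBuffers and syscall.Close operate on a syscall.Handle:

```go
package sketch

import "syscall"

// hvsockConn stands in for the virtsock connection type; fd holds the
// underlying Windows handle. Both names are placeholders for illustration.
type hvsockConn struct {
	fd syscall.Handle
}

// Close flushes any data still queued on the handle before closing it.
// FlushFileBuffers blocks until pending data has been written, which may
// be why it only marginally changes the behaviour described above.
func (v *hvsockConn) Close() error {
	_ = syscall.FlushFileBuffers(v.fd) // best effort; close regardless
	return syscall.Close(v.fd)
}
```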
When running the sock_stress tests with the server on the host (sock_stress -v 1 -s hvsock) and the client in the Linux VM (sock_stress -v 1 -c hvsock://parent), the client frequently gets errors on receive like the ones shown above. The server does not report any errors and the test works fine in the other direction.
It seems timing sensitive, as some connections work fine even with a similar amount of data transmitted.
The server (on the host) is basically performing:
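A minimal sketch of that behaviour, assuming a plain echo handler (handleConn and the use of io.Copy here are illustrative, not the actual sock_stress source):

```go
package sketch

import (
	"io"
	"log"
	"net"
)

// handleConn sketches an echo handler consistent with the description:
// copy everything the client sends straight back until the client
// half-closes its send side, then close the connection.
func handleConn(conn net.Conn) {
	defer conn.Close()
	// io.Copy returns once the read side sees io.EOF, i.e. once the
	// client has called CloseWrite() on its end of the connection.
	if _, err := io.Copy(conn, conn); err != nil {
		log.Printf("echo error: %v", err)
	}
}
```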
The client, in one goroutine, sends random data and, when all data is sent, calls CloseWrite(). In another goroutine it repeatedly reads fixed-size chunks (the minimum of the buffer size and the data left to read) using io.ReadAll(). It is this read path which sees the EOF before all data has been read.
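Roughly, that client pattern corresponds to the sketch below (illustrative only, not the actual sock_stress source; the chunked read is shown with io.ReadFull, and halfCloser stands in for whatever connection type exposes CloseWrite()):

```go
package sketch

import (
	"fmt"
	"io"
	"math/rand"
)

// halfCloser is the minimal connection API this sketch needs: a stream
// that can half-close its send side, like the virtsock connection's
// CloseWrite(). The name is illustrative, not a real sock_stress type.
type halfCloser interface {
	io.ReadWriter
	CloseWrite() error
}

// runClient sends `total` bytes of random data and half-closes the send
// side in one goroutine, while the calling goroutine reads the echoed
// data back in fixed-size chunks. The premature EOF reported above shows
// up in this read loop.
func runClient(conn halfCloser, total, bufSize int) error {
	sendErr := make(chan error, 1)

	go func() {
		data := make([]byte, total)
		rand.Read(data) // random payload, as described above
		_, err := conn.Write(data)
		if err == nil {
			err = conn.CloseWrite() // all data sent: half-close the connection
		}
		sendErr <- err
	}()

	// Read the echo back in chunks of min(bufSize, bytes left to read).
	buf := make([]byte, bufSize)
	for left := total; left > 0; {
		n := bufSize
		if left < n {
			n = left
		}
		if _, err := io.ReadFull(conn, buf[:n]); err != nil {
			return fmt.Errorf("failed to receive after %d of %d bytes: %v",
				total-left, total, err)
		}
		left -= n
	}
	return <-sendErr
}
```

In this sketch the failure reported above corresponds to io.ReadFull returning io.EOF (or io.ErrUnexpectedEOF) while there are still bytes left to read.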