vanvanj / distcc

Automatically exported from code.google.com/p/distcc
GNU General Public License v2.0
0 stars 0 forks source link

server closes socket prematurely #154

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
1. Version: distcc-3.1

2. Server: Mac OS X 10.5 Intel; Client: Mac OS X 10.4 PowerPC

3. Scenario: Compiling emacs-25.4 using distcc

4. Problem: When the result .o sent from the server to the client is big 
(around 200KB or more), the server closes the socket before the client has 
completely received the data. This results on client waiting forever for that 
result to arrive.

5. How to reproduce:
On server:
PATH=<compilers> distcc --daemon --allow 0.0.0.0/0 --no-detach --log-stderr 
--verbose

On client:
export PATH=<masquerading directory>
export DISTCC_HOSTS="server"
export DISTCC_VERBOSE=1
cd <emacs-25.4 configured build directory>
make

6. Error log:
On server:
[...]
distccd[10333] (dcc_collect_child) cc child 10409 terminated with status 0
distccd[10333] (dcc_collect_child) cc times: user 0.000000s, system 0.000000s, 
0 minflt, 0 majflt
distccd[10333] (dcc_x_token_int) send DONE00000001
distccd[10333] (dcc_x_token_int) send STAT00000000
distccd[10333] (dcc_x_file) send 0 byte file 
/var/folders/LF/LFXI49A0G+W7Zb3DgiVltk+++TI/-Tmp-//distcc_46d11641.stderr with 
token SERR and compression 69
distccd[10333] (dcc_x_token_int) send SERR00000000
distccd[10333] (dcc_x_file) send 0 byte file 
/var/folders/LF/LFXI49A0G+W7Zb3DgiVltk+++TI/-Tmp-//distcc_41901641.stdout with 
token SOUT and compression 69
distccd[10333] (dcc_x_token_int) send SOUT00000000
distccd[10333] (dcc_x_file) send 335832 byte file 
/var/folders/LF/LFXI49A0G+W7Zb3DgiVltk+++TI/-Tmp-//distccd_1f841641.o with 
token DOTO and compression 69
distccd[10333] (dcc_x_token_int) send DOTO00051fd8
distccd[10333] gcc sha512.c on localhost completed ok
distccd[10333] job complete
distccd[10333] (dcc_cleanup_tempfiles_inner) deleted 5 temporary files
distccd[10333] (dcc_job_summary) client: 10.0.2.2:49339 COMPILE_OK exit:0 sig:0 
core:0 ret:0 time:6446ms gcc sha512.c
[no more messages because client does not send further requests]

On client:
[...]
distcc[28558] (dcc_trace_version) distcc 3.1 powerpc-apple-darwin8.11.0; built 
May 26 2015 11:39:08
distcc[28558] (dcc_recursion_safeguard) safeguard level=0
distcc[28558] (main) compiler name is "gcc"
distcc[28558] (dcc_set_path) setting 
PATH=/Users/csanchez/local/powerpc-apple-darwin8.11.0/bin:/usr/bin:/bin:/usr/sbi
n:/sbin:/usr/X11R6/bin
distcc[28558] (dcc_scan_args) scanning arguments: gcc -std=gnu99 
-DHAVE_CONFIG_H -I. -I../lib -I../src -I../src 
-I/Users/csanchez/local/powerpc-apple-darwin8.11.0/include -g3 -O2 -MT sha512.o 
-MD -MP -MF .deps/sha512.Tpo -c -o sha512.o sha512.c
distcc[28558] (dcc_scan_args) found object/output file "sha512.o"
distcc[28558] (dcc_scan_args) found input file "sha512.c"
distcc[28558] compile from sha512.c to sha512.o
distcc[28558] (dcc_get_hostlist) read hosts from environment
distcc[28558] (dcc_parse_hosts) found tcp token "boole"
distcc[28558] (dcc_lock_host) got cpu lock on boole slot 0 as fd5
distcc[28558] (dcc_lock_host) got cpu lock on localhost slot 0 as fd7
distcc[28558] (dcc_note_state) note state 3, file "(NULL)", host "localhost"
distcc[28558] (dcc_strip_dasho) result: gcc -std=gnu99 -DHAVE_CONFIG_H -I. 
-I../lib -I../src -I../src 
-I/Users/csanchez/local/powerpc-apple-darwin8.11.0/include -g3 -O2 -MT sha512.o 
-MD -MP -MF .deps/sha512.Tpo -c sha512.c
distcc[28558] (dcc_spawn_child) forking to execute: gcc -std=gnu99 
-DHAVE_CONFIG_H -I. -I../lib -I../src -I../src 
-I/Users/csanchez/local/powerpc-apple-darwin8.11.0/include -g3 -O2 -MT sha512.o 
-MD -MP -MF .deps/sha512.Tpo -E sha512.c
distcc[28558] (dcc_spawn_child) child started as pid28559
distcc[28558] (dcc_strip_local_args) result: gcc -std=gnu99 -g3 -O2 -c -o 
sha512.o sha512.c
distcc[28558] exec on boole: gcc -std=gnu99 -g3 -O2 -c -o sha512.o sha512.c
distcc[28558] (dcc_note_state) note state 2, file "sha512.c", host "boole"
distcc[28559] (dcc_new_pgrp) entered process group
distcc[28559] (dcc_increment_safeguard) setting safeguard: _DISTCC_SAFEGUARD=1
distcc[28558] (dcc_connect_by_addr) started connecting to 192.168.3.188:3632
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_note_state) note state 4, file "(NULL)", host "(NULL)"
distcc[28558] (dcc_x_token_int) send DIST00000001
distcc[28558] (dcc_x_token_int) send ARGC00000008
distcc[28558] (dcc_x_token_int) send ARGV00000003
distcc[28558] (dcc_x_token_string) send string 'gcc'
distcc[28558] (dcc_x_token_int) send ARGV0000000a
distcc[28558] (dcc_x_token_string) send string '-std=gnu99'
distcc[28558] (dcc_x_token_int) send ARGV00000003
distcc[28558] (dcc_x_token_string) send string '-g3'
distcc[28558] (dcc_x_token_int) send ARGV00000003
distcc[28558] (dcc_x_token_string) send string '-O2'
distcc[28558] (dcc_x_token_int) send ARGV00000002
distcc[28558] (dcc_x_token_string) send string '-c'
distcc[28558] (dcc_x_token_int) send ARGV00000002
distcc[28558] (dcc_x_token_string) send string '-o'
distcc[28558] (dcc_x_token_int) send ARGV00000008
distcc[28558] (dcc_x_token_string) send string 'sha512.o'
distcc[28558] (dcc_x_token_int) send ARGV00000008
distcc[28558] (dcc_x_token_string) send string 'sha512.c'
distcc[28558] (dcc_note_state) note state 3, file "(NULL)", host "(NULL)"
distcc[28558] (dcc_collect_child) cpp child 28559 terminated with status 0
distcc[28558] (dcc_collect_child) cpp times: user 0.000000s, system 0.000000s, 
0 minflt, 0 majflt
distcc[28558] cpp sha512.c on localhost completed ok
distcc[28558] (dcc_unlock) release lock fd7
distcc[28558] (dcc_x_file) send 173103 byte file /tmp/distcc_aac91852.i with 
token DOTI and compression 69
distcc[28558] (dcc_x_token_int) send DOTI0002a42f
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_select_for_write) select for write on fd8
distcc[28558] (dcc_compile_remote) client finished sending request to server
distcc[28558] (dcc_note_state) note state 5, file "(NULL)", host "boole"
distcc[28558] (dcc_select_for_read) select for read on fd8 for 300s
distcc[28558] (dcc_r_token_int) got DONE00000001
distcc[28558] (dcc_note_state) note state 6, file "(NULL)", host "(NULL)"
distcc[28558] (dcc_r_token_int) got STAT00000000
distcc[28558] (dcc_r_token_int) got SERR00000000
distcc[28558] (dcc_r_file) received 0 bytes to file 
/tmp/distcc_server_stderr_bd361852.txt
distcc[28558] (dcc_r_token_int) got SOUT00000000
distcc[28558] (dcc_r_token_int) got DOTO00051fd8
distcc[28558] (dcc_select_for_read) select for read on fd8 for 300s
distcc[28558] (dcc_select_for_read) select for read on fd8 for 300s
distcc[28558] (dcc_select_for_read) select for read on fd8 for 300s
distcc[28558] (dcc_select_for_read) select for read on fd8 for 300s
[this goes on forever]

7. Discussion:
After a little debugging, I found server closes the socket immediately after 
sending the result file (i.e. after "write" call returns). This is valid for 
glibc [1], but general POSIX might cause data loss [2]. Attached patch casuses 
the server to wait for the client to close its socket before doing the same.

A more elaborated mechanism using shutdown could be used, but it is not needed 
in this case as we know the last transmission is always server->client, so the 
client can forcefully close once it has all the data.

[1] http://www.gnu.org/software/libc/manual/html_node/Closing-a-Socket.html: 
"If there is still data waiting to be transmitted over the connection, normally 
close tries to complete this transmission"
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/send.html: 
"Successful completion of a call to send() does not guarantee delivery of the 
message. A return value of -1 indicates only locally-detected errors."

Original issue reported on code.google.com by csanchez...@gmail.com on 5 Jun 2015 at 7:20

Attachments: