rkd77 / elinks

Fork of elinks

[ win64 ] ELinks: Socket exception #154

Closed mtatton closed 2 years ago

mtatton commented 2 years ago

Just for the record; I'll try to track down the problem on my own. On Windows 64-bit: MSYS_NT-10.0-19044 DESKTOP-DLB6K64 3.3.4-341.x86_64 2022-02-15 17:24 UTC x86_64 Msys

The first use case runs without trouble:

    ./src/elinks.exe --dump ./test/hello.html

The browser seems to work, but occasionally it fails with:

ELinks: Socket exception

It doesn't happen on very simple pages, but it does on slightly longer or more complex ones.

The same test causes no problems on Linux x64.

https://asciinema.org/a/489608

mtatton commented 2 years ago

The Socket exception is reproducible when elinks is built inside the docker environment provided by https://github.com/rkd77/elinks/pull/155. The exception remains, but the binary builds faster in the Linux docker container than on my QEMU Windows x64 system.

mtatton commented 2 years ago

This took me ages to figure out, but I can finally debug the win64 binary on Linux. The key is to cross-compile the binary with CFLAG = -gstabs ... and not the -g flag! Without it, no symbols are provided to winedbg, and it's impossible to set breakpoints by function name. The session can then look like this:

    $ winedbg src/elinks.exe
    Wine-dbg>b dns_exception
    Breakpoint 1 at 0x0000000000482835 dns_exception+0xc [socket.c:187] in elinks
    Wine-dbg>c

winedbg opens a new window and runs elinks there. When the use case happens, it stops with:

    Stopped on breakpoint 1 at 0x0000000000482835 dns_exception+0xc [socket.c:187] in elinks
    dns_exception () at socket.c:187
    Unable to access file 'socket.c'

References:
[1] https://stackoverflow.com/questions/39938253/how-to-properly-debug-a-cross-compiled-windows-code-on-linux
[2] https://wiki.winehq.org/Wine_Developer%27s_Guide/Debugging_Wine#Directory_.26_source_file_manipulation

mtatton commented 2 years ago

ANOTHER USELESS NOTE ON PROGRESS:

At first it looked like the problem was in an additional DNS lookup. On the first try, elinks queries DNS and calls dns_found in src/network/socket.c without an exception. The next try results in a DNS resolution retry, and after the default number of retries (3) something goes wrong.

This is not confirmed, because the exception also happens when accessing a plain IP address.

The next step is to debug the TCP socket connection by recompiling elinks with src/network/socket.c:debug_transfer_log(char *data, int len) defined.

From this, it's possible that the Socket exception is thrown when there is a keep-alive connection:

    CONNECTION: 127.0.0.1
    GET / HTTP/1.1
    Host: 127.0.0.1:9000
    User-Agent: ELinks/0.16.GIT (textmode; Win32; 80x25-2)
    Accept: */*
    Connection: Keep-Alive
    If-Modified-Since: Sun, 24 Apr 2022 14:13:32 GMT
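For reference, the request in the log above can be reproduced with a small formatter. This is a hypothetical sketch, not ELinks code: the name build_keepalive_request and its signature are assumptions. It only shows the shape of an HTTP/1.1 request carrying Connection: Keep-Alive, the header that asks the server to keep the TCP connection open so a later request (such as a reload) can reuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not ELinks code): formats a keep-alive GET request
 * like the one in the transfer log. Returns the number of bytes written,
 * or -1 if the input is unsafe or the buffer is too small. */
static int build_keepalive_request(char *buf, size_t size,
                                   const char *host, int port,
                                   const char *path)
{
    assert(buf != NULL && host != NULL && path != NULL);

    /* Refuse CR/LF in host or path so they cannot inject extra headers. */
    if (strpbrk(host, "\r\n") || strpbrk(path, "\r\n"))
        return -1;

    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s:%d\r\n"
                     "Accept: */*\r\n"
                     /* Ask the server to keep the connection open for reuse. */
                     "Connection: Keep-Alive\r\n"
                     "\r\n",
                     path, host, port);
    if (n < 0 || (size_t)n >= size)
        return -1; /* output error or truncated */
    return n;
}
```

Example: build_keepalive_request(buf, sizeof buf, "127.0.0.1", 9000, "/") produces the request line plus the Host, Accept and Connection headers from the log (without User-Agent and If-Modified-Since).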

This experiment:

    static void
    exception(struct socket *socket)
    {
        /* retry temporarily disabled for this experiment */
        //socket->ops->retry(socket, connection_state(S_EXCEPT));
    }

made the bug disappear, which means the trouble is in the socket connection retry. My guess is that the socket is somehow closed, but the browser still expects it to be open. It's hard to debug because there is no properly working backtrace in winedbg.
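The hypothesis that the socket is closed while the browser still treats it as open can be checked with a guard before retrying. A minimal POSIX-only sketch, assuming a simplified socket struct (sketch_socket, fd_is_open and retry_if_open are hypothetical names, not ELinks types): it asks the kernel whether the descriptor is still valid via fcntl(fd, F_GETFD), which fails with EBADF for a closed descriptor. A closed descriptor number can be reused by a later open(), so this check cannot catch every stale-socket case.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* POSIX-only: 1 if fd refers to an open descriptor, 0 if closed (EBADF).
 * Winsock SOCKET handles would need a different check. */
static int fd_is_open(int fd)
{
    if (fd < 0)
        return 0;
    if (fcntl(fd, F_GETFD) != -1)
        return 1;
    return errno != EBADF;
}

/* Hypothetical stand-in for a socket with a retry counter. */
struct sketch_socket {
    int fd;
    int retries;
};

/* Retry only while the descriptor is still usable; otherwise return -1 so
 * the caller opens a fresh socket instead of reusing a stale one. */
static int retry_if_open(struct sketch_socket *s)
{
    assert(s != NULL);
    if (!fd_is_open(s->fd))
        return -1;
    s->retries++;
    return 0;
}
```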

That brought me to retry_connection_socket in src/network/connection.c. There is definitely something weird about the socket: when I disable the assert on the socket, everything works correctly.

    static void
    retry_connection_socket(struct socket *socket, struct connection_state state)
    {
        //assert(socket);
        retry_connection((struct connection *)socket->conn, state);
    }

No, that's not it.

Let's examine what's wrong with the socket...

RESULT: trying to recompile with -g and retry with winedbg --gdb ./src/elinks.exe

mtatton commented 2 years ago

The trouble is that winedbg doesn't provide the whole backtrace, which makes it harder to guess where in the stack the exception comes from. This seems promising:

    x86_64-w64-mingw32-gdb ./elinks.exe

winedbg --gdb ./src/elinks.exe doesn't read symbols on my system, probably because gdb is launched from winedbg. I can add the source of main using:

    (gdb) list 341
    main.c: No such file or directory.
    (gdb) dir main
    (gdb) list 341
    341         if (!getuid() || !geteuid()) {
    342             fprintf(stderr, "%s\n\n"
    343                     "Permission to run this program as root "
    344                     "user was disabled at compile time.\n\n",
    345                     full_static_version);
    346             exit(-1);
    347         }
    348     }
    349     #else
    350     #define check_if_root()

and setting breakpoints works:

    (gdb) b main
    Breakpoint 1 at 0x4777f5: file main.c, line 355.

Now I only need to resolve this: Don't know how to run.  Try "help target".

mtatton commented 2 years ago

[ GDB WIN64 ]

This assumes the win64 executable is compiled with the -g CFLAG.

Finally I found something that looks very promising: a how-to on debugging a curses application with gdb [1]. I'll just slightly edit it for win64.

HOWTO DEBUG WIN64 ELINKS:

(I assume you work on X Window with at least two terminals [SCR01 and SCR02], or with GNU screen.)

SCR01$ wineconsole Z:/usr/share/win64/gdbserver.exe localhost:12345 elinks.exe

A new window will appear, and elinks will start there with default settings. It's the win64 binary running under Wine; there is de facto no difference from a standard Windows command window.

Content of the win64.gdb file:

    target remote localhost:12345

SCR02$ x86_64-w64-mingw32-gdb -x win64.gdb

On SCR02 you'll connect to the running win64 elinks session under Wine.

Then read the symbols from the file:

    (gdb) file elinks.exe
    A program is being debugged already.
    Are you sure you want to change the file? (y or n) y
    Reading symbols from elinks.exe...done.

Then add the directory containing the problematic function's source (src/network in my case) like this:

(gdb) dir network

Verify that gdb can read the source file:

    (gdb) list socket.c:10
    5       #endif
    6
    7       #include
    8       #include
    9       #include <sys/types.h>
    10      #ifdef HAVE_NETINET_IN_H
    11      #include <netinet/in.h> /* OS/2 needs this after sys/types.h */
    12      #endif
    13      #ifdef HAVE_SYS_SOCKET_H
    14      #include <sys/socket.h> /* OS/2 needs this after sys/types.h */

Set a breakpoint:

    (gdb) b dns_exception
    Breakpoint 1 at 0x48296c: file socket.c, line 189.

and continue running the program:

    (gdb) c
    Continuing.

Then reproduce the problematic use case.

And there we go: the breakpoint on the exception is hit and we have the full stack:

    Breakpoint 1, dns_exception (socket=0x8e84f8) at socket.c:189
    189     {
    (gdb) bt
    #0  dns_exception (socket=0x8e84f8) at socket.c:189
    #1  0x0000000000478bcc in select_loop (init=0x47706b) at select.c:628
    #2  0x000000000047784b in main (argc=1, argv=0x831a40) at main.c:364

... now I just need to find what's causing the Socket exception.

References: [1] https://stackoverflow.com/questions/11965328/debugging-ncurses-application-with-gdb
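In the backtrace above, dns_exception() is called directly from select_loop(), so it runs as the handler for an exceptional condition reported by select(). A reduced sketch of that dispatch pattern follows; the fd_handlers table, select_once and count_event are hypothetical, not ELinks code. One Windows-specific detail that may matter here: Winsock's select() signals a failed non-blocking connect() through exceptfds, while POSIX systems report it as writable with SO_ERROR set.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

typedef void (*fd_handler)(int fd, void *data);

/* Hypothetical per-descriptor handlers for a select()-based loop. */
struct fd_handlers {
    int fd;
    fd_handler on_read;
    fd_handler on_except;   /* the role dns_exception() plays in ELinks */
    void *data;
};

/* Example handler: counts how often it was called. */
static void count_event(int fd, void *data)
{
    (void)fd;
    (*(int *)data)++;
}

/* One select() iteration; returns how many handlers ran, -1 on error. */
static int select_once(struct fd_handlers *h, size_t n, struct timeval *tv)
{
    fd_set rfds, efds;
    int maxfd = -1;

    FD_ZERO(&rfds);
    FD_ZERO(&efds);
    for (size_t i = 0; i < n; i++) {
        assert(h[i].fd >= 0 && h[i].fd < FD_SETSIZE);
        if (h[i].on_read)
            FD_SET(h[i].fd, &rfds);
        if (h[i].on_except)
            FD_SET(h[i].fd, &efds);
        if (h[i].fd > maxfd)
            maxfd = h[i].fd;
    }
    if (select(maxfd + 1, &rfds, NULL, &efds, tv) < 0)
        return -1;

    int called = 0;
    for (size_t i = 0; i < n; i++) {
        /* Exceptional conditions are dispatched before reads. */
        if (h[i].on_except && FD_ISSET(h[i].fd, &efds)) {
            h[i].on_except(h[i].fd, h[i].data);
            called++;
        } else if (h[i].on_read && FD_ISSET(h[i].fd, &rfds)) {
            h[i].on_read(h[i].fd, h[i].data);
            called++;
        }
    }
    return called;
}
```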

mtatton commented 2 years ago

It seems the trouble happens after dns_found, when the connection is attempted. [screenshot]

mtatton commented 2 years ago

[ HTONS and connect ]

After some research I know where the trouble is. The cause of the Socket exception is the second connection retry, and I can now reproduce it. In the current source state:

e1aa3fe5177dafe7cd5815f39258e2cf9afeb5eb

the sources, compiled for win64 in the docker image with:

    root@9689e1781e84:/elinks# x86_64-w64-mingw32-gcc --version
    x86_64-w64-mingw32-gcc (GCC) 10-win32 20210110

    ii gcc-mingw-w64-x86-64-win32 10.2.1-6+24.2 amd64 GNU C compiler for MinGW-w64, Win64/Win32

fail when reconnecting to the same server for the second time or later.

The exact place where this happens is the function:

connect_socket (csocket=0x901b08, state=...) at socket.c:705

at line:

            addr.sin_port = htons(connect_info->port);

On that line the connect_socket thread fails and the stack is corrupted. Because it's a thread, elinks keeps running but the connection fails.

The next step is to explore further what's going on and why the first connection succeeds and the following ones fail.

mtatton commented 2 years ago

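As a first try, I replaced htons(connect_info->port) with the constant 0x5000 (htons(80) on a little-endian host). It then fails several lines later, on connect() [screenshot], whereas in the original state the thread is corrupted right at htons [screenshot].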
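htons() only reorders the two bytes of its argument into network (big-endian) order, so on little-endian x86_64 htons(80) is 0x5000, the constant used above. Since it has no side effects, one plausible reading (an assumption, not confirmed in this thread) is that the fault reported on that line comes from dereferencing connect_info or from an already corrupted stack, not from htons() itself. The sketch below spells out the byte order; to_network_order is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical equivalent of htons(): build the big-endian (network order)
 * byte sequence explicitly, independent of host endianness. */
static uint16_t to_network_order(uint16_t port)
{
    unsigned char bytes[2] = {
        (unsigned char)(port >> 8),   /* high byte goes first on the wire */
        (unsigned char)(port & 0xff)  /* low byte second */
    };
    uint16_t out;
    memcpy(&out, bytes, sizeof out);  /* reinterpret without aliasing issues */
    return out;
}
```

On any host, the first byte in memory of htons(80) is 0x00 and the second is 0x50; only when that pair is read back as an integer on a little-endian CPU does it appear as 0x5000.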

mtatton commented 2 years ago

Differences between the first and the next connection: [screenshot]

mtatton commented 2 years ago

It seems I get WSAEWOULDBLOCK (10035), "Resource temporarily unavailable", on reload.
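For a non-blocking socket, this error from connect() is expected: it means the connection is still in progress, not that it failed. A POSIX sketch of the usual pattern follows (connect_and_wait and connect_loopback_nonblocking are hypothetical names): treat EINPROGRESS as "in progress", wait with select() until the socket is writable, then read the final result with getsockopt(SO_ERROR). On Winsock the in-progress case is WSAGetLastError() == WSAEWOULDBLOCK and a failed attempt is reported in select()'s exceptfds; those Windows details are only noted in comments, not compiled here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Non-blocking connect: 0 on success, otherwise an errno value.
 * Winsock (not compiled here): "in progress" is WSAEWOULDBLOCK from
 * WSAGetLastError(), and failure shows up in select()'s exceptfds. */
static int connect_and_wait(int fd, const struct sockaddr *sa, socklen_t len,
                            int timeout_sec)
{
    assert(sa != NULL);
    if (connect(fd, sa, len) == 0)
        return 0;                 /* connected immediately */
    if (errno != EINPROGRESS)
        return errno;             /* a real failure */

    /* In progress is not an error: wait until the socket is writable. */
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { timeout_sec, 0 };
    int r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r < 0)
        return errno;
    if (r == 0)
        return ETIMEDOUT;

    /* Writable: fetch the final outcome of the connection attempt. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return errno;
    return err;
}

/* Usage helper: non-blocking connect to 127.0.0.1:port; fd or -1. */
static int connect_loopback_nonblocking(uint16_t port)
{
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0
        || connect_and_wait(fd, (struct sockaddr *)&a, sizeof a, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```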

mtatton commented 2 years ago

[ INTERMEDIATE RESULT ]

The problem is in the part where the socket is set to non-blocking. After temporarily disabling the ELinks call that makes the socket non-blocking:

// if (set_nonblocking_fd(sock) < 0) {

and making the socket non-blocking with ioctlsocket() instead:

    u_long mode = 1; // non-blocking socket
    ioctlsocket(sock, FIONBIO, &mode);

pages reload and open without complaints.
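The workaround above amounts to using ioctlsocket(FIONBIO) on Windows and fcntl(O_NONBLOCK) elsewhere. A sketch of a portable helper under that assumption follows; set_socket_nonblocking is a hypothetical name, not the ELinks set_nonblocking_fd(), and only the POSIX branch is exercised on Linux.

```c
#include <assert.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
#endif

/* Hypothetical portable helper: switch a socket to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
static int set_socket_nonblocking(int fd)
{
    assert(fd >= 0);
#ifdef _WIN32
    /* Winsock sockets are SOCKET handles, not file descriptors, so
     * fcntl()/O_NONBLOCK do not apply; FIONBIO with a non-zero value
     * enables non-blocking mode. */
    u_long mode = 1;
    return ioctlsocket((SOCKET)fd, FIONBIO, &mode) == 0 ? 0 : -1;
#else
    /* Preserve existing status flags and add O_NONBLOCK. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0 ? 0 : -1;
#endif
}
```

On a non-blocking socket with no pending data, recv() returns -1 immediately with EAGAIN/EWOULDBLOCK (WSAEWOULDBLOCK on Winsock) instead of waiting.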

Proposed next step: verify the Win32 (Winsock) non-blocking implementation in ELinks, in the function set_nonblocking_fd().
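One way to verify a set-non-blocking helper on POSIX systems is to read the flag back. A minimal sketch (fd_is_nonblocking is a hypothetical name): it returns 1 when O_NONBLOCK is set. As far as I know, Winsock has no documented call to read back the FIONBIO state, so on Windows the check would have to be behavioural instead, for example a recv() on an idle socket that fails with WSAEWOULDBLOCK.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* POSIX-only check: 1 if O_NONBLOCK is set on fd, 0 if not,
 * -1 if the flags cannot be read (e.g. fd is closed). */
static int fd_is_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```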

mtatton commented 2 years ago

Solved by: https://github.com/rkd77/elinks/pull/156/commits/eff70d3807c694ff7faae78117ebf66cded852eb