gvanem / Watt-32

Watt-32 TCP/IP library and samples.
https://www.watt-32.net/

Optimization, general questions #86

Open jwt27 opened 1 year ago

jwt27 commented 1 year ago

Hi,

While working on the recent PRs, I noticed there could be some room for optimization. I'm looking into that now, but I have some questions:


Callbacks from the native layer to the BSD layer have to iterate over the socket list to find the appropriate BSD socket. It seems it would be better to store a pointer from the native socket back to the BSD socket that owns it. But maybe there is some reason why you hadn't done that?
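
Roughly what I have in mind (the struct and field names here are hypothetical, not the actual Watt-32 definitions):

typedef struct NativeSock NativeSock;

typedef struct Socket {          /* BSD-level socket */
        struct Socket *next;
        NativeSock    *native;   /* the native socket it owns */
      } Socket;

struct NativeSock {
        Socket *owner;           /* proposed back-pointer to the owning BSD socket */
      };

extern Socket *sk_list;          /* head of the BSD socket list */

/* Current approach: O(N) search on every native -> BSD callback */
static Socket *find_owner_slow (const NativeSock *ns)
{
  Socket *s;
  for (s = sk_list; s; s = s->next)
      if (s->native == ns)
         return (s);
  return (NULL);
}

/* With the back-pointer: O(1) */
static Socket *find_owner_fast (const NativeSock *ns)
{
  return (ns->owner);
}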

For the ICMP callback, I do see:

https://github.com/gvanem/Watt-32/blob/70d84f2cda23dc970c6d3d51010bebee2f0ca248/src/socket.c#L1845

Is that still true? I don't see how it's possible. One BSD socket can only "own" one native socket, no?

I did see that you once had the idea to store the Socket*s in a hash table. I have finished the implementation for it now, but as long as there is so much iterating going on, it wouldn't be any faster.
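
For reference, the kind of lookup I mean is something like this (hypothetical names, keyed on the native socket pointer; not the actual patch):

#include <stdint.h>

#define SK_HASH_SIZE 256                 /* power of two, picked arbitrarily here */

typedef struct Socket {
        struct Socket *hash_next;        /* chain within one bucket */
        const void    *native;           /* native socket this BSD socket owns */
      } Socket;

static Socket *sk_hash [SK_HASH_SIZE];

static unsigned sk_hash_val (const void *native)
{
  uintptr_t v = (uintptr_t) native;
  return (unsigned) ((v >> 4) & (SK_HASH_SIZE - 1));
}

static void sk_hash_add (Socket *s)
{
  unsigned h = sk_hash_val (s->native);
  s->hash_next = sk_hash [h];
  sk_hash [h]  = s;
}

static Socket *sk_hash_find (const void *native)
{
  Socket *s;
  for (s = sk_hash[sk_hash_val(native)]; s; s = s->hash_next)
      if (s->native == native)
         return (s);
  return (NULL);
}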


Also, I was looking into optimizing select_s()/poll(). I see select_s() has code to check DOS FDs too, but currently it will skip those and set errno = ENOTSOCK. So which is it? Should select_s() be able to work with DOS FDs? It would simplify things a bit if it didn't.

Edit: I didn't see the check for SK_FIRST. So stdin/stdout/stderr should be allowed.
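
For my own understanding, the fd handling in select_s() then boils down to something like this (illustrative only; SK_FIRST's real value and the surrounding code live in the library):

#define SK_FIRST 1024          /* assumed value, only for illustration */

enum fd_kind { FD_IS_DOS, FD_IS_SOCKET };

/* Descriptors below SK_FIRST (stdin = 0, stdout = 1, stderr = 2, ...)
 * are plain DOS handles and must not be rejected with ENOTSOCK;
 * everything from SK_FIRST up is a BSD socket.
 */
static enum fd_kind classify_fd (int fd)
{
  return (fd < SK_FIRST ? FD_IS_DOS : FD_IS_SOCKET);
}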

gvanem commented 1 year ago

But maybe there is some reason why you hadn't done that?

Perhaps. But like I wrote earlier, it has been some years since I wrote that.

But I remember that when experimenting and porting various programs to Watt-32 (like ncat, netperf), and others involving PF_PACKET / SOCK_PACKET / SOCK_RAW sockets, I did these tweaks. Besides, there were not many sockets to iterate over. But perhaps you're right if Watt-32 were used for a Web-server with 5000 clients. 😮‍💨

BTW, currently, MAX_SOCKETS == 5000 for djgpp/Win-targets. Perhaps there should be some sysconf(OPEN_MAX) / getrlimit() function for this?
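
For illustration, such a run-time limit could look something like this (hypothetical names, not an existing Watt-32 API):

#ifndef MAX_SOCKETS
#define MAX_SOCKETS 5000
#endif

static int socket_limit = MAX_SOCKETS;

/* Hypothetical accessor in the spirit of sysconf(_SC_OPEN_MAX) */
int watt_get_socket_limit (void)
{
  return (socket_limit);
}

/* Hypothetical setter in the spirit of setrlimit(RLIMIT_NOFILE);
 * only lowering below the compiled-in maximum would make sense.
 */
int watt_set_socket_limit (int limit)
{
  if (limit < 1 || limit > MAX_SOCKETS)
     return (-1);
  socket_limit = limit;
  return (0);
}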

store a pointer from the native socket back to the BSD socket that owns it.

A native socket cannot store more than 1 pointer. The problem is that a native socket can be associated with several BSD-sockets:

https://github.com/gvanem/Watt-32/blob/70d84f2cda23dc970c6d3d51010bebee2f0ca248/src/ip4_in.c#L64-L75

The upper-layer BSO_IP4_RAW (BSD-Socket Operation) handler should tell all these SOCK_RAW sockets about an event.

The flow for letting SOCK_RAW socket(s) peek at the packet is something like:
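
Roughly this, with hypothetical names (not the actual socket.c code):

typedef struct Socket {
        struct Socket *next;
        int            so_type;              /* SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, ... */
        void         (*raw_peek) (struct Socket*, const void*, unsigned);
      } Socket;

extern Socket *sk_list;                      /* head of the BSD socket list */

#define SOCK_RAW 3                           /* assumed value, for illustration */

/* The native IPv4 input layer received a raw packet; the BSO_IP4_RAW
 * handler fans it out to every BSD SOCK_RAW socket, since several of
 * them may share the single native raw socket.
 */
static void bso_ip4_raw_event (const void *ip_pkt, unsigned len)
{
  Socket *s;
  for (s = sk_list; s; s = s->next)
      if (s->so_type == SOCK_RAW)
         (*s->raw_peek) (s, ip_pkt, len);    /* queue a copy, mark it readable */
}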

It's rather messy, I know. Even more so when it comes to select_s() and SOCK_RAW / SOCK_PACKET sockets. But things could improve.

And BTW, it took my brain almost an hour to dig up all these details after all these years.

So stdin/stdout/stderr should be allowed.

Correct. Try it with:

cd tests
make -f djgpp.mak select.exe
py -3 -c "print('1'*100)" | select.exe -s 5000 -w 1 -d

It kind of works if you replace getch() with getchar(). But what if it created 5000 sockets first?

jwt27 commented 1 year ago

Besides, there were not many sockets to iterate over. But perhaps you're right if Watt-32 were used for a Web-server with 5000 clients.

O(1) is still N times faster than O(N). But you're right, it does depend on scale.

A native socket cannot store more than 1 pointer. The problem is that a native socket can be associated with several BSD-sockets.

Somehow I was under the impression that it was the other way around (multiple native raw sockets per BSD socket). I don't know where I got that idea now. So there is one native raw socket that broadcasts to all raw BSD sockets? And for the more common case (TCP/UDP), there is always a 1:1 mapping from native to BSD, right?

Maybe ideally you'd want a hash table for TCP/UDP sockets and a linked list for raw sockets. Now that gets messy.
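
Something like this, just to make the shape concrete (hypothetical types, not actual code):

typedef struct Socket {
        struct Socket *hash_next;    /* bucket chain, for the 1:1 TCP/UDP case */
        struct Socket *raw_next;     /* link in the raw-socket list */
        const void    *native;       /* owning native socket */
        int            so_type;
      } Socket;

typedef struct SocketIndex {
        Socket *buckets[256];        /* O(1) lookup for TCP/UDP callbacks */
        Socket *raw_list;            /* walked only when broadcasting raw packets */
      } SocketIndex;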

gvanem commented 1 year ago

And for the more common case (TCP/UDP), there is always a 1:1 mapping from native to BSD, right?

AFAICR, for TCP yes. For UDP, I'm not so sure.

jwt27 commented 7 months ago

I was wondering: why does the BSD interface call tcp_Retransmitter (TRUE) everywhere? Is it just to poll for ARP lookups?

gvanem commented 7 months ago

Is it just to poll for ARP lookups?

It does a lot of things. First, when s->state == tcp_StateRESOLVE, the socket is in a pseudo-state before the SYN is sent via tcp_Retransmitter(). So if _arp_lookup() succeeds:

https://github.com/gvanem/Watt-32/blob/5363dff8874a8237134f3ea17206e4fdfc4db4e9/src/pctcp.c#L1239-L1245

the opening SYN gets sent.

Second, it also drains s->state == tcp_StateCLOSED sockets to free up memory and the local port.

Finally, it handles window updates, slow start, the Karn count, and retransmissions.
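
Very roughly, one pass over the socket list amounts to this (hypothetical helpers, a big simplification of the real pctcp.c code):

typedef struct tcp_Socket {
        struct tcp_Socket *next;
        int    state;                               /* tcp_StateRESOLVE, tcp_StateCLOSED, ... */
      } tcp_Socket;

enum { tcp_StateRESOLVE, tcp_StateESTAB, tcp_StateCLOSED };

extern tcp_Socket *tcp_list;                        /* all open TCP sockets */
extern int  arp_resolved    (tcp_Socket *s);        /* hypothetical: is the ARP entry ready? */
extern void send_syn        (tcp_Socket *s);        /* hypothetical: send the opening SYN */
extern void unlink_and_free (tcp_Socket *s);        /* hypothetical: free memory + local port */
extern void do_retransmit   (tcp_Socket *s);        /* window update, slow start, Karn count, resend */

static void retransmitter_pass (void)
{
  tcp_Socket *s, *next;

  for (s = tcp_list; s; s = next)
  {
    next = s->next;                     /* the socket may be freed below */

    if (s->state == tcp_StateRESOLVE)   /* pseudo-state: waiting for ARP */
    {
      if (arp_resolved(s))
         send_syn (s);
    }
    else if (s->state == tcp_StateCLOSED)
         unlink_and_free (s);           /* drain closed sockets */
    else do_retransmit (s);             /* everything else: timers and retransmission */
  }
}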

It's rather messy....

jwt27 commented 7 months ago

I see, there's a lot going on there. But normally that is on a timer (tcp_RETRAN_TIME), called from tcp_tick(). Why does the BSD interface need to bypass that timer?

I thought that, if it is only to speed up ARP resolves, it could be more efficient to add a callback from pcarp.c back to pctcp.c.
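
Something in this direction (hypothetical API, names made up, just to illustrate the idea):

typedef unsigned long DWORD;
typedef void (*arp_done_func) (DWORD ip);

/* pcarp.c side (hypothetical): remember whom to notify */
static arp_done_func arp_done_cb;

void _arp_register_callback (arp_done_func fn)      /* hypothetical */
{
  arp_done_cb = fn;
}

/* called where pcarp.c completes a pending lookup (hypothetical hook) */
static void arp_lookup_done (DWORD ip)
{
  if (arp_done_cb)
     (*arp_done_cb) (ip);
}

/* pctcp.c side (hypothetical): find sockets stuck in tcp_StateRESOLVE
 * waiting for `ip` and send their opening SYN right away, instead of
 * waiting for the next tcp_Retransmitter (TRUE) poll from the BSD layer.
 */
static void tcp_on_arp_done (DWORD ip)
{
  (void)ip;   /* walk the socket list and send the SYN here */
}

pctcp.c would then call _arp_register_callback (tcp_on_arp_done) once at start-up.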