Closed jwt27 closed 9 months ago
Maybe the rdtsc code should be disabled for now until a better algorithm is found.
Perhaps.
BTW. How did you generate these nice graphs? Perhaps you generated some CVS-file and used Pyplot to convert it to PNG?
The code above makes a valid CSV file (one column), the chart is then simply made with LibreOffice.
X-axis may not be 100% linear, but it's close enough.
OK, I was mistaken here. Neither the rdtsc nor 8254 timers are enabled by default. I assumed they were since that is the order they are checked in gettimeofday2()
, but hadn't looked that much into it.
So, what the first chart really shows is the output from djgpp's gettimeofday()
. Second chart I believe is just the BIOS clock (18.2 Hz from 8254). But I'm not sure why the first appears to have higher resolution.
Or, maybe the first chart is 8254? I see init_timers()
calls hires_timer(TRUE)
. This big maze of #ifdef
s is hard to follow.
Yes, first graph shows the 8254 timer (microsec_clock()
in gettod.c
). So the jitter bug is in there somewhere.
This big maze of #ifdefs is hard to follow.
It is indeed. Ok, so maybe a radical new timers-rewrite. I'm a big fan of Mongoose which I've used elsewhere. Ideas for new API:
timer_set (DWORD ms, int *flag)
: add to a global w32_timers
linked-list and returns timer_id
.timer_chk (int timer_id)
return TRUE/FALSE.timer_free (int timer_id)
remove from w32_timers
list.timer_poll (NULL)
that checks all w32_timers
(called from tcp_tick()
); a rewrite of mg_timer_poll()
that sets a sock->X_expired
flag or calls a timeout-function?This needs a fast mg_millis()
function; GetTickCount64()
on Windows should be good enough. On MSDOS, we should
avoid calling the BIOS-function. Perhaps simply clockbits()
?
I like the idea of timer_poll()
with callbacks, that is perfect for retransmission or daemons. Make it a sorted list or priority-queue, that way it only has to check the first item. But the current method with set_timeout()
/chk_timeout()
also has its uses. Need to carefully consider how timers are most often used now in the code.
For the clock source, I had this idea last night. Define a single function pointer like so:
typedef /*something*/ sock_timestamp;
typedef sock_timestamp (*sock_timefunc)();
extern sock_timefunc sock_time;
The first time it's called, it initializes itself. The #ifdef
maze is contained in one place only:
sock_timefunc sock_time = clock_init;
static sock_timestamp clock_init()
{
init_misc();
TRACE_FILE ("Selecting clock source: ");
#ifdef WIN32
TRACE_FILE ("GetTickCount64\n");
sock_time = clock_GetTickCount64;
#else
if (has_rdtsc && use_rdtsc)
{
TRACE_FILE ("rdtsc\n");
sock_time = clock_rdtsc;
}
else if (use_8254)
{
TRACE_FILE ("8254\n");
sock_time = clock_8254;
}
else
{
TRACE_FILE ("BIOS\n");
sock_time = clock_bios;
}
#endif
return (*sock_time)();
}
And users can provide their own implementation if they need to reprogram IRQ 0:
sock_timestamp base;
static sock_timestamp my_clock()
{
return base + /* ... */;
}
void setup()
{
base = (*sock_time)();
sock_time = my_clock;
}
For backwards compatibility, userTimerTick()
would be implemented the same way as above.
The #ifdef maze is contained in one place only:
Yes, that approach looks cleaner.
BTW, for sock_timestamp
in the example above, I thought NTP format (32:32 fixed-point) might be a good option. The main advantage is that it's easy to convert into both milliseconds and timeval
, without the slow 64-bit division/remainder:
typedef uint64 sock_timestamp;
struct timeval ntp_to_timeval (sock_timestamp t)
{
struct timeval tv;
tv.tv_sec = t >> 32;
tv.tv_usec = ((t & 0xffffffff) * 1000000UL) >> 32;
return (tv);
}
DWORD ntp_to_ms (sock_timestamp t)
{
return ((t * 1000) >> 32);
}
But you do need 64-bit ints for add/multiply, otherwise it gets very inconvenient. Which compilers don't support those? Isn't it just long long
everywhere?
Isn't it just long long everywhere?
Not really. All that does not have a #define HAVE_UINT64
are AFAICS:
wcc386
do have __int64
.
__BORLANDC__
on MSDOS. I.e. not clang based (but untested since 2001?)._MSC_VER <900
which is history. I.e. Quick-C for MSDOS.__TURBOC__
, also a museum piece. Time to drop it.Hm, for OpenWatcom, the manual does say long long
is available for both 16 and 32-bit (or at least, it doesn't say it's not available):
https://open-watcom.github.io/open-watcom-v2-wikidocs/clr.html#Integer_Types
Okay. I may be mistaken on Watcom C/16.
I have installed DOSBox-X and the output of the 16-bit Watcom tcpinfo.exe
shows:
Tried it:
$ cat test-ll.c
typedef unsigned long long uint64;
uint64 add (uint64 a, uint64 b)
{
return a + b;
}
uint64 mul (uint64 a, uint64 b)
{
return a * b;
}
uint64 shr (uint64 a, unsigned n)
{
return a >> n;
}
$ wcc -0 -oxt test-ll.c
Open Watcom C x86 16-bit Optimizing Compiler
Version 2.0 beta Dec 19 2023 01:51:32 (64-bit)
Copyright (c) 2002-2023 The Open Watcom Contributors. All Rights Reserved.
Portions Copyright (c) 1984-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See https://github.com/open-watcom/open-watcom-v2#readme for details.
/home/jw/test-ll.c: 17 lines, included 53, 0 warnings, 0 errors
Code size: 51
$ wdis test-ll.o
Module: /home/jw/test-ll.c
GROUP: 'DGROUP' CONST,CONST2,_DATA
Segment: _TEXT PARA USE16 00000033 bytes
0000 add_:
0000 55 push bp
0001 89 E5 mov bp,sp
0003 03 56 04 add dx,0x4[bp]
0006 13 4E 06 adc cx,0x6[bp]
0009 13 5E 08 adc bx,0x8[bp]
000C 13 46 0A adc ax,0xa[bp]
000F 5D pop bp
0010 C2 08 00 ret 0x0008
0013 FC cld
Routine Size: 20 bytes, Routine Base: _TEXT + 0000
0014 mul_:
0014 56 push si
0015 55 push bp
0016 89 E5 mov bp,sp
0018 8D 76 06 lea si,0x6[bp]
001B E8 00 00 call __U8M
001E 5D pop bp
001F 5E pop si
0020 C2 08 00 ret 0x0008
0023 FC cld
Routine Size: 16 bytes, Routine Base: _TEXT + 0014
0024 shr_:
0024 56 push si
0025 55 push bp
0026 89 E5 mov bp,sp
0028 8B 76 06 mov si,0x6[bp]
002B E8 00 00 call __U8RS
002E 5D pop bp
002F 5E pop si
0030 C2 02 00 ret 0x0002
Routine Size: 15 bytes, Routine Base: _TEXT + 0024
Looks like it works. Multiply and shift use library functions (like gcc does with 64-bit division). So, probably slow. I think all that's needed is an "expanding" multiply (32×32 -> 64). Easy to do in asm, not so much in C.
Trying to think of other methods. Maybe use a different type on 16-bit (eg. 20:12 fixed-point) - that gets messy. More or less defeats the point of "cleaning up" the code.
I'm comitting this and we'll what AppVeyor says:
--- a/src/wattcp.h
+++ b/src/wattcp.h
@@ -87,6 +87,11 @@
#define HAVE_UINT64
#endif
+/**< Ass-u-me for now that all compilers have 64-bit ints
+ */
+#undef HAVE_UINT64
+#define HAVE_UINT64
+
struct ulong_long {
DWORD lo;
DWORD hi;```
That's one way to find out. Could do it on a different branch though :)
Oh, I think it might still fail because uint64
/int64
are not yet defined.
diff --git a/src/wattcp.h b/src/wattcp.h
index 0caaca8..d2eb141 100644
--- a/src/wattcp.h
+++ b/src/wattcp.h
@@ -85,13 +85,15 @@
typedef unsigned __int64 uint64;
typedef __int64 int64;
#define HAVE_UINT64
+#else
+ /**< Ass-u-me for now that all compilers have 64-bit ints
+ */
+ typedef unsigned long long uint64;
+ typedef long long int64;
+ #undef HAVE_UINT64
+ #define HAVE_UINT64
#endif
-/**< Ass-u-me for now that all compilers have 64-bit ints
- */
-#undef HAVE_UINT64
-#define HAVE_UINT64
-
struct ulong_long {
DWORD lo;
DWORD hi;
OK. I'll revert that change. Since I cannot test Borland 16-bit, I'm inclined to drop MSDOS/bcc support. Who uses it nowadays?
So that would leave us with int64
/ uint64
for all I think.
It might not be needed if we can come up with some other timestamp format. 64-bit stuff is probably very slow on 8086, the way Watcom implements it.
Another idea (very rough example):
diff --git a/src/pctcp.c b/src/pctcp.c
index 7fe2e92..90c6d33 100644
--- a/src/pctcp.c
+++ b/src/pctcp.c
@@ -1434,6 +1434,8 @@ WORD W32_CALL tcp_tick (sock_type *s)
if (!packet) /* packet points to network layer protocol */
break;
+ time_tick ();
+
switch (eth_type)
{
case IP4_TYPE:
diff --git a/src/timer.c b/src/timer.c
index 91e4615..f5f9eda 100644
--- a/src/timer.c
+++ b/src/timer.c
@@ -70,6 +70,13 @@ unsigned num_cpus = 1;
static DWORD date = 0;
static DWORD date_ms = 0;
+static DWORD time_now = 0;
+
+void time_tick (void)
+{
+ time_now = set_timeout (0);
+}
+
/*
* Things for the user installable timer ISR.
*/
@@ -493,7 +500,6 @@ DWORD millisec_clock (void)
#endif /* !W32_NO_8087 */
#endif /* __MSDOS__ */
-
/**
* Return time for when given timeout (msec) expires.
* Make sure it never returns 0 (it will confuse `chk_timeout()`).
@@ -554,7 +560,7 @@ BOOL W32_CALL chk_timeout (DWORD value)
#if defined(HAVE_UINT64) || defined(WIN32)
if (value == 0UL)
return (0);
- return (set_timeout(0) >= value);
+ return (time_now >= value);
#else
{
diff --git a/src/timer.h b/src/timer.h
index 689bab5..7dfacc1 100644
--- a/src/timer.h
+++ b/src/timer.h
@@ -7,6 +7,8 @@
#include <sys/wtime.h> /* struct timeval */
#endif
+extern void time_tick (void);
+
#define user_tick_active W32_NAMESPACE (user_tick_active)
#define user_tick_base W32_NAMESPACE (user_tick_base)
#define user_tick_msec W32_NAMESPACE (user_tick_msec)
Update the timer only once per loop in tcp_tick()
and then use that in chk_timeout()
.
Timeouts are relatively coarse (1ms) and one loop should take less than 1ms, so there's no need to update the time more often than that.
Sorry I missed this message. So your idea is that this uses less CPU?
A (set_timeout(0) >= value);
is expensive and (time_now >= value);
is not?
Makes sense IMHO.
Yes, that is the idea. Should be a bit faster, depending on which timer is in use.
Your idea with a timer_poll()
function would achieve more or less the same thing, but that involves a more extensive rewrite. So this could be an intermediate step.
The
rdtsc timer code ([edit: 8254 timer (gettod.c
,tsc_microsec()
/adjust_cpu_clock()
)microsec_clock()
ingettod.c
)] is very inaccurate. It regularly jumps back and forth by a whole second, presumably in an attempt to keep up with libctime()
. This large jitter is what "accidentally" caused the daemon timer to be triggered.Given a simple test program:
We get the following data:
Compare to
8254 timer ([BIOS timer (timer.c
,millisec_clock()
/clockbits()
)set_timeout()
intimer.c
)] that is used in 16-bit builds:This has lower resolution, but at least is more monotonic. Maybe the rdtsc code should be disabled for now until a better algorithm is found.