gvanem / Watt-32

Watt-32 TCP/IP library and samples.
https://www.watt-32.net/
18 stars 8 forks source link

Timer inaccuracy #99

Closed jwt27 closed 7 months ago

jwt27 commented 8 months ago

The rdtsc timer code (gettod.c, tsc_microsec()/adjust_cpu_clock()) [edit: 8254 timer (microsec_clock() in gettod.c)] is very inaccurate. It regularly jumps back and forth by a whole second, presumably in an attempt to keep up with libc time(). This large jitter is what "accidentally" caused the daemon timer to be triggered.

Given a simple test program:

#include <stdio.h>
#include <tcp.h>

int main()
{
  const DWORD end = set_timeout(10000);
  DWORD now;

  do
  {
    now = set_timeout(0);
    printf("%lu\n", now);
  }
  while (now < end);

  return 0;
}

We get the following data: image

Compare to 8254 timer (timer.c, millisec_clock()/clockbits()) [BIOS timer (set_timeout() in timer.c)] that is used in 16-bit builds: image

This has lower resolution, but at least is more monotonic. Maybe the rdtsc code should be disabled for now until a better algorithm is found.

gvanem commented 8 months ago

Maybe the rdtsc code should be disabled for now until a better algorithm is found.

Perhaps.

BTW. How did you generate these nice graphs? Perhaps you generated some CVS-file and used Pyplot to convert it to PNG?

jwt27 commented 8 months ago

The code above makes a valid CSV file (one column), the chart is then simply made with LibreOffice.

X-axis may not be 100% linear, but it's close enough.

jwt27 commented 7 months ago

OK, I was mistaken here. Neither the rdtsc nor 8254 timers are enabled by default. I assumed they were since that is the order they are checked in gettimeofday2(), but hadn't looked that much into it.

So, what the first chart really shows is the output from djgpp's gettimeofday(). Second chart I believe is just the BIOS clock (18.2 Hz from 8254). But I'm not sure why the first appears to have higher resolution.

jwt27 commented 7 months ago

Or, maybe the first chart is 8254? I see init_timers() calls hires_timer(TRUE). This big maze of #ifdefs is hard to follow.

jwt27 commented 7 months ago

Yes, first graph shows the 8254 timer (microsec_clock() in gettod.c). So the jitter bug is in there somewhere.

gvanem commented 7 months ago

This big maze of #ifdefs is hard to follow.

It is indeed. Ok, so maybe a radical new timers-rewrite. I'm a big fan of Mongoose which I've used elsewhere. Ideas for new API:

This needs a fast mg_millis() function; GetTickCount64() on Windows should be good enough. On MSDOS, we should avoid calling the BIOS-function. Perhaps simply clockbits()?

jwt27 commented 7 months ago

I like the idea of timer_poll() with callbacks, that is perfect for retransmission or daemons. Make it a sorted list or priority-queue, that way it only has to check the first item. But the current method with set_timeout()/chk_timeout() also has its uses. Need to carefully consider how timers are most often used now in the code.

For the clock source, I had this idea last night. Define a single function pointer like so:

typedef /*something*/ sock_timestamp;
typedef sock_timestamp (*sock_timefunc)();

extern sock_timefunc sock_time;

The first time it's called, it initializes itself. The #ifdef maze is contained in one place only:

sock_timefunc sock_time = clock_init;

static sock_timestamp clock_init()
{
  init_misc();
  TRACE_FILE ("Selecting clock source: ");

#ifdef WIN32
  TRACE_FILE ("GetTickCount64\n");
  sock_time = clock_GetTickCount64;
#else
  if (has_rdtsc && use_rdtsc)
  {
    TRACE_FILE ("rdtsc\n");
    sock_time = clock_rdtsc;
  }
  else if (use_8254)
  {
    TRACE_FILE ("8254\n");
    sock_time = clock_8254;
  }
  else
  {
    TRACE_FILE ("BIOS\n");
    sock_time = clock_bios;
  }
#endif

  return (*sock_time)();
}

And users can provide their own implementation if they need to reprogram IRQ 0:

sock_timestamp base;

static sock_timestamp my_clock()
{
  return base + /* ... */;
}

void setup()
{
  base = (*sock_time)();
  sock_time = my_clock;
}

For backwards compatibility, userTimerTick() would be implemented the same way as above.

gvanem commented 7 months ago

The #ifdef maze is contained in one place only:

Yes, that approach looks cleaner.

jwt27 commented 7 months ago

BTW, for sock_timestamp in the example above, I thought NTP format (32:32 fixed-point) might be a good option. The main advantage is that it's easy to convert into both milliseconds and timeval, without the slow 64-bit division/remainder:

typedef uint64 sock_timestamp;

struct timeval ntp_to_timeval (sock_timestamp t)
{
  struct timeval tv;

  tv.tv_sec = t >> 32;
  tv.tv_usec = ((t & 0xffffffff) * 1000000UL) >> 32;

  return (tv);
}

DWORD ntp_to_ms (sock_timestamp t)
{
  return ((t * 1000) >> 32);
}

But you do need 64-bit ints for add/multiply, otherwise it gets very inconvenient. Which compilers don't support those? Isn't it just long long everywhere?

gvanem commented 7 months ago

Isn't it just long long everywhere?

Not really. All that does not have a #define HAVE_UINT64 are AFAICS:

jwt27 commented 7 months ago

Hm, for OpenWatcom, the manual does say long long is available for both 16 and 32-bit (or at least, it doesn't say it's not available):

https://open-watcom.github.io/open-watcom-v2-wikidocs/clr.html#Integer_Types

gvanem commented 7 months ago

Okay. I may be mistaken on Watcom C/16. I have installed DOSBox-X and the output of the 16-bit Watcom tcpinfo.exe shows: watt-32-watcom-tcpinfo

jwt27 commented 7 months ago

Tried it:

$ cat test-ll.c
typedef unsigned long long uint64;

uint64 add (uint64 a, uint64 b)
{
  return a + b;
}

uint64 mul (uint64 a, uint64 b)
{
  return a * b;
}

uint64 shr (uint64 a, unsigned n)
{
  return a >> n;
}

$ wcc -0 -oxt test-ll.c
Open Watcom C x86 16-bit Optimizing Compiler
Version 2.0 beta Dec 19 2023 01:51:32 (64-bit)
Copyright (c) 2002-2023 The Open Watcom Contributors. All Rights Reserved.
Portions Copyright (c) 1984-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
See https://github.com/open-watcom/open-watcom-v2#readme for details.
/home/jw/test-ll.c: 17 lines, included 53, 0 warnings, 0 errors
Code size: 51

$ wdis test-ll.o
Module: /home/jw/test-ll.c
GROUP: 'DGROUP' CONST,CONST2,_DATA

Segment: _TEXT PARA USE16 00000033 bytes
0000                            add_:
0000  55                                push            bp
0001  89 E5                             mov             bp,sp
0003  03 56 04                          add             dx,0x4[bp]
0006  13 4E 06                          adc             cx,0x6[bp]
0009  13 5E 08                          adc             bx,0x8[bp]
000C  13 46 0A                          adc             ax,0xa[bp]
000F  5D                                pop             bp
0010  C2 08 00                          ret             0x0008
0013  FC                                cld

Routine Size: 20 bytes,    Routine Base: _TEXT + 0000

0014                            mul_:
0014  56                                push            si
0015  55                                push            bp
0016  89 E5                             mov             bp,sp
0018  8D 76 06                          lea             si,0x6[bp]
001B  E8 00 00                          call            __U8M
001E  5D                                pop             bp
001F  5E                                pop             si
0020  C2 08 00                          ret             0x0008
0023  FC                                cld

Routine Size: 16 bytes,    Routine Base: _TEXT + 0014

0024                            shr_:
0024  56                                push            si
0025  55                                push            bp
0026  89 E5                             mov             bp,sp
0028  8B 76 06                          mov             si,0x6[bp]
002B  E8 00 00                          call            __U8RS
002E  5D                                pop             bp
002F  5E                                pop             si
0030  C2 02 00                          ret             0x0002

Routine Size: 15 bytes,    Routine Base: _TEXT + 0024

Looks like it works. Multiply and shift use library functions (like gcc does with 64-bit division). So, probably slow. I think all that's needed is an "expanding" multiply (32×32 -> 64). Easy to do in asm, not so much in C.

Trying to think of other methods. Maybe use a different type on 16-bit (eg. 20:12 fixed-point) - that gets messy. More or less defeats the point of "cleaning up" the code.

gvanem commented 7 months ago

I'm comitting this and we'll what AppVeyor says:


--- a/src/wattcp.h
+++ b/src/wattcp.h
@@ -87,6 +87,11 @@
   #define HAVE_UINT64
 #endif

+/**< Ass-u-me for now that all compilers have 64-bit ints
+ */
+#undef  HAVE_UINT64
+#define HAVE_UINT64
+
 struct ulong_long {
        DWORD lo;
        DWORD hi;```
gvanem commented 7 months ago

In the pipe-line now.

jwt27 commented 7 months ago

That's one way to find out. Could do it on a different branch though :)

jwt27 commented 7 months ago

Oh, I think it might still fail because uint64/int64 are not yet defined.

diff --git a/src/wattcp.h b/src/wattcp.h
index 0caaca8..d2eb141 100644
--- a/src/wattcp.h
+++ b/src/wattcp.h
@@ -85,13 +85,15 @@
   typedef unsigned __int64 uint64;
   typedef __int64          int64;
   #define HAVE_UINT64
+#else
+  /**< Ass-u-me for now that all compilers have 64-bit ints
+   */
+  typedef unsigned long long  uint64;
+  typedef long long           int64;
+  #undef  HAVE_UINT64
+  #define HAVE_UINT64
 #endif

-/**< Ass-u-me for now that all compilers have 64-bit ints
- */
-#undef  HAVE_UINT64
-#define HAVE_UINT64
-
 struct ulong_long {
        DWORD lo;
        DWORD hi;
gvanem commented 7 months ago

OK. I'll revert that change. Since I cannot test Borland 16-bit, I'm inclined to drop MSDOS/bcc support. Who uses it nowadays?

So that would leave us with int64 / uint64 for all I think.

jwt27 commented 7 months ago

It might not be needed if we can come up with some other timestamp format. 64-bit stuff is probably very slow on 8086, the way Watcom implements it.

jwt27 commented 7 months ago

Another idea (very rough example):

diff --git a/src/pctcp.c b/src/pctcp.c
index 7fe2e92..90c6d33 100644
--- a/src/pctcp.c
+++ b/src/pctcp.c
@@ -1434,6 +1434,8 @@ WORD W32_CALL tcp_tick (sock_type *s)
     if (!packet)  /* packet points to network layer protocol */
        break;

+    time_tick ();
+
     switch (eth_type)
     {
       case IP4_TYPE:
diff --git a/src/timer.c b/src/timer.c
index 91e4615..f5f9eda 100644
--- a/src/timer.c
+++ b/src/timer.c
@@ -70,6 +70,13 @@ unsigned num_cpus         = 1;
 static DWORD date    = 0;
 static DWORD date_ms = 0;

+static DWORD time_now = 0;
+
+void time_tick (void)
+{
+  time_now = set_timeout (0);
+}
+
 /*
  * Things for the user installable timer ISR.
  */
@@ -493,7 +500,6 @@ DWORD millisec_clock (void)
 #endif  /* !W32_NO_8087 */
 #endif  /* __MSDOS__ */

-
 /**
  * Return time for when given timeout (msec) expires.
  * Make sure it never returns 0 (it will confuse `chk_timeout()`).
@@ -554,7 +560,7 @@ BOOL W32_CALL chk_timeout (DWORD value)
 #if defined(HAVE_UINT64) || defined(WIN32)
   if (value == 0UL)
      return (0);
-  return (set_timeout(0) >= value);
+  return (time_now >= value);

 #else
   {
diff --git a/src/timer.h b/src/timer.h
index 689bab5..7dfacc1 100644
--- a/src/timer.h
+++ b/src/timer.h
@@ -7,6 +7,8 @@
 #include <sys/wtime.h>      /* struct timeval */
 #endif

+extern void time_tick (void);
+
 #define user_tick_active  W32_NAMESPACE (user_tick_active)
 #define user_tick_base    W32_NAMESPACE (user_tick_base)
 #define user_tick_msec    W32_NAMESPACE (user_tick_msec)

Update the timer only once per loop in tcp_tick() and then use that in chk_timeout(). Timeouts are relatively coarse (1ms) and one loop should take less than 1ms, so there's no need to update the time more often than that.

gvanem commented 7 months ago

Sorry I missed this message. So your idea is that this uses less CPU?

A (set_timeout(0) >= value); is expensive and (time_now >= value);is not? Makes sense IMHO.

jwt27 commented 7 months ago

Yes, that is the idea. Should be a bit faster, depending on which timer is in use. Your idea with a timer_poll() function would achieve more or less the same thing, but that involves a more extensive rewrite. So this could be an intermediate step.