tkchia / gcc-ia16

Fork of Lambertsen & Jenner (& al.)'s IA-16 (Intel 16-bit x86) port of GNU compilers ― added far pointers & more • use https://github.com/tkchia/build-ia16 to build • Ubuntu binaries at https://launchpad.net/%7Etkchia/+archive/ubuntu/build-ia16/ • DJGPP/MS-DOS binaries at https://gitlab.com/tkchia/build-ia16/-/releases • mirror of https://gitlab.com/tkchia/gcc-ia16
GNU General Public License v2.0

Problems with automatic extension of near to far pointer in ia16-elf-gcc #97

Open ghaerr opened 2 years ago

ghaerr commented 2 years ago

Over at ELKS, I'm considering an enhancement that requires the kernel buffers to be moved out of kernel data space, which then requires a __far pointer to access them. While initially experimenting with what the newly generated code might look like and evaluating its potential effects, I've discovered a problem that could make far pointers much harder to adopt in ia16-elf-gcc.

In a nutshell, the problem is that when an object's (array, etc.) near pointer value is NULL (0), its far pointer never is. More specifically, when a near pointer assigned from a location whose contents are 0 is changed to a far pointer assigned from the same location, the code no longer functions properly, and no compiler warning is given.

Case in point: in the following code from elks/fs/buffer.c, a single line was changed to evaluate the effects of using a __far pointer:

void map_buffer(register struct buffer_head *bh)
{
    struct buffer_head __far *bmap; // <--- ADDED __far HERE
    int i;

    /* If buffer is already mapped, just increase the refcount and return */
    if (bh->b_data /*|| bh->b_seg != kernel_ds*/) {
        if (!bh->b_mapcount)
            debug("REMAP: %d\n", bh->b_num);
        goto end_map_buffer;
    }

    i = lastL1map;
    /* search for free L1 buffer or wait until one is available*/
    for (;;) {
        if (++i >= NR_MAPBUFS) i = 0;
        debug("map:   %d try %d\n", bh->b_num, i);

        /* First check for the trivial case, to avoid dereferencing a null pointer */
        if (!(bmap = L1map[i]))      // <--- expression is always non-zero, even though L1map[i] in data segment is 0.
            break;

I fear that without a warning, or a different way to test a far pointer for NULL, adding __far pointers to the kernel will require painstaking manual inspection of every affected source line. My naive assumption is that, with only bmap converted from a near to a far pointer and no other changes, the code should continue to function properly unless a warning is given.

Are there any options that might allow the compiler to operate differently? I realize this goes deep into what the meaning of a "NULL" far pointer is, and haven't yet evaluated the following possible issues:

char __far *p;
char *q;
char *f();

if (p != 0) x();  // assumes both words of p are checked for 0; possibly desire an option to check only the low word for 0, for compatibility
p = f();          // assume this is allowed; hi word of p == SS, but p != 0 even when f returns NULL

Thank you!

tkchia commented 2 years ago

Hello @ghaerr,

Let me see if I can add some sort of warning (possibly enabled by -Wnonnull) for such cases. It will probably be useful for the compiler to check whether a near pointer q might be null when one attempts to cast it to a far pointer, even implicitly.

(I checked that both Open Watcom and Turbo C++ also simply tack the data segment value into a near pointer q to make it a far pointer. I guess the unspoken assumption for both compilers is that q needs to be non-null.)
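The widening behavior described here can be modeled in portable C. This is only a sketch: `farptr_t`, `widen_naive`, `widen_keep_null`, and `far_is_null` are hypothetical names made up for illustration, and the struct stands in for a real `__far` pointer so that the example compiles with any C compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 16:16 far pointer (real code would use `T __far *`). */
typedef struct {
    uint16_t off;   /* offset within the segment */
    uint16_t seg;   /* segment value */
} farptr_t;

/* Naive widening: always attach the data segment, even to a null offset. */
farptr_t widen_naive(uint16_t near_off, uint16_t data_seg)
{
    farptr_t fp;
    assert(data_seg != 0);      /* assumption: the data segment is never 0 */
    fp.off = near_off;
    fp.seg = data_seg;
    return fp;
}

/* Hypothetical null-preserving widening: near NULL becomes far NULL (0:0). */
farptr_t widen_keep_null(uint16_t near_off, uint16_t data_seg)
{
    farptr_t fp;
    fp.off = near_off;
    fp.seg = near_off ? data_seg : 0;
    return fp;
}

/* Full far-pointer null test: both words must be zero. */
int far_is_null(farptr_t fp)
{
    return (fp.off | fp.seg) == 0;
}
```

With the naive widening, a null near pointer turns into `data_seg:0`, which `far_is_null` rejects. The null-preserving variant costs one extra test per conversion, which may be why compilers skip it.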

Thank you!

ghaerr commented 2 years ago

Hello @tkchia,

Let me see if I can add some sort of warning (possibly enabled by -Wnonnull)

Thank you, that would help immensely in a large porting effort.

simply tack the data segment value into a near pointer q to make it a far pointer.

Looking at the code generated below for the if (!(bmap = L1map[i])) line above, our compiler generates something similar, in that it effectively tries to check whether %ss is 0, which will never happen:

        movw    %dx,    %bx
        shlw    $1,     %bx
        movw    $L1map, %si
        movw    %ss:(%bx,%si),  %si
        movw    %ss,    %cx // <--- will never be 0
        movw    %si,    %ax
        orw     %cx,    %ax
        movw    $L1map, -2(%bp)

One has to wonder whether an option that checks only the low word (the offset) of the pointer would be more useful, so that the segment comparison above is not generated at all.

Obviously, in a full "large model" program, testing all 32 bits of a pointer is required. But in "mixed model" (i.e. near data segment but occasional far pointers to that or other data), a different code generation option would possibly help a lot. The only failure case in this proposed new option would be setting a far pointer manually (building the pointer by hand instead of casting from a data segment address) in which it is desired that the low word be 0. However, this would still work as a pointer, and also be == NULL.

Thank you!

tkchia commented 2 years ago

Hello @ghaerr,

Let me see if I can add some sort of warning (possibly enabled by -Wnonnull)

Thank you, that would help immensely in a large porting effort.

Hmm... we have a problem. I do not quite know how to have the IA-16 back-end obtain enough information from the middle-end, about whether a particular expression might be null/zero. I will need to think about this a bit more.

But in "mixed model" (i.e. near data segment but occasional far pointers to that or other data), a different code generation option would possibly help a lot.

This option actually sounds rather dangerous to me. Far pointers with a 0 offset component can easily arise in many ways, depending on how you obtain the occasional far buffer addresses.
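To make the danger concrete, here is a small sketch in portable C. It assumes a buffer obtained as a bare segment value, so its far pointer has offset 0. All names (`farptr_t`, `far_from_segment`, and so on) are hypothetical, and the struct only models a `__far` pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 16:16 far pointer. */
typedef struct {
    uint16_t off;
    uint16_t seg;
} farptr_t;

/* Full null test: both segment and offset must be zero. */
int far_is_null_full(farptr_t fp)
{
    return fp.seg == 0 && fp.off == 0;
}

/* The proposed "mixed model" test: look at the offset word only. */
int far_is_null_offset_only(farptr_t fp)
{
    return fp.off == 0;
}

/* Far pointer to the start of a block identified only by its segment. */
farptr_t far_from_segment(uint16_t seg)
{
    farptr_t fp;
    assert(seg != 0);           /* assumption: segment 0 is never handed out */
    fp.off = 0;
    fp.seg = seg;
    return fp;
}

/* Real-mode linear address: segment * 16 + offset. */
uint32_t far_to_linear(farptr_t fp)
{
    return ((uint32_t)fp.seg << 4) + fp.off;
}
```

A pointer built with `far_from_segment(0x2000)` refers to linear address 0x20000 and is perfectly usable, yet the offset-only test reports it as null.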

Thank you!

ghaerr commented 2 years ago

Hello @tkchia,

I do not quite know how to have the IA-16 back-end obtain enough information from the middle-end, about whether a particular expression might be null/zero. I will need to think about this a bit more.

Well, thank you for considering all of this.

After looking at the code generated from the above example, as well as your comments, and thinking more about the problem of mixing near and far pointers in general, I came up with a modified design that does not create far pointers arbitrarily within the kernel. That is, any far pointer access will always go through a #define or function wrapper, with the dirty work contained within a single source file. I am presently considering this approach, which I like better than just adding far pointers arbitrarily to the kernel. I'm actually of mixed opinion as to whether the kernel compilation should require a compiler with far pointer support, especially since, lately, great care has been taken to track segment and offset information separately. Should the day ever arrive when we want to support a "protected-segment mode" (a.k.a. 80286 selectors), this would be very convenient.

Far pointers with a 0 offset component can easily arise in many ways, depending on how you obtain the occasional far buffer addresses.

Agreed. I am now thinking of a design where any far pointer used is created specifically by one or two functions, and used explicitly through #defined member access functions, to avoid all of the issues we've been discussing in this thread. This method would also allow the system to be configured to run as it does now with near pointers, without other source code changes, as any magic would be strictly contained within a few functions or defines. Another benefit would be restricting the (somewhat ugly) code generated when using far pointers to just a few functions; this could then be optimized or rewritten without far pointers if need be, instead of being generated throughout the kernel.
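A minimal sketch of such a containment design might look like the following. Everything here is hypothetical and is not taken from the ELKS sources: the `FAR_BUFHEADS` switch, the `BH_FAR` qualifier macro, `struct bufhead_ext`, `bh_ext_init`, `bh_ext`, and the accessor macros are illustrative names only. Without `FAR_BUFHEADS` defined, the qualifier expands to nothing and the code builds as plain near-pointer C.

```c
#include <assert.h>

/* Hypothetical build switch: define FAR_BUFHEADS only when the compiler
 * supports __far and the heads live outside the kernel data segment. */
#ifdef FAR_BUFHEADS
#define BH_FAR __far
#else
#define BH_FAR                  /* near-only build: the qualifier vanishes */
#endif

#define NR_EXT_HEADS 8          /* illustrative count, not the real value */

struct bufhead_ext {            /* illustrative fields */
    unsigned char b_count;
    unsigned char b_locked;
};

/* The only far pointer variable in the design; set once at start-up. */
static struct bufhead_ext BH_FAR *ext_base;

void bh_ext_init(struct bufhead_ext BH_FAR *base)
{
    ext_base = base;
}

/* Single access function: every far dereference goes through here. */
struct bufhead_ext BH_FAR *bh_ext(unsigned int num)
{
    assert(ext_base != 0 && num < NR_EXT_HEADS);
    return ext_base + num;
}

/* Member access macros keep call sites free of far pointer syntax. */
#define BH_COUNT(num)   (bh_ext(num)->b_count)
#define BH_LOCKED(num)  (bh_ext(num)->b_locked)
```

Call sites would then write `BH_COUNT(n)++` instead of handling a far pointer directly. Because heads are identified by index, there is no near pointer to widen and no ambiguous NULL to test.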

Thank you!

ghaerr commented 2 years ago

Hello @tkchia,

I managed to get the ELKS kernel 2500 system buffers enhancement running, using far pointers, but containing them within a single access function. Overall, the code generated looks quite good, there is no chance of near <-> far pointer mangling or misinterpretation, and system stability is preserved.

The system uses far pointers to allow up to 2500 kernel buffer heads (26 bytes each) to be placed outside the kernel data segment, which then are used to manage the actual buffers, which are in XMS memory. This allows up to 2.5MiB of system buffers, almost two floppies worth!
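The sizing arithmetic can be checked with a short sketch. It assumes, and this is only an assumption, that all buffer heads sit in one real-mode segment of 64 KiB; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_BYTES 65536UL   /* one real-mode segment spans 64 KiB */

/* Total bytes occupied by `count` buffer heads of `head_size` bytes each. */
uint32_t heads_bytes(uint32_t count, uint32_t head_size)
{
    return count * head_size;
}

/* Largest number of heads that fits in a single segment. */
uint32_t max_heads_per_segment(uint32_t head_size)
{
    assert(head_size != 0);
    return (uint32_t)(SEGMENT_BYTES / head_size);
}

/* 16-bit offset of head `num`; valid only while the array fits in one segment. */
uint16_t head_offset(uint32_t num, uint32_t head_size)
{
    assert(heads_bytes(num + 1, head_size) <= SEGMENT_BYTES);
    return (uint16_t)(num * head_size);
}
```

Under that assumption, 2500 heads of 26 bytes occupy 65,000 bytes, just under the 65,536-byte segment limit, and at most 2520 such heads would fit in a single segment.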

I thought you might appreciate how well the compiler worked in this design, which limits far pointer access to a single function; each call to it is then used to access one or two bytes outside the kernel data segment. When called from far or near procedures, the generated code was small, and the compiler seems to do well with DS cached in other registers, as well as with handling the DX:AX return value from the function returning a far pointer. The overall cost to add this functionality was around 1K bytes of additional code (and up to 65K bytes of main memory for the buffer heads).

Thank you!

tkchia commented 2 years ago

Hello @ghaerr,

This is cool!

Still, I think it would be useful if gcc-ia16 could warn about dubious casts of possibly NULL pointers, so I guess I will keep this issue open for now. (I have been trying to implement such a warning, possibly as a new -W... option, but so far without much success.)

Thank you!