jart / cosmopolitan

build-once run-anywhere c library
ISC License
17.7k stars 603 forks

Bare metal enhancements discussion #638

Open ghaerr opened 1 year ago

ghaerr commented 1 year ago

I'm opening this issue as a place to discuss possible future bare metal enhancements in one place, rather than throughout the various PRs recently opened and merged, to make it easier to track.

Thanks to @tkchia, there's been very nice progress in getting bare metal .com programs running on real hardware, as well as in QEMU, and a better understanding of exactly how the startup and long mode physical and virtual memory management works. I plan on testing the existing PrintMemoryIntervals function on bare metal, as well as converting its test function into a utility allowing for display of the BIOS 0xe820 mapped address space across users' differing systems.

With @tkchia's new VESA VBE startup mode detection WIP, a number of pretty cool scenarios open up for both text and graphics VGA console operation using other than 8x16 glyphs on an 80x25 terminal. That brings up the issue of what font bits might be used for display.

In the short term, it will be easy to include a 256-glyph CP437 bitmapped font for initial graphics console drawing, and for any extended VESA text modes, I presume they would also include their own internal (cp437?) fonts for such operation (is that true)?

In the longer term, I can imagine the desire to display unicode glyphs for full graphics console compatibility with the rest of Cosmopolitan platforms. I have some ideas about how to use FreeType, Truetype-STB or AGG (AntiGrain Geometry) to draw truetype glyphs from a compiled-in memory buffer. But why not read them from disk? Since there's no filesystem on bare metal, it could be very cool to use the Cosmopolitan zipos filesystem embedded in APE/.com files to contain the desired unicode, truetype (or bitmap) font for display by that program. A user could zip their desired font into the .com file without recompilation, for display on their monitor.

Does the APE boot loader require the .com file to be in contiguous sectors? If so, it seems we might be able to redirect a special file descriptor, used within the bare metal program to refer to itself, to read and mmap the font file for use by the graphics console. Another idea would be to embed the font file in a special ELF section and have it automatically loaded and mapped at boot, through the __map_phdrs function. It seems we might need some additional capability of reading the .com file from disk using either of these techniques.

Regarding the graphics drawing functions required by a graphics VGA console, it appears that we can make everything work using just fillrect (for erasing backgrounds prior to drawing the foreground bits of a glyph, as well as all clear screens and clear to end of lines, etc), memmove (for scrolling), and textout (for drawing the foreground bits of a bitmapped glyph, and a more complex version that uses a framebuffer-specific blit to blend a glyph's foreground bits into the background for truetype-based glyphs).

The graphics functions should probably use an RGB()-style macro for color specification, getting those values from a specified "master palette", which can be easily changed for varying color schemes. The functions can then internally map the RGB value to the particular in-use framebuffer pixel format.
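One way to sketch that idea, with a hypothetical RGB() macro and a 5/6/5 mapper as the example target format (names invented here for illustration, not Cosmopolitan's actual API):

```c
#include <stdint.h>

/* Hypothetical RGB()-style macro: callers name colors as 0xRRGGBB and each
 * graphics function maps that to the in-use framebuffer pixel format. */
#define RGB(r, g, b) ((uint32_t)(r) << 16 | (uint32_t)(g) << 8 | (uint32_t)(b))

/* Example mapper for a 5/6/5 framebuffer; other formats get their own. */
static uint16_t map_rgb_565(uint32_t rgb)
{
    uint8_t r = (uint8_t)(rgb >> 16), g = (uint8_t)(rgb >> 8), b = (uint8_t)rgb;
    return (uint16_t)((r >> 3) << 11 | (g >> 2) << 5 | (b >> 3));
}
```

A "master palette" would then just be a table of RGB() values, remappable without touching the drawing code.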

When the framebuffer stride == width, it will be possible to use memmove to scroll the entire display area in one call. Otherwise, memmove will need to be called height times to skip over the unused stride bytes.
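That two-case scrolling logic can be sketched as follows (hypothetical names, a byte-addressed buffer assumed for simplicity; not Cosmopolitan's actual console code):

```c
#include <string.h>

/* Scroll a framebuffer up by `rows` pixel rows.  `stride` is the offset
 * in bytes between the starts of adjacent rows; `width` is the visible
 * row width in bytes.  Hypothetical sketch. */
static void scroll_up(unsigned char *fb, int width, int height, int stride, int rows)
{
    if (stride == width) {
        /* contiguous rows: one big move, then clear the exposed area */
        memmove(fb, fb + rows * stride, (size_t)(height - rows) * stride);
        memset(fb + (height - rows) * stride, 0, (size_t)rows * stride);
    } else {
        /* padded rows: move each visible row separately, skipping the
         * unused stride bytes */
        int y;
        for (y = 0; y < height - rows; y++)
            memmove(fb + y * stride, fb + (y + rows) * stride, (size_t)width);
        for (; y < height; y++)
            memset(fb + y * stride, 0, (size_t)width);
    }
}
```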

Since anyone's VESA BIOS may return vastly different framebuffer sizes and pixel formats, we may consider including varying sizes of bitmapped fonts, even in the short term. The various pixel formats can be fairly easily handled internally within each of the above graphics functions, and a selection of matching bitmap fonts could be stored in the .com zipos filesystem, prior to the implementation of truetype display. There are a number of bitmap font file formats and tools for managing them that could fairly easily be contributed.

While this all might sound complicated and somewhat overkill, I thought to put it all out there for discussion.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

Well, I am currently hoping to get https://github.com/jart/cosmopolitan/pull/636 or https://github.com/jart/cosmopolitan/pull/640 pulled into the project mainline (even if in a modified form). These patches should hopefully make any further exploration a bit easier. Basically:

for any extended VESA text modes, I presume they would also include their own internal (cp437?) fonts for such operation (is that true)?

I would expect so. Apparently though, my test PC does not report having any extended text modes, so I am not sure.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

In the longer term, I can imagine the desire to display unicode glyphs for full graphics console compatibility with the rest of Cosmopolitan platforms.

That will be cool of course — though I would be a bit wary of adding too much high-level functionality to the VGA output code. This will need to be done carefully.

The libc/vga/ layer is currently considered to be a low-level layer within Cosmopolitan. In particular, the file descriptor machinery currently depends on it — which means the VGA code should not turn around and itself depend on file descriptor functionality to work. (And I think it is good to keep the VGA stuff low-level.)

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

Pursuant to our discussion in https://github.com/jart/cosmopolitan/pull/637#issuecomment-1258555039, I've attached a single-file console emulator fbe.c built on top of the SDL library that emulates the IBM PC adapter RAM text display, as well as drawing the ROM characters and cursor from a compiled-in CP437 8x16 font configurable to use any of the VESA VBE framebuffer pixel formats. This should ease development and testing of the graphical VGA console across all the supported framebuffer formats. (The font data is in a separate file rom8x16.c.)

The included Makefile should allow compilation on Linux, though I've only tested on macOS. The SDL2 library and headers need to be installed. The emulator currently just echoes characters typed in and scrolls on LF, drawing them in an allocated framebuffer of the configured format, then passes that to SDL for display.

Note: this code is presented as working code, but not optimized as previously discussed. Because drawbitmap draws both the foreground and background bits, the entire graphical VGA console can be effectively implemented with just drawbitmap, without having to use fillrect. Later, a very fast "conversion-blit" routine will replace drawbitmap for very fast output.

gfx.zip

A quick technical summary:

Pixel formats supported for text drawing:

/* supported framebuffer pixel formats */
#define MWPF_TRUECOLORARGB 0    /* 32bpp, memory byte order B, G, R, A */
#define MWPF_TRUECOLORABGR 1    /* 32bpp, memory byte order R, G, B, A */
#define MWPF_TRUECOLORBGR  2    /* 24bpp, memory byte order R, G, B */
#define MWPF_TRUECOLOR565  3    /* 16bpp, le unsigned short 5/6/5 RGB */
#define MWPF_TRUECOLOR555  4    /* 16bpp, le unsigned short 5/5/5 RGB */

Note: I prefer the above naming scheme, as pixel conversions get complicated quickly, since the CPU-register format differs from the memory byte-order format. In the source, I've used an MW prefix that happens to match well with Microwindows, which will ease adding more routines when we need them.

In the above naming convention, "TRUECOLORABGR" is the pixel format in a 32-bit CPU register, that is, A in bits 24-31, and R in 0-7. (A=Alpha is always 255 for now, until src-over blending is required for truetype output). The memory byte order will be the reverse of the CPU format on little endian machines, of course. Thus, the pixel format name is always the CPU register format, which also happens to closely match the SDL naming convention.
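As a concrete illustration of the convention (an illustrative helper, not code from the emulator), an ARGB value packed in CPU-register order lands in memory as B, G, R, A on a little-endian machine:

```c
#include <stdint.h>
#include <string.h>

/* Pack an ARGB pixel in CPU-register order: A in bits 24-31, R in 16-23,
 * G in 8-15, B in 0-7.  On little-endian hardware the memory bytes then
 * read B, G, R, A, matching MWPF_TRUECOLORARGB above. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return (uint32_t)a << 24 | (uint32_t)r << 16 | (uint32_t)g << 8 | b;
}
```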

The "console" structure allows for automatic configuration of text mode or graphical mode with the specification of a few fields, like cols and lines, with all else automatically calculated according to the physical framebuffer pixel format.

Here's the console struct:

struct console {
    /* configurable parameters */
    int cols;               /* # text columns */
    int lines;              /* # text rows */
    int pixtype;            /* console pixel format */
    int bpp;                /* console bits per pixel */
    int char_width;         /* glyph width in pixels */
    int char_height;        /* glyph height in pixels */
    MWIMAGEBITS *fontbits;  /* glyph bits (currently for 256 glyphs) */

    unsigned short *text_ram;/* emulated adaptor RAM (= cols * lines * 2) */
    int curx;               /* cursor x position */
    int cury;               /* cursor y position */
    unsigned char *screen;  /* console framebuffer */
    int width;              /* console width in pixels (= cols * char_width) */
    int height;             /* console height in pixels (= lines * char_height) */
    int pitch;              /* console stride in bytes, offset to next pixel row */
};

The intention is to show which portions of the console would be configurable from a VESA selection, versus the automatically calculated values.

Finally, the single drawbitmap routine should be able to be used in your VGA console code to draw characters, with a known pixel format. This routine uses a compiled-in bitmap font, but that also could be configurable should it be decided that different font sizes might want to be used on various VESA selections.

/* draw a character bitmap */
static void drawbitmap(struct console *con, int c, unsigned char attr,
    int x, int y, int drawbg)
{
    MWIMAGEBITS *imagebits = con->fontbits + con->char_height * c;
    int minx = x;
    int maxx = x + con->char_width - 1;
    int bitcount = 0;
    unsigned short bitvalue = 0;
    int height = con->char_height;
    unsigned short usval;

    /* convert EGA attribute to RGB */
    int fg = attr & 0x0F;
    int bg = (attr & 0x70) >> 4;
    unsigned char fg_red = ega_colormap[fg].r;
    unsigned char fg_green = ega_colormap[fg].g;
    unsigned char fg_blue = ega_colormap[fg].b;
    unsigned char bg_red = ega_colormap[bg].r;
    unsigned char bg_green = ega_colormap[bg].g;
    unsigned char bg_blue = ega_colormap[bg].b;

    unsigned char *pixels = 0;  /* hoisted out of the loop so it advances across iterations */

    while (height > 0) {
        if (bitcount <= 0) {
            bitcount = 16;
            bitvalue = *imagebits++;
            pixels = con->screen + y * con->pitch + x * (con->bpp >> 3);
        }
        switch (con->pixtype) {
        case MWPF_TRUECOLORARGB:    /* byte order B G R A */
            if (MWIMAGE_TESTBIT(bitvalue)) {
                *pixels++ = fg_blue;
                *pixels++ = fg_green;
                *pixels++ = fg_red;
                *pixels++ = 0xff;
            } else if (drawbg) {
                *pixels++ = bg_blue;
                *pixels++ = bg_green;
                *pixels++ = bg_red;
                *pixels++ = 0xff;
            } else {
                pixels += 4;        /* transparent background: skip pixel */
            }
            break;
        case MWPF_TRUECOLORABGR:    /* byte order R G B A */
            if (MWIMAGE_TESTBIT(bitvalue)) {
                *pixels++ = fg_red;
                *pixels++ = fg_green;
                *pixels++ = fg_blue;
                *pixels++ = 0xff;
            } else if (drawbg) {
                *pixels++ = bg_red;
                *pixels++ = bg_green;
                *pixels++ = bg_blue;
                *pixels++ = 0xff;
            } else {
                pixels += 4;        /* transparent background: skip pixel */
            }
            break;
        case MWPF_TRUECOLORBGR:     /* byte order R G B */
            if (MWIMAGE_TESTBIT(bitvalue)) {
                *pixels++ = fg_red;
                *pixels++ = fg_green;
                *pixels++ = fg_blue;
            } else if (drawbg) {
                *pixels++ = bg_red;
                *pixels++ = bg_green;
                *pixels++ = bg_blue;
            } else {
                pixels += 3;        /* transparent background: skip pixel */
            }
            break;
        case MWPF_TRUECOLOR565:
            if (MWIMAGE_TESTBIT(bitvalue)) {
                usval = RGB2PIXEL565(fg_red, fg_green, fg_blue);
                *pixels++ = usval & 255;
                *pixels++ = usval >> 8;
            } else if (drawbg) {
                usval = RGB2PIXEL565(bg_red, bg_green, bg_blue);
                *pixels++ = usval & 255;
                *pixels++ = usval >> 8;
            } else {
                pixels += 2;        /* transparent background: skip pixel */
            }
            break;
        case MWPF_TRUECOLOR555:
            if (MWIMAGE_TESTBIT(bitvalue)) {
                usval = RGB2PIXEL555(fg_red, fg_green, fg_blue);
                *pixels++ = usval & 255;
                *pixels++ = usval >> 8;
            } else if (drawbg) {
                usval = RGB2PIXEL555(bg_red, bg_green, bg_blue);
                *pixels++ = usval & 255;
                *pixels++ = usval >> 8;
            } else {
                pixels += 2;        /* transparent background: skip pixel */
            }
            break;
        }
        bitvalue = MWIMAGE_SHIFTBIT(bitvalue);
        bitcount--;
        if (x++ == maxx) {
            x = minx;
            ++y;
            --height;
            bitcount = 0;
        }
    }
}

The scrollup routine can be used to scroll the framebuffer when needed.

Please feel free to use any of this code to get the VGA graphical console running. I plan on posting the emulator and/or pieces thereof, depending on what @jart wants to do.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

I can imagine the desire to display unicode glyphs

I would be a bit wary of adding too much high-level functionality to the VGA output code. (And I think it is good to keep the VGA stuff low-level.)

After having written the text/graphics console emulator, I realized a bit more about how this could work, and I agree.

What I started to see is that any "console", including a fully graphical version, really needs to keep a separate "charcode/attribute" memory region in addition to the framebuffer contents. That is, a "VGA console" on IBM PC hardware would keep a 2-byte charcode/attribute pair in the actual hardware text display buffer, but a "unicode console" could not; it should instead keep a separately-allocated charcode/attribute buffer of a different size: for instance, 2 or 4 bytes for the unicode charcode and 3-6 bytes for the RGB foreground and background colors. Instead of trying to use both a PC hardware char/attr RAM area and a unicode map, only a single map should be allocated, and used exclusively within the console code as the char/attr map. This also has the advantage of simplifying scrolling, clearing, etc., as only a single char/attr map would be consulted to produce the entire screen of glyphs.

Thus, rather than trying to hang all sorts of "extras" onto a VGA console, a better idea might be to implement a Unicode console by allocating the required "adaptor RAM" separately, and not use the PC hardware buffer at all. The char/attr buffer would be allocated depending on the number of unicode glyphs and color support desired. In a sense, it would be like a C++ template being used as a console declaration, so to speak.
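For illustration, the per-cell allocation being described might look like this hypothetical layout (names invented here, not proposed code): 4 bytes of charcode plus 6 bytes of RGB gives roughly 10 bytes per cell, versus 2 bytes in PC text hardware.

```c
#include <stdint.h>

/* Hypothetical Unicode console cell: the separately-allocated char/attr
 * map would be cols * lines of these, replacing the PC's 2-byte
 * charcode/attribute pairs in hardware text RAM. */
struct ucell {
    uint32_t charcode;          /* Unicode code point */
    uint8_t fg_r, fg_g, fg_b;   /* foreground RGB */
    uint8_t bg_r, bg_g, bg_b;   /* background RGB */
};
```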

In any case, we can worry about all that in the future. I agree that we can likely include a compiled-in font or two for bitmap display in a graphical VGA console, and leave Unicode and/or truetype display for a future Unicode console, coded separately.

Thank you!

paulwratt commented 1 year ago

if you look at how Linux manages the framebuffer, and especially the TTY, there are about 4 separate device page layouts.

EDIT: to put the above in context, that's the Linux console on an RPi with BGR565 (default) (and my default monitor layout is the native/preferred 1360x768 32-bit 60Hz HDMI mode)

tkchia commented 1 year ago

Hello @ghaerr,

Thanks for the code! I will consult your code for ideas on what to do. I suspect, though, that I (we?) will not be able to use it directly in this project. (Me, I will need to develop the frame buffer output logic very slowly and incrementally — at least, until it becomes easier to debug bare metal crashes. And of course, @jart might have other ideas on what to do.)

Thus, the pixel format name is always the CPU register format, which also happens to closely match the SDL naming convention.

Interesting. It seems the UEFI standard uses the exact opposite naming order (e.g. PixelRedGreenBlueReserved8BitPerColor).

            if (MWIMAGE_TESTBIT(bitvalue)) {
                *pixels++ = fg_red;
                *pixels++ = fg_green;
                *pixels++ = fg_blue;
                *pixels++ = 0xff;
            } else if (drawbg) {
                *pixels++ = bg_red;
                *pixels++ = bg_green;
                *pixels++ = bg_blue;
                *pixels++ = 0xff;
            }

I happen to be thinking of plotting an RGBA pixel by writing an entire longword at once — i.e. through a pointer to a uint32_t. In your experience, do you think this might be a good (or perhaps bad) idea?

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

I will need to develop the frame buffer output logic very slowly and incrementally — at least, until it becomes easier to debug bare metal crashes.

Looking at your other PRs recently posted, I can see you're fighting a number of low-level problems not directly related to drawing into a framebuffer. Also, agreed there's potentially much (re)organization with the VGA console data structures.

I suspect, though, that I (we?) will not be able to use it directly in this project.

The idea behind writing a framebuffer emulator was to ensure that drawbitmap, the core C function similar to what you'll need to draw a text bitmap on a framebuffer, actually works with each of the VESA VBE modes, and could be used as a quick way to get graphical character output using a compiled-in font.

Once a framebuffer address is established and accessible from long mode, of course, all one needs to do is write to it, and the displayed contents change. The drawbitmap function could be used outside of a VGA console, with slightly modified parameters, to merely display an "A" glyph from the compiled-in font table in the top left screen corner, and then test cycle through the various VESA modes, for instance. Using a known-working function will help when the screen output looks like garbage, but it won't help getting a long-mode framebuffer working in the first place.

I would be happy to jump in and help, but agree this will likely have to be built from the bottom up for a bit, which you're doing a great job with :)

It seems the UEFI standard uses the exact opposite naming order (e.g. PixelRedGreenBlueReserved8BitPerColor).

There are many naming conventions, for sure. I have found that when doing pixel arithmetic, it made more sense to think of the register content order than the memory order, especially since the memory order changes between big and little endian, causing even more confusion. For x86 architectures only, perhaps not so much. It seems UEFI chose to name following the little endian memory byte order?

I happen to be thinking of plotting an RGBA pixel by writing an entire longword at once — i.e. through a pointer to a uint32_t. In your experience, do you think this might be a good (or perhaps bad) idea?

The posted drawbitmap function is brute force, and definitely not optimized. I don't like switching on pixel format in the inner loop, nor drawing a byte at a time. It was written to show with certainty what is happening when translating a 1bpp glyph bitmap to 16, 24 and 32bpp formats, which IMO at this stage is more important than optimizing. (And 24bpp can't be written using *(uint32_t *).) Feel free to use what you can, or start from scratch.

In the next step, each of the inner-loop framebuffer writes could/should be replaced with faster code. Since it is so easy to get the red/blue order mixed up before proving correctness, I've found it better to start with brute-force code. (BTW, most of these routines will get replaced with "conversion" blits that better handle 1bpp->RGBA, 8bpp->RGBA (alpha blend), or RGBA->RGBA, for speed or better-looking glyphs.)
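Such a faster inner loop for 32bpp modes might look like the following sketch: one uint32_t store per pixel, with fg/bg already packed for the framebuffer format outside the loop (names invented for illustration; this is not the code from gfx.zip):

```c
#include <stdint.h>

/* Draw one 8-pixel-wide glyph row into a 32bpp framebuffer, one whole
 * uint32_t store per pixel.  `bits` is the glyph row, MSB = leftmost
 * pixel; fg/bg are pixels pre-packed in the framebuffer's format. */
static void blit_row_32(uint32_t *dst, uint8_t bits, uint32_t fg, uint32_t bg)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (bits & (0x80 >> i)) ? fg : bg;
}
```

Because the colors are pre-packed, the switch on pixel format moves out of the per-pixel loop entirely; as noted, 24bpp would still need byte stores.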

On another future point - it's not clear we want to always draw directly to the HW framebuffer. That will work for now, but there are a number of reasons why drawing into an offscreen buffer, then quickly transferring that to the actual framebuffer, makes sense. In some of those cases, it even makes sense to always keep the offscreen buffer in RGBA pixel format, and use a fast conversion blit only on final output, perhaps based on a system timer.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

The drawbitmap function could be used outside of a VGA console, with slightly modified parameters, to merely display an "A" glyph from the compiled-in font table in the top left screen corner, and then test cycle through the various VESA modes, for instance.

Well, drawing an "A" glyph is a great way to test that graphical video output is working. :slightly_smiling_face:

As for switching video modes, unfortunately I do not think VESA offers a (standard) way to do that from within protected mode. So I would think changing video modes is best done from within the real mode initialization code (perhaps with occasional sojourns to 16-/32-bit protected mode to draw into the frame buffer).

Thank you!

jart commented 1 year ago

@ghaerr The way to get truetype rendering happening in Cosmo is as follows:

  1. Put https://justine.lol/NotoSansMono-Regular.ttf in the repo in usr/share/fonts/...
  2. Pick a library, and ensure o/$(MODE)/usr/share/fonts/NotoSansMono-Regular.ttf.zip.o gets included in it
  3. Use STATIC_YOINK("zip_uri_support"); and STATIC_YOINK("usr/share/fonts/NotoSansMono-Regular.ttf"); in the library that needs it
  4. Use open("usr/share/fonts/NotoSansMono-Regular.ttf", O_RDONLY) and slurp it into memory.
  5. Depend on THIRD_PARTY_STB
  6. Pass the font data to stbtt_InitFont().

jart commented 1 year ago

What video mode did TempleOS use? I seem to recall Terry Davis picked this one 640x480 (or so) video mode that was basically guaranteed to work on every PC ever made that has an x86-64 chip. Could we just use that to start?

jart commented 1 year ago

Please feel free to use any of this code to get the VGA graphical console running. I plan on posting the emulator and/or pieces there of, depending on what @jart wants to do.

@ghaerr Could you please send us a small pull request that changes anything about Cosmopolitan VGA even if it's tiny and only changes like five lines of code? I want to see you in the change log now that you've gone through the copyright assignment process.

tkchia commented 1 year ago

Hello @jart,

What video mode did TempleOS use? I seem to recall Terry Davis picked this one 640x480 (or so) video mode that was basically guaranteed to work on every PC ever made that has an x86-64 chip. Could we just use that to start?

That will be VGA mode 0x0012 — 640 × 480 × 16 colours — and I just checked that, yes, TempleOS uses this mode.

This is a planar graphics mode, rather than a flat frame buffer mode, so drawing pixels onto the screen becomes somewhat more complicated than just writing into a flat area of physical memory. (Most probably, the address space for the video memory will also need to be mapped as non-cacheable.)

If you ask me, I could probably try implementing something to use that — but I will have to relearn all the incantations needed to correctly draw the pixels. :neutral_face:

(There is also mode 0x0013, a more-or-less "universal" VGA graphics mode which also happens to be a flat frame buffer mode, but it has a rather low resolution (320 × 200 × 256). Basically, like Microsoft's donkey.bas, except with more colours.)

Thank you!

jart commented 1 year ago

Does the hardware memory layout really matter though? The way I imagine this working is I might use an API to tune the width and height, but ultimately I don't want to think about any data structure except uint8_t[3][y][x]. I want Cosmopolitan VGA to be the thing that worries about efficiently copying that to video memory, quantizing colors if it must, interlacing the colors, or pulling them apart into planes. All I care about is uint8_t[3][y][x]. Is there any reason why the app developer would want to care about any other data structure?

ghaerr commented 1 year ago

Hello @jart, hello @tkchia,

All I care about is uint8_t[3][y][x].

That will be VGA mode 0x0012 — 640 × 480 × 16 colours. This is a planar graphics mode, rather than a flat frame buffer mode, so drawing pixels onto the screen becomes somewhat more complicated

Given the above, I have a possible suggestion: rather than try to draw directly in "old fashioned VGA planar mode", keep an offscreen VGA console buffer in RGB format (not RGBA) and use a fast conversion blit to the actual hardware framebuffer periodically. This has the benefit of keeping the VGA console itself using a known, constant RGB pixel type definition, and dealing with the hardware complexities separately in the conversion blit. I mentioned this as a possible option previously in https://github.com/jart/cosmopolitan/issues/638#issuecomment-1262678923.

Since @tkchia has got the real-mode selection of all the hardware pixel formats working, and has some basic glyph draw code for those formats implemented, taking the step of enhancing the VGA console to actually work in graphics mode using those same formats IMO makes sense for the next step, and allows for testing of the underlying pixel format. After that is working, I can supply conversion blits for each of the hardware pixel formats, including a planar blit for 640x480x16, if desired.

Is there any reason why the app developer would want to care about any other data structure?

For generalized graphics drawing, it is nice to have an alpha channel. However, for the (offscreen) VGA console framebuffer, no, since there won't be any alpha used unless truetype fonts are implemented, and then in that case, only a blit which blends the font's alpha channel into the offscreen RGB framebuffer is needed. Using only a single data structure within the Cosmopolitan-accessible API lowers general programmer complexity by quite a bit, even with later support for graphics.
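The blend being described is the standard src-over formula applied per channel, with the glyph's coverage acting as an 8-bit alpha; a minimal illustrative sketch:

```c
#include <stdint.h>

/* Src-over blend of one 8-bit channel: out = (fg*a + bg*(255-a)) / 255,
 * where `a` is the glyph's coverage (alpha), 0..255. */
static uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t a)
{
    return (uint8_t)((fg * a + bg * (255 - a)) / 255);
}
```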

@tkchia, I would be happy to look further into these ideas after your current graphical VGA console implementation using the non-planar VESA framebuffer modes is accepted, at which point we can move to @jart's request of only using a 24bpp offscreen VGA framebuffer. With your current approach, the console doesn't actually need a separate offscreen buffer, but can draw directly to the hardware. It appears you are very close to having that working, correct?

Thank you!

jart commented 1 year ago

I can supply conversion blits for each of the hardware pixel formats, including a planar blit for 640x480x16, if desired.

I'd welcome that very much, since I love sitting down with code like that and trying to make it go faster using things like SSE / AVX assembly hacks.

For generalized graphics drawing, it is nice to have an alpha channel.

I agree but alpha is orthogonal to getting graphics to display on a monitor, because monitors aren't translucent :P If Cosmopolitan provides a uint8_t[3][y][x] abstraction for 8-bit RGB then nothing is preventing the user from compositing RGBA planes onto it. That's one of the reasons why I suggest placing [3] on the outside of the array, rather than on the inside. Because if [3] was on the inside, then the user would need to interlace pixels when compositing, and that's slow since it isn't easily vectorizable.

at which point we can move to @jart's request of only using a 24bpp offscreen VGA framebuffer

You might be misunderstanding me. The goal here isn't to determine a style for ourselves, but rather to create a meaningful abstraction for our users. We, as in us three, are totally interested in all these weird hardware layouts. We figure that stuff out so the user doesn't have to. Once we're done writing all our happy blitting algorithms, the user will have a simple singular RGB vision that makes graphical programming effortless.

tkchia commented 1 year ago

Hello @jart,

I am not sure if it will be a net win to have uint8_t[3][y][x] rather than, say, uint8_t[y][x][4].

[3] on the outside means that the code can process the red components (e.g.) of more pixels at one go, but in return the information for each individual pixel is spread out across 3 places — which means we need to do 3 outer loops to draw one thing, instead of one loop.

And when it is time to update the physical frame buffer, this means that the driver needs to gather data from 3 places in order to figure out how to write out one pixel, e.g. for a BGR565 video mode we need to do

fb[0][0] = canvas[0][0][0] >> 3       | /* blue */
           canvas[1][0][0] >> 2 <<  5 | /* green */
           canvas[2][0][0] >> 3 << 11   /* red */

fb[0][1] = canvas[0][0][1] >> 3       | /* blue */
           canvas[1][0][1] >> 2 <<  5 | /* green */
           canvas[2][0][1] >> 3 << 11   /* red */

etc. rather than something like

fb[0][0] = canvas[0][0][0] >> 3       | /* blue */
           canvas[0][0][1] >> 2 <<  5 | /* green */
           canvas[0][0][2] >> 3 << 11   /* red */

fb[0][1] = canvas[0][1][0] >> 3       | /* blue */
           canvas[0][1][1] >> 2 <<  5 | /* green */
           canvas[0][1][2] >> 3 << 11   /* red */

The uint8_t[3][y][x] layout might be advantageous for drawing in planar graphics modes, but even then I am not quite sure.

Thank you!

jart commented 1 year ago

You're correct that what you suggest is probably better if you're doing one pixel at a time scalar operations. However it's trickier in my experience to pipeline memory like that into SIMD instructions, because it needs to be reshaped. For example, if it were possible to do sixteen of the slower pixel conversion at the same time, rather than 1x at a time of the faster conversion, which would you choose? Ultimately though it's something where we'd want to sit down, write the code, and see what the numbers say. Those are just my biases based on my personal experience fiddling with it.

paulwratt commented 1 year ago

Based on my observations of this and other threads, if things are to be achieved quickly the following seems fair:

that should allow for good blit routines to be built quickly, with more advanced and optimized ones to come later, including custom assembler routines, and those can ignore/change the intermediate store as need be, as that does not affect the user's "view of the world" or their ability to use it (i.e. "get things done").


It seems to me there is some concern about semantics at this early stage. Just as users don't need to know about the underlying structure, just that there is continuity no matter the selected device, screen, resolution or pixel structure, so too can the [3][x][y] be subverted based on "driver" needs, wants or optimizations.

I would just keep the structure in mind while getting things off the ground. If that means getting the VESA FB working first, because it's "simple", so be it. I would also point out that planar is not an unknown quantity, and there are plenty of resources to assist in developing "good blit routines".

It sounds to me that the [3][x][y] layout @jart is looking for comes down to the fact that SIMD and AVX can more "simply" be applied. It also sounds like an easy "fix" to transpose that array into an actual 32-bit frame buffer. That fulfills "2 ends of a clamp" (while keeping the door open for things like advanced TTF rendering), allowing things to be usable straight off the bat (for the user) and getting things moving quickly (for the developer), with the next step being "planar addressing".

Remember that planar was easy to handle electronically, and allowed for pixel packing, requiring less RAM at a time when RAM was expensive.

Honestly it's not more complex than the various different pixel formats in a frame buffer, just different (e.g. the RPi had RGB565, yet in the frame buffer it was stored as a little-endian 16-bit value, which means it looks like something else to a big-endian CPU).

ghaerr commented 1 year ago

Hello @paulwratt,

Thank you for your comments.

Remember that Planar was easy to handle electronically...

Yes, that was probably the reason.

Honestly it's not more complex than various different pixel formats in a frame buffer

Actually, I beg to differ on planar being not more complex: each of the pixel bits for each of the four planes must be separately shifted/selected, then an OUTB instruction issued to select the mask, then a read-modify-write done for every pixel. That is much slower and more complicated than a usual conversion blit, which is why later graphics cards added hardware to handle first 16 bits (5/5/5 or 5/6/5) for RGB, then 8 bits per color (24bpp or 32bpp), since programmers often used RGB internally for color specification (and still do, along with other formats). Drawing pixels on VGA planar hardware is just plain slow, and no modern OS uses that format except for extreme initial compatibility. That said, a very early version of Microwindows has a VGA planar driver that could be ported to Cosmo and used for directly drawing glyphs to hardware, or converted into a conversion blit, should we desire.

if things are to be achieved quickly the following seems fair:

I would suggest that the following order might allow a graphics console to become operational quickly, while then attending to some of the comments or designs requiring more work afterwards:

Thank you!

tkchia commented 1 year ago

Hello @ghaerr, hello @jart, hello @paulwratt,

OK, I have now got some basic character output working with VGA graphics modes — see my PR https://github.com/jart/cosmopolitan/pull/649.

The code currently implements an off-screen canvas which always has a BGRX8888 format (namely uint8_t[x][y][4]). This is mainly because:

Thank you!

tkchia commented 1 year ago

Some thoughts:

Thank you!

jart commented 1 year ago

Yes I agree moving quickly is the most important thing. Watching you do this is a beauty to behold. You seem to be following standard practices based on my reading of other framebuffer gui type codebases in the past. Now that you know my dream, as long as there's room for my dream in this future, I'm happy to continue supporting where you're taking us.

tkchia commented 1 year ago

Hello @jart,

Thank you — I definitely still need to defer to yourself and @ghaerr on a lot of matters (in particular I do not have much experience with using SIMD instructions).

Hello @jart, hello @ghaerr,

I still think getting a crash console a.k.a. "screen of death" working on the VGA console — whether in text modes or graphics modes — is very important. Project-wise this is high up on my to-do list.

The "screen of death" is now working: https://github.com/jart/cosmopolitan/pull/650. In the end, to support this, I basically implemented two sets of character output routines, both

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

The "screen of death" is now working: https://github.com/jart/cosmopolitan/pull/650.

Very nice! I'm sure that will greatly help to develop and test more quickly on bare hardware!!!

I basically implemented two sets of character output routines

I'm wondering: perhaps instead of nearly duplicating many of the routines, and as such having to use a .inc file and #defines for FILLRECT, DRAWBITMAP, and the like, keep the duplicated routines' function pointers directly in the tty struct (which is already being done for some, like tty->update and tty->drawchar). There could be a static tty struct defined for the _TtyKlog* routines. The same routines could then be used, redirecting through tty->drawbitmap(...), etc. This would get rid of a lot of messiness in tty-klog.greg.c. What do you think of that? Is there a reason the code needs to be duplicated?

In some of the graphics code I've written, we got around the issue of #define'ing COLOR and BPP by always keeping a standard (RGBA) color format, which is accepted by all of the routines, and then converting the color to the hardware format at the latest point possible. I haven't had time to fully review your code to understand how the two different sets of routines work.

Just some thoughts. Since this is working now, it might be best to commit and reduce the duplication later.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

The same routines could be used and just redirect using tty->drawbitmap(...), etc. This gets rid of a lot of messiness in tty-klog.greg.c. What do you think of that, is there a reason the code needs to be duplicated?

Let me try that and see if it results in better code. But before that, I am pushing some easy tweaks to reduce the size of the "screen of death" code (it does not need to be fast, but ideally it should be small).

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

The same routines could be used and just redirect using tty->drawbitmap(...), etc. This gets rid of a lot of messiness in tty-klog.greg.c. What do you think of that, is there a reason the code needs to be duplicated?

Let me try that and see if it results in better code.

I did a quick-and-dirty patch to see what would happen if I arrange to call TtyKlog{16, 32}DrawBitmap etc. from a single DRAWCHAR function, by way of ->drawbitmap function pointers inside the struct Tty. It turns out the compiler actually generates slightly more code for the module when I do this — 1,661 vs. 1,436 bytes in the .text section (!).

It seems that GCC's aggressive function inlining makes up for the space "wasted" when I duplicate the callers' code.

(But who knows — the trade-offs might change somewhat in the future, if the drawbitmap, fillrect, and moverect primitives are exported as application-callable functions in their own right.)

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

By the way, I have been trying to build your Microwindows code for the Cosmopolitan platform (ARCH=COSMO), and trying to run it on the Linux (non-X-Window) console — with some success. :slightly_smiling_face:

The main problem is that Cosmopolitan does not yet know about the Linux ioctl's (FBIOGET_VSCREENINFO etc.) needed to truly work with the frame buffer device file and obtain its geometry. It is a bit tricky to add these into Cosmopolitan, since the ioctl's are very Linux-specific — it seems FreeBSD and NetBSD / OpenBSD have somewhat different interfaces for talking to the kernel's console driver.

(I probably need to download a NetBSD / OpenBSD image to try it out and see how it works...)

Thank you!

jart commented 1 year ago

@tkchia It's fine to add a feature that is Linux-only so long as it's documented as such. We have a lot of platform-specific ioctl()'s and we usually don't polyfill them, because ioctls have a tendency to be abstracted by higher-level APIs, e.g. grantpt(), unlockpt(), etc.

I've attempted getting Linux framebuffer support working in the past. You can see what I did at https://github.com/jart/cosmopolitan/blob/master/tool/viz/printvideo.c#L1440 and https://github.com/jart/cosmopolitan/blob/master/libc/calls/struct/framebufferfixedscreeninfo.h and https://github.com/jart/cosmopolitan/blob/master/libc/calls/struct/framebuffervirtualscreeninfo.h

ghaerr commented 1 year ago

Hello @tkchia,

I have been trying to build your Microwindows code for the Cosmopolitan platform (ARCH=COSMO) The main problem is that Cosmopolitan does not yet know about the Linux ioctl's (FBIOGET_VSCREENINFO etc.) needed to truly work with the frame buffer device file and obtain its geometry.

Nice! The Microwindows framebuffer driver for Linux is a bit dated, and a number of ifdefs were added for various embedded Linux systems that used alternative mechanisms for accessing the framebuffer configuration and geometry. These days, I think few folks use the Linux console framebuffer on the desktop, although some still use a version of X Windows (below GTK or KDE) built to run on the framebuffer rather than directly accessing the video hardware. There is also the potential complication of supporting multiple virtual consoles (accessed via Alt-{1-3...}) that may be in text mode (the VTSWITCH option enables this). We probably don't need any of that in Cosmo. Thus, we don't necessarily need to support emulating the original Linux struct fb_var_screeninfo structure.

(I probably need to download a NetBSD / OpenBSD image to try it out and see how it works...)

I have been thinking of creating a version of (and set of enhancements for) Microwindows just for Cosmopolitan, which would work more simply, by making a single call to Cosmo libc to return a structure that would contain all the required configuration information (i.e. framebuffer address, geometry, etc). Instead of having Cosmo emulate a messy non-portable Linux-only structure, if we provided a simple struct framebuffer *GetFramebuffer() API, it would return success for bare metal (for now), and later return success for possible SDL or other portable graphics layers for macOS, NetBSD, Linux, etc. The Cosmo programs would all run portably by using the framebuffer information returned by this (hopefully single) API call.

This could all be done by writing a new screen driver, drivers/scr_cosmo.c, that calls GetFramebuffer() in the Microwindows repo. (Doing that would also allow for bringing over the Nuklear, MicroUI and AGG immediate mode libraries as well.) I have added enhancements to the latter two libraries to allow applications to draw into an emulated framebuffer (within a window) using the graphics library API only. An "emulated framebuffer" is a virtual framebuffer set up to appear as a Microwindows window within the larger actual hardware framebuffer. It works using the trick of setting the virtual framebuffer address to the top left corner of the window, but setting the "window" pitch/stride equal to the hardware framebuffer's. Kind of cool :)

IMO, this approach gives some much-needed simplicity to allow for running a number of well known graphics libraries on Cosmo (such as AGG).

That said, there's nothing wrong with implementing both approaches, if existing compatibility is more important.

Thank you!

tkchia commented 1 year ago

Hello @jart,

I've attempted getting Linux framebuffer support working in the past.

Thanks! Let me see how I might be able to build on it. Perhaps I can add the relevant ioctl numbers from Linux and *BSD to libc/sysv/consts.sh, then see where I should go from there.

Hello @ghaerr,

Instead of having Cosmo emulate a messy non-portable Linux-only structure, if we provided a simple struct framebuffer *GetFramebuffer() API, it would return success for bare metal (for now), and later return success for possible SDL or other portable graphics layers for macOS, NetBSD, Linux, etc.

I suppose so. It is not quite obvious though what exactly such an API should look like — so I am trying to look at existing code that uses the Linux / BSD ioctl's to get some ideas. It will be nice if the API can be stable. There is no existing standard or even common practice, unlike the case of grantpt() etc. :neutral_face:

(I also hope to come up with a semi-reasonable API for changing the console font. Then on a bare metal setup, a program can perhaps start out with a bitmap font, then later switch to a TrueType font or graymap font, once file I/O and other C library features are up and running. For switching the console font, Linux has PIO_FONTX, KDFONTOP, etc. via <linux/kd.h>; FreeBSD has <sys/consio.h>; while OpenBSD (and NetBSD?) has <dev/wscons/wsconsio.h>.)

Thank you!

tkchia commented 1 year ago

Hello @jart, hello @ghaerr,

  1. Use STATIC_YOINK("zip_uri_support"); and STATIC_YOINK("usr/share/fonts/NotoSansMono-Regular.ttf"); in the library that needs it
  2. Use open("usr/share/fonts/NotoSansMono-Regular.ttf", O_RDONLY) and slurp it into memory.

It seems that zipos support under bare metal is not quite complete yet (as of https://github.com/jart/cosmopolitan/commit/648bf6555c06a3655cab1a9ae078d329c192d39a). (I can read zipos files under Linux, but I cannot read the same files on bare metal.) Maybe I will see what I can do about this.

(Edit: https://github.com/jart/cosmopolitan/pull/667 .)

Thank you!

tkchia commented 1 year ago

Hello @jart, hello @ghaerr,

OK...

20221020-1

This is a quick proof of concept (https://github.com/tkchia/cosmopolitan/compare/tkchia/20221017...tkchia:cosmopolitan:tkchia/20221019-draft-vfb-api-poc) of a thin API layer for interacting with either a bare metal video frame buffer or a Linux /dev/fb* frame buffer.

One problem is that I am not sure how to deal with the difference between "direct video memory" frame buffers and "deferred I/O" frame buffers. From what I understand (so far), the normal procedure for updating video memory is this:

  • wait for a vertical synchronization (vsync) or vertical retrace (vblank) event 1️⃣
  • draw into the frame buffer

while for deferred I/O frame buffers, it is probably better to do something like this:

  • draw into the frame buffer
  • do an msync (e.g.) to update actual video from the possibly emulated frame buffer. 2️⃣

For now my draft API has provisions for both 1️⃣ and 2️⃣. Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

This is a quick proof of concept of a thin API layer for interacting with either a bare metal video frame buffer or a Linux /dev/fb* frame buffer.

I have a number of comments about the POC implementation, for your consideration. I suppose that, in the end, the features within a framebuffer API really depend on exactly what is trying to be accomplished. For myself, I am concerned with creating something that makes it easy for someone to port an existing graphics library onto Cosmo, and then to somehow have that functionality made portable across its wide platform support. This implementation seems to side with providing stricter compatibility with the non-portable, old-style Linux kernel framebuffer API. Both have their benefits. Perhaps more on that later.

With my experience dealing with Linux framebuffer video hardware, almost all implementations use direct video memory, with only a very few 2000s-era embedded Linux systems implementing some kind of deferred I/O, and those only because the hardware (usually hand-held) wasn't complex enough to just display the framebuffer contents. These systems were usually 1bpp or 4bpp, and used a proprietary ioctl to tell the underlying driver to do some magic between the framebuffer contents and the display.

It is my understanding that almost all modern video hardware does not require (nor desire) vsync or vblank waiting before drawing into the framebuffer. Perhaps this started with the advent of LED/LCD displays, rather than TV-style monitors, I am not sure. To my knowledge, only the CGA requires waiting, in order to remove snow artifacts from the display.

In the business of "emulated" framebuffers, for me this means software emulation of the pixel contents of a non-hardware framebuffer, allocated from normal main memory and interpreted by a separate software library. For others, it could mean the additional hardware described above for deferred I/O, although those systems are nowadays rare and I don't think any desktop systems work that way. My original idea for an emulated framebuffer was to allow the desktop display of a graphical application when the underlying operating system did not support a hardware-style video framebuffer (or one that was not programmatically accessible, as is more commonly the case). For these systems, a call to a library-supplied "update" function, which is not part of the hardware framebuffer nor msync, would be used, likely dependent on the library being used to emulate the pixel contents onto a window within the running graphical OS.

While providing a Linux-compatible framebuffer API has the big benefit of allowing a framebuffer application to be tested on a specially-configured Linux desktop, the problem of getting that same application to run on anything other than bare metal isn't being addressed, and solving that will likely break any Linux compatibility. Nonetheless, your POC is a good move forward for bare metal, and for the ability to test when one is sitting in front of a Linux console. A separately emulated framebuffer implementation may solve the problem of other OS support in the future.

All that said, I have the following comments on the POC implementation itself:

Thank you!

paulwratt commented 1 year ago

One problem is that I am not sure how to deal with the difference between "direct video memory" frame buffers and "deferred I/O" frame buffers. From what I understand (so far), the normal procedure for updating video memory is this:

  • wait for a vertical synchronization (vsync) or vertical retrace (vblank) event 1️⃣
  • draw into the frame buffer

while for deferred I/O frame buffers, it is probably better to do something like this:

  • draw into the frame buffer
  • do an msync (e.g.) to update actual video from the possibly emulated frame buffer. 2️⃣

For now my draft API has provisions for both 1️⃣ and 2️⃣. Thank you!

Is there any reason why you are choosing one over the other? Both techniques have use cases: one for speed (FPS) and one for volume (where 1 FPS would be interrupted due to the time taken).

It seems that some of the ideas behind what could and could not be useful in a frame buffer context stem from the perceived hardware being used. I suggest getting something like an x86_64 SBC with one or two different "display types", to allow better "portable" development, where any code is then verified against both HDMI and VGA output (DVI should perform the same as HDMI, but allow variable speeds up to 240Hz).

Actually, on this subject, has anyone considered an SPI display in their thought processes? This seems like a likely candidate for a bare metal Cosmo APE "in the wild" (and extra "info" displays on a desktop OS), on top of a (more common) "beige box" configuration. To wit, does anyone have access to a PiKVM with which they can verify a "headless" configuration?

At least the idea of what constitutes a display or graphical output should be explored, even if it cannot be physically verified.


NOTE: I will be able to help out with some of these configurations in the future; I will let people know when that happens.

paulwratt commented 1 year ago

Besides getting things going as quickly as possible, based on @ghaerr's observations and comments, it appears that multiple test configurations are needed. Besides the 4 of us commenting on this subject, is there anyone, or any way, to assist with this? I just alluded to something(s), but I fear they won't be ready before this topic is concluded; some lateral thinking may be needed here. Ideas that would also help individuals in other areas of development would probably be the most beneficial.


I will try to get one or more different x86_64 SBCs (I have 12V power restrictions) with different screen configurations, but I am constrained in timeframe by the need to purchase 32 GB ECC RAM modules (at exorbitant prices) and then a CPU (at least an APU) before I can participate. In the meantime I have no x86_64 platforms with which to contribute towards testing (let alone building), so I'm not much use at the moment.

agreppin commented 1 year ago

Hello all here,

I appreciate your work. I'm interested in bare metal support. I started a little project (unpublished so far...) to implement a micro-kernel (à la Amiga / AROS). My basis is:

I painfully rebooted the PCs many times to debug the RS232 + GDB server code itself, and put the project aside for other ones... As for GFX, what I had in mind was to support Intel HD first as a loadable ELF module, then look at the NVIDIA Nouveau driver from Linux / gnome.org. I also have one AMD A10 APU...

Just tell me if something above is useful for cosmo.

ghaerr commented 1 year ago

Hello @agreppin,

Looks like you've been busy with lots of neat stuff with your micro-kernel!

Just tell me if something above is useful for cosmo.

Are you using any of the cosmo code for your micro-kernel, or is this all separate? It would be nice to take a look at it should you decide to publish it, as this would allow for better assessment of its suitability within Cosmopolitan.

support Intel HD first as a loadable ELF module, then look at the NVIDIA Nouveau driver from Linux / gnome.org. I also have one AMD A10 with APU ...

With regards to graphics drivers, it's currently thought that using the framebuffer graphics modes provided by the VESA BIOS is best for initial Cosmo bare metal support, since that allows the .com files to be most portable across the wide variety of PCs out there, and allows for a simple software API for Cosmo graphics. Of course, having a conventional "video driver" built for a specific card could provide more flexibility with additional API/hardware access (likely for GPU replacement of software-only framebuffer functions). The significant downside is that every .com file would have to include a specific video driver, which would require writing lots of video drivers, and essentially turns .com files into mini-OS images. Until more graphics applications are built for Cosmo, I'm not sure having non-framebuffer video access is that useful, unless it adds a framebuffer driver that is otherwise not available via VESA for that card.

At this point, @tkchia has got a minimal graphical VESA VGA console running, with plans to add user-loadable (bitmap and/or truetype) font support within the .com file, which will allow for supporting a wider variety of VESA resolutions as well as enhanced visual text output, for any application using framebuffer output.

Thank you!

agreppin commented 1 year ago

Hello @ghaerr,

I'm confused now about my past projects ... my RAM is not perfect ...

I just started out trying to make a video game, which needs gfx, sound and keyboard, and used SDL for that. Then I tried to use OpenGL, Vulkan and DirectX... I think the SDL API is the most relevant & portable thing to me. But in the cosmo context, the first step is to go with the VESA modes and define a clean API for the next steps, IMHO. I think you are going the right way to achieve it.

Are you using any of the cosmo code for your micro-kernel ?

No, I didn't even know about cosmo at the time (my 1st commit log is from 2012, so even earlier).

The significant downside is that every .com file would have to include a specific video driver, which would require writing lots of video drivers, and essentially turns .com files into mini-OS images.

I was thinking of a mini-OS / USB installer with downloadable drivers (to avoid restrictions), like Ubuntu does for non-free drivers. In my code, even libc is a loadable ELF module, as my loader is in the kernel.

I'm sorry that I can't read all of your words today to respond more precisely. My point of view is that we can share things in common, as I shared some code with the AROS project.

Thank you for your interest!

PS: having Network, Sound & USB drivers for bare metal is another challenge

tkchia commented 1 year ago

Hello @agreppin,

  • minimal ACPI + APIC startup code with timers for a minimal thread scheduler
  • some HPET IRQ on PC + QEMU ... buggy and not working for sure on VirtualBox

I for one am interested to know more about your implementation of the timer (including HPET) IRQ(s) and the interfacing with the APIC — if you are willing.

I think some basic time-keeping functionality will, for a start, be useful for a number of things in Cosmopolitan. And I know that correctly interfacing with the APIC(s) and HPET will take quite a good amount of work.

Thank you!