Can we divide the API between "hardware" and software?

nesbox / TIC-80

TIC-80 is a fantasy computer for making, playing and sharing tiny games.

https://tic80.com

MIT License


Can we divide the API between "hardware" and software? #1661

Closed joshgoebel closed 2 years ago

joshgoebel commented 2 years ago

Can we divide the API between "hardware" and software and make the distinction clear?

IE, which API functions are almost truly "hardware opcodes" (like peek, poke, reset)... and which are essentially just "userspace" - functions that could be entirely replaced by a user's own implementations, built by accessing the underlying "hardware" directly via the few actual magical "system calls" we do provide.

This is related to #1660, but kind of its own thing (and I wanted a place to talk about the larger API and when we treat the "hardware" as actual fantasy hardware and when we treat the API as "magic" system calls) so I'll copy part of my concluding remarks from #1660 here:

Should the API be allowed/able to do things that are impossible to do by poking/peeking the RAM directly?
If so, when is this allowable and when should it be discouraged?
Should this be a clear line of demarcation?

If/when the core TIC API could be entirely replaced by peeking/poking the "hardware" I'd say we've achieved a very high fidelity of being a real "fantasy computer" under the covers - which I've always felt was a key goal of the project (am I wrong?). The less this holds true the more we become just a "gaming platform/engine" like https://domeengine.com and less of a "fantasy computer".

My own feeling is the fewer "magic system calls" that exist the better from a fantasy hardware perspective.

With that in mind I thought I'd do a quick review of the entire API and which API functions are "magic" vs just accessing the hardware:

Magic / system calls

~~clip~~ (moved to non-magic now that we have CLIP)
peek & peek4
poke & poke4
exit
reset
sync
time
tstamp
trace

Most of these to me seem very "system call" appropriate - and some have to be like peek and poke themselves. Individual thoughts:

~~clip - could be represented in VRAM "registers" (like BORDER COLOR) in a perfect world. #1508~~
time and tstamp - could have timer "registers" in RAM, but I'm not sure this adds tons of value.
trace is very meta in that the console exists "outside" the virtual computer in a way, so it makes sense as a magic system call.
sync - could easily be represented as a register (or two?) in RAM, but the system call doesn't bother me here

Audio

I don't have the knowledge here, so I'm leaving these in their own category. I know most of the audio stuff (and state) is in RAM, but I still think the API might be the only "magic" way to start and stop playing? I'm unsure.

music
sfx

Built on top of hardware/RAM/registers:

I say that as in these could be reproduced simply by reading or writing to RAM with peek or poke. The section of RAM is indicated based on our memory map.

clip (CLIP in VRAM)
btn & btnp (GAMEPADS and KEYBOARD RAM)
cls (VRAM)
circ & circb (VRAM)
elli & ellib (VRAM)
fget & fset (FLAGS)
font (VRAM/SPRITE)
key & keyp (KEYBOARD RAM)
line (VRAM)
map (VRAM, MAP, TILES)
memcpy & memset (RAM)
mget & mset (MAP)
mouse (MOUSE)
pix (VRAM)
pmem (PERSISTENT MEMORY)
print (FONT, VRAM)
rect & rectb (VRAM)
spr (TILES, SPRITES, VRAM)
tri & trib (VRAM)
textri (VRAM)

And of course the BIG caveat/exception with ALL of these is their purely "magical" behavior in OVR, which AFAIK can't be reproduced at all by peeking/poking memory, which is the entire reason I opened #1660. Otherwise our drawing functions are very "rooted in hardware" - which I think is good - and something that we should strive to keep true.
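To make the "built on top of RAM" claim concrete, here is a minimal sketch of a userspace pix written with nothing but a 4-bit poke. It models RAM as a plain byte array; the 240x136 screen size, the screen starting at nibble address 0, and the nibble order (even nibble in the low half of the byte) are assumptions about the VRAM layout, and my_pix / my_pix_get are hypothetical names, not part of the API.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_SIZE      (96 * 1024) /* 96KB, per the memory map */
#define SCREEN_WIDTH  240         /* assumed screen size in pixels */
#define SCREEN_HEIGHT 136

static uint8_t ram[RAM_SIZE];     /* stand-in for the fantasy RAM */

/* Write one 4-bit value; addr4 counts nibbles, not bytes. */
void poke4(uint32_t addr4, uint8_t value)
{
    assert(addr4 / 2 < RAM_SIZE);
    uint8_t *byte = &ram[addr4 >> 1];
    if (addr4 & 1)  /* odd nibble: high half of the byte (assumed order) */
        *byte = (uint8_t)((*byte & 0x0F) | ((value & 0x0F) << 4));
    else            /* even nibble: low half of the byte */
        *byte = (uint8_t)((*byte & 0xF0) | (value & 0x0F));
}

/* Read one 4-bit value back. */
uint8_t peek4(uint32_t addr4)
{
    assert(addr4 / 2 < RAM_SIZE);
    uint8_t byte = ram[addr4 >> 1];
    return (addr4 & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
}

/* "Userspace" pix: set a pixel using only poke4. Off-screen writes are ignored. */
void my_pix(int x, int y, uint8_t color)
{
    if (x < 0 || x >= SCREEN_WIDTH || y < 0 || y >= SCREEN_HEIGHT)
        return;
    poke4((uint32_t)(y * SCREEN_WIDTH + x), color);
}

/* "Userspace" pixel read using only peek4. */
uint8_t my_pix_get(int x, int y)
{
    assert(x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT);
    return peek4((uint32_t)(y * SCREEN_WIDTH + x));
}
```

Calling my_pix(0, 0, 12) here touches exactly one nibble of the array, which is the property the list above relies on: everything the drawing call does is visible in RAM.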

joshgoebel commented 2 years ago

The situation here is much improved with the new OVR behavior. :-) Far less magic and the hardware surface of the API is much smaller now. Adding CLIP to VRAM also helped in this regard.

Question: Assuming (yes, big assumption) we could find more space in RAM for registers do you see any benefits in moving the timing registers into RAM such that tstamp and time might be rewritten with peek? It would be an open question whether those registers should also be writable or not... :-) In the smaller real hardware I'm familiar with (Arduino) the "clock" (we have no clock chip) is handled via interrupt... so the interrupt is triggered at a given frequency and then a counter in RAM is updated to keep track of ms since boot. The software API then just reads from that memory location when millis() is called.

I guess this is an argument/suggestion for time perhaps being software and tstamp being hardware - as it would need to talk to our fantasy RTC (real time clock) chip, etc... and we've never exposed or explained any of our I/O subsystems really...

I don't think this is a high priority, just considering it in the light of being consistent.

nesbox commented 2 years ago

Yes, we have some free memory in the RAM, about 12796 bytes.

>help ram
+-----------------------------------+
|           96KB RAM LAYOUT         |
+-------+-------------------+-------+
| ADDR  | INFO              | BYTES |
+-------+-------------------+-------+
| 00000 | <VRAM>            | 16384 |
| 04000 | TILES             | 8192  |
| 06000 | SPRITES           | 8192  |
| 08000 | MAP               | 32640 |
| 0FF80 | GAMEPADS          | 4     |
| 0FF84 | MOUSE             | 4     |
| 0FF88 | KEYBOARD          | 4     |
| 0FF8C | SFX STATE         | 16    |
| 0FF9C | SOUND REGISTERS   | 72    |
| 0FFE4 | WAVEFORMS         | 256   |
| 100E4 | SFX               | 4224  |
| 11164 | MUSIC PATTERNS    | 11520 |
| 13E64 | MUSIC TRACKS      | 408   |
| 13FFC | MUSIC STATE       | 4     |
| 14000 | STEREO VOLUME     | 4     |
| 14004 | PERSISTENT MEMORY | 1024  |
| 14404 | SPRITE FLAGS      | 512   |
| 14604 | FONT              | 1016  |
| 149FC | FONT PARAMS       | 8     |
| 14A04 | ALT FONT          | 1016  |
| 14DFC | ALT FONT PARAMS   | 8     |
| 14E04 | ... (free)        | 12796 | <<<
+-------+-------------------+-------+

We could place the time registers you want here.
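A rough sketch of what such timer registers could look like, following the Arduino-style model described earlier: a "hardware" tick updates a counter in RAM and the software time call only reads it back with peek. The address 0x14E04 is simply the first free byte from the table; the 32-bit little-endian layout and the names timer_tick and my_time are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_SIZE   0x18000u /* 96KB */
#define TIMER_ADDR 0x14E04u /* hypothetical: first free byte in the map above */

static uint8_t ram[RAM_SIZE]; /* stand-in for the fantasy RAM */

uint8_t peek(uint32_t addr)
{
    assert(addr < RAM_SIZE);
    return ram[addr];
}

void poke(uint32_t addr, uint8_t value)
{
    assert(addr < RAM_SIZE);
    ram[addr] = value;
}

/* "Software" side: a time call rebuilt from four peeks (little-endian). */
uint32_t my_time(void)
{
    uint32_t ms = 0;
    for (int i = 0; i < 4; ++i)
        ms |= (uint32_t)peek(TIMER_ADDR + (uint32_t)i) << (8 * i);
    return ms;
}

/* "Hardware" side: the host calls this like a timer interrupt and the
 * counter of milliseconds since boot is updated in RAM. */
void timer_tick(uint32_t elapsed_ms)
{
    uint32_t now = my_time() + elapsed_ms;
    for (int i = 0; i < 4; ++i)
        poke(TIMER_ADDR + (uint32_t)i, (uint8_t)(now >> (8 * i)));
}
```

Because the counter is ordinary RAM in this sketch, a cartridge could also poke it, which is the "should the registers be writable" question raised above; a read-only variant would need the poke path to reject that address range.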

Anrock commented 2 years ago

@joshgoebel Thanks for bringing up this topic. I'm all for an all-hardware API with as few "magical" APIs as possible, just for the sake of realism itself. A couple of assorted thoughts, wishful thinking and blatant speculation follow.

I think that almost every possible magical command can actually be replaced with hardware registers in a very mechanical way. You can take any function, say line x0 y0 x1 y1 clr, and turn it into a hardware one by providing hardware registers for the arguments and some kind of control register.

Doing that translation mechanically will hugely inflate the occupied address space, but if you do it a little bit smarter it's not so bad. For example, all graphical commands can share registers since lots of their arguments are the same (coordinates, color). Same for other "families" of commands.
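As an illustration of the shared-register idea, here is a small sketch of the device side: five argument bytes shared by all commands plus one control register, where writing the control register triggers the command. All addresses and command codes are made up, the screen is kept as one byte per pixel for simplicity, and the line routine is a standard Bresenham loop, not TIC's actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_W 240
#define SCREEN_H 136
#define GPU_ARGS 0x14E04u        /* hypothetical: 5 shared argument bytes */
#define GPU_CMD  (GPU_ARGS + 5u) /* hypothetical: control register */
#define CMD_PIX  1u              /* uses args 0, 1 (x, y) and 4 (color) */
#define CMD_LINE 2u              /* uses args 0..3 (x0, y0, x1, y1) and 4 (color) */

static uint8_t args[5];
static uint8_t screen[SCREEN_H][SCREEN_W]; /* one byte per pixel, for clarity only */

static void plot(int x, int y, uint8_t c)
{
    if (x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H)
        screen[y][x] = c;
}

/* Integer Bresenham line between two points, endpoints included. */
static void draw_line(int x0, int y0, int x1, int y1, uint8_t c)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        plot(x0, y0, c);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}

/* Memory write as seen by the fantasy GPU: argument writes are latched,
 * a write to the control register executes the selected command. */
void poke(uint32_t addr, uint8_t value)
{
    if (addr >= GPU_ARGS && addr < GPU_ARGS + 5u) {
        args[addr - GPU_ARGS] = value;
    } else if (addr == GPU_CMD) {
        switch (value) {
        case CMD_PIX:  plot(args[0], args[1], args[4]); break;
        case CMD_LINE: draw_line(args[0], args[1], args[2], args[3], args[4]); break;
        default: break; /* unknown command: ignored */
        }
    }
}

/* Test helper: read a pixel back from the device's screen. */
uint8_t screen_at(int x, int y)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    return screen[y][x];
}
```

Both commands reuse the same x, y and color slots, so adding another drawing command in this scheme costs one command code rather than a fresh set of registers.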

Diving further into more realistic hardware - making magical commands just a sequence of memory reads/writes is sorta more realistic than pure magic, but it's not how it's done in hardware.

Drawing a line or sprite pixel-by-pixel with CPU commands (script lines in TIC's case, if we assume the script engine is the CPU and the script code is its program) is crazy slow. That's why we have all those line and circle functions (and for convenience too).

In hardware there would be dedicated devices to offload work from the CPU. Like a GPU for graphical stuff or a DMA controller for bulk memory operations like memcpy and memset.

If, for example, graphical stuff is abstracted away with a fantasy GPU device it's possible to free a huge amount of address space by removing VRAM from it and providing a (very small in comparison) set of hardware registers to control that GPU.

@joshgoebel

trace is very meta in that the console exists "outside" the virtual computer in a way, so it makes sense as a magic system call.

If we assume trace is something like UART (which is very widespread solution for debugging) and host PC is listening on the other side then trace can also be replaced with hardware registers. Write a char in the register and it appears in console, easy-peasy.
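A minimal sketch of that idea: a single transmit register where every byte written shows up on the host console, with trace rebuilt as a loop of pokes. The register address, the fixed-size console buffer and the names my_trace and console_text are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UART_TX 0x14E04u /* hypothetical transmit register */

static char console[256]; /* stand-in for the host console */
static size_t console_len;

/* Memory write: a byte written to UART_TX is appended to the console.
 * Other addresses are omitted from this sketch. */
void poke(uint32_t addr, uint8_t value)
{
    if (addr == UART_TX && console_len + 1 < sizeof console) {
        console[console_len++] = (char)value;
        console[console_len] = '\0';
    }
}

/* "Userspace" trace: send the message one character at a time. */
void my_trace(const char *msg)
{
    assert(msg != NULL);
    for (; *msg; ++msg)
        poke(UART_TX, (uint8_t)*msg);
    poke(UART_TX, (uint8_t)'\n');
}

/* Test helper: what the host has received so far. */
const char *console_text(void)
{
    return console;
}
```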

Following down this road, somewhere near the end TIC's-to-script-engine API will only need to process memory reads and writes. And vice versa - any new language for TIC will need to expose only memory reads and writes to script space instead of whole current API. I think this is quite a feat by itself.

But I also think it will make viable to separate TIC core that only knows memory reads and writes and push out all language implementation code into plugins or separate projects or whatever. That memory-based API would be the stable and sturdy bridge that will make it possible. Maybe even make it work interprocess, so absolutely anything can be used as script for TIC without incorporating it into TIC codebase.

Another big thing about TIC that is unrealistic is that the CPU and code and its execution are deeply magical and, as a consequence, out of line with the rest of the retro restrictions.

What I mean is that there is no restriction or any control on script execution speed and thus there are leaking abstractions like same program written in different languages may have different performance. Or, if you have very beefy PC, you can possibly run 3D scenes with realtime raytracing. On retro console.

Same goes for the RAM restriction for program data. It is also magical since every script engine manages its own memory and you are absolutely able to allocate some crazy big data structures on the heap of your host PC.

And same goes for the code size restriction, but to a lesser extent. There is a restriction on code size on a cartridge and we can pretend that TIC is a Harvard architecture and all code resides in a ROM chip on the cartridge, so it doesn't occupy any RAM space and we don't have direct access to it. Fine.

Some discrepancy comes from the fact that not all languages are equal in expressive power, and in 30 characters of Lua or JS you can do much more than in a hypothetical assembler. As a consequence it's the other way around compared to coding for a real console - while on a real console one would abandon the high-level language and go for assembly for faster and more compact code, in TIC the high-level language is the compact one.

Honestly I don't have a firm idea on how to implement the RAM restriction. Sure, embracing memory-mapped dedicated devices would free up lots of address space to put hypothetical RAM there, but how to restrict script engines to that RAM?

Maybe script engine APIs provide a way to restrict memory usage or have explicit alloc/free callbacks so TIC can keep track of consumed memory internally and deny allocation requests that go above that limit - sounds pretty real, but I'm not sure if all engines provide that. But it may be a solution.
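As a sketch of the alloc/free callback idea: Lua, for one, lets the host supply an allocator callback (lua_Alloc) when creating a state. The function below has the same shape as that callback but does not include or call Lua, so it stays self-contained; MemBudget and limited_alloc are hypothetical names, and whether other engines offer an equivalent hook is not something this sketch answers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Running total of script memory and the cap it must stay under. */
typedef struct {
    size_t used;
    size_t limit;
} MemBudget;

/* Same shape as Lua's lua_Alloc: nsize == 0 means free, anything else
 * means allocate or resize. Requests that would push the total over the
 * limit are denied by returning NULL. */
void *limited_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    MemBudget *budget = ud;
    size_t old = ptr ? osize : 0; /* for new blocks Lua passes a type tag in osize */

    assert(budget->used <= budget->limit);

    if (nsize == 0) {
        free(ptr);
        budget->used -= old;
        return NULL;
    }
    if (nsize > old && nsize - old > budget->limit - budget->used)
        return NULL; /* would exceed the fantasy RAM budget */

    void *block = realloc(ptr, nsize);
    if (block)
        budget->used = budget->used - old + nsize;
    return block;
}
```

With Lua this would be passed to lua_newstate together with a pointer to the budget; the script then sees an ordinary out-of-memory error once it hits the cap.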

joshgoebel commented 2 years ago

Wow, there is a LOT here, I'll try to touch on the big aspects. I think my original point had more to do with whether or not we were drifting off course from "gfx commands just draw to RAM" (under the covers) - which was always true (and fun) until OVR came along and changed that... and then it was fixed with the new OVR in 0.9... The point was it was/is weird IMHO (and a precedent) to have a magical line API but then be unable to replicate that by writing to RAM directly - like some type of magic (rainbow colors) lines that you could only draw with line but not just by poking the video RAM. Personally I think a step in that direction is a step in the wrong direction.

It's fine that line is super fast - I understand why that code is C (and wasn't suggesting that it not be)... but the ability for someone to write their own and the accessibility of VRAM to make that possible is one of the "cool" things about TIC-80.

IE, there are two different ways to think about line (in the built-in API) in a fantasy hardware sense. You could think about it as just software, but it's in "machine code"... so of course it's faster than writing the same function in our perhaps "interpreted" language... I guess that's sort of how I think about it.

The other way is to think that the graphics API is powered by a TIC-80 "GPU" so that these are literal drawing commands we're firing at the GPU hardware itself... and that's why they are so fast. That's probably slightly more accurate (though neither of these analogies is entirely correct). I think the "graphics commands just draw to RAM" makes it more "vintage" and less "modern"... which to me is a win.

If we assume trace is something like UART (which is very widespread solution for debugging) and host PC is listening on the other side

Sure I understand, I'm familiar with Arduino and tiny architecture chips and how such things work. We could indeed do this, but I don't think there is much of a win to doing so... no one [generally] is clamoring for that level of "hardware fidelity". Making such things much more "realish" would be more for the sake of completeness than anything else.

Another big thing about TIC that is unrealistic is that the CPU and code and its execution are deeply magical and

That's a separate topic and something we could indeed do something about (if we so desired). PICO-8 has such limits (on CPU and RAM). We should start another thread if we really want to pursue that topic of "CPU/RAM limits".

And vice versa - any new language for TIC will need to expose only memory reads and writes to script space instead of whole current API.

I'd argue that's what we already do - just not via "memory reads" or "memory mapped IO". The API (in all scripting languages) is a VERY, VERY thin wrapper (that's why it's easy-ish to add new languages)... all it does is call the native C functions. I don't see any real advantage to replacing:

pix(0,0,12)

With:

gfxPush(PIX_COMMAND)
gfxWriteRegister(0,0)
gfxWriteRegister(1,0)
gfxWriteRegister(2,12)
gfxCommit()

Where gfx* functions would just be thin wrappers around the fantasy DMA-I/O interface to the fantasy GPU. I don't see what benefit we'd get from doing that. And of course for practical reasons we'd just wrap all that in pix all over again I'd imagine - for convenience... so we're adding another layer of indirection just for pretend sake...

joshgoebel commented 2 years ago

Of the calls I lumped under "Magic / system calls" do you see any big reasons or huge benefits to users if they were something users could "code on their own" against the RAM itself? I feel like (even for time keeping) it might be more for completeness that we'd go that route rather than anything else.

And goodness knows there is plenty to do without inventing things that wouldn't really help anyone. :)

Anrock commented 2 years ago

@joshgoebel just to be clear, don't take my previous comment as "we should absolutely do this", it's more like "how far we can go, if we want" type of thing. I'm not insisting on anything, just pouring out my thoughts in hope they would be somehow useful to other people or maybe spark more discussion.

Some background: when I discovered this whole fantasy console thing I was delighted at first but quickly found out that most (active/maintained) fantasy consoles are not really... you know, real. Just a stylized retro-pixely thin wrapper around a Lua interpreter and some API functions. Not much scratching is required to tear a hole in their abstractions. I wanted something more in-depth, to push the magical boundary closer to hardware. So I started my own pet project with a goal of a "no-magic" fantasy console, since TIC was pretty magical already and I wasn't sure what nesbox thinks about that direction, plus various other reasons. Anyway, you know how most pet projects go :D So not much to show in terms of code, but lots of thoughts and notes I made along the way.

Okay, now about those APIs.

How TIC works in fantasy world: user script -executed by-> interpreter (that's written in magical fast asm) -that calls-> magical fast asm functions (line, etc) -that read/write hardware register-> hardware does its thing

How it actually is (may be incorrect): user script -executed by-> interpreter -that calls-> TIC C functions -that call-> underlying library functions (sdl, etc)

I don't see any real advantage to replacing: pix(0,0,12) With: [...]

Yes, that's true. A memory-only API for the sake of a memory-only API achieves nothing for high level languages. Both models align quite well if you're executing a "high level" language...

Of the calls I lumped under "Magic / system calls" do you see any big reasons or huge benefits to users if they were something users could "code on their own" against the RAM itself?

...but start breaking up if you try to go one level down and implement some sort of assembly language (that is what I originally intended to do).

Since in assembly there are no magical functions, only registers and memory and any plausible explanation for magic will require a notion of OS somewhere deeper down to handle them.

(It just struck me that if we go with Harvard architecture we can pretend that vendor "magical fast asm functions" are already somewhere on the ROM and our code just calls them. That's more plausible than an OS, but there are still lots of places where this abstraction may leak, like why can't we implement some functions? Because they use undocumented CPU instructions! So lies make more lies and this road can go quite a long way)

So the answer to that question is "if you want to implement something in assembly for an even more retro-realistic feel, everything should be doable via memory reads/writes".

The next question arises from that: "should it be a goal for TIC?". Here, I don't know and I don't think it's for me to decide.

But it seems there are at least some people other than me interested in the more gritty details: #1259 #1007 and it seems like nesbox isn't immediately against all this. All in all I'd say "every API function must be implementable with memory reads/writes" is a prerequisite for #1007

Anrock commented 2 years ago

About "the need for memory mapped devices" vs "pretend that those functions are written in assembly, that's why they're faster than changing pixels one by one".

If we're writing something in assembly there is no other lower level language to be faster, so this abstraction fails and the only thing that can be faster than assembly is dedicated hardware, hence the need for a memory-mapped GPU and other things. For high level languages there is no difference between "the line function is just written in assembly and that's why it changes pixels one-by-one faster" and "the line function is just written in assembly and it writes to some hardware registers so the GPU does the pixel stuff".

Basically I'm talking about taking "magical fast asm functions (line, etc)" from fantasy model and pushing it one level down so it turns into "hardware registers that control dedicated hardware devices". And that eliminates all magic things since hardware is by definition opaque to software.

And as a side effect it can reduce API surface to implement for each language - only two mandatory operations to implement: write mem and read mem. Everything else is handled by TIC core which is language-agnostic.

joshgoebel commented 2 years ago

And as a side effect it can reduce API surface to implement for each language - only two mandatory operations to implement: write mem and read mem.

I don't follow. We would not want a language choice with only 2 API functions... even if it was a very low-level language (ie, WebAssembly)... we'd still expose the full API from within the language. Either as unique hardware opcodes or as a software API (think DOS Int 0x21).

For something like WebAssembly (or anything else where we aren't designing the VM itself, and hence couldn't add hardware ops) then obviously the API would need to be "software", etc...and from a fantasy perspective whether that software API was all software behind the scenes or if we pretended there were hardware opcodes beneath it, that would be entirely up to the user's imagination.

Some background: when I discovered this whole fantasy console thing I was delighted at first but quickly found out that most (active/maintained) fantasy consoles are not really... you know, real

Well, I don't think they are supposed to be "real" in the sense you want - they are fantasy... it's the constraints that are supposed to be real. I feel like you would really enjoy Octo, but it's quite limited fantasy hardware, but it's a classic. http://octo-ide.com And of course as already mentioned PICO-8 does have RAM and CPU limits...

...but start breaking up if you try to go one level down and implement some sort of assembly language (that is what I originally intended to do).

But then you have to build in a whole compiler toolchain, which is a whole other thing... and part of the hold up with the web assembly discussion... I think there is less interest in adding TIC-80 languages that require an external compiler toolchain just to turn your code into a valid runnable cartridge...

some sort of assembly language

...many of the languages do this under the covers already. At least Lua and Wren (that I know of) are byte code VMs... so that means at least 4 of the languages we ship are already "compiled to bytecode" (machine code in a fantasy computer sense)...

joshgoebel commented 2 years ago

And of course there is: https://github.com/nesbox/TIC-80/issues/1007

Anrock commented 2 years ago

@joshgoebel sorry for another wall of text and thank you for keeping up this discussion. I think we're getting close to conclusion.

We would not want a language choice with only 2 API functions

I guess I formulated it wrong, causing confusion. "Only two mandatory functions" doesn't mean there can be only two and no more, dropping everything else except memory ops. What I meant is "at least those two and however many else you would like" or "only two primitive operations and everything else can be built using them". So all of the current Lua/JS/etc API stays the same from the user's perspective, it's just the glue code between the TIC C code and the language wrapper that can be changed to call a sequence of memory ops instead of calling predefined TIC C functions. And, again, this is optional for high-level languages. See appendix at the end.

even if it was a very low-level language (ie, WebAssembly)... we'd still expose the full API from within the language. Either as unique hardware opcodes or as a software API (think DOS Int 0x21).

For something like WebAssembly (or anything else where we aren't designing the VM itself, and hence couldn't add hardware ops) then obviously the API would need to be "software", etc...and from a fantasy perspective whether that software API was all software behind the scenes or if we pretended there were hardware opcodes beneath it, that would be entirely up to the user's imagination.

Made up hardware opcodes and a software API that somehow runs on fantasy bare metal are what I want to avoid. They add extra places where imagination has to fill the gaps, which are superficial and unnecessary. Why have them (and also potential divergence between languages) if the API can be represented as hardware registers, which:

Can be uniform across all languages TIC supports
Allow low level languages to be implemented without workarounds and pretend explanations
Match how it is done IRL

But then you have to build in a whole compiler toolchain, which is whole other thing ...

A compiler chain is not necessary or mandatory if we're talking about the user convenience of writing straight in asm. NES games, AFAIK, were written mainly in asm and for lots of later consoles writing games in asm was a viable choice vs C or other high level languages. Web assembly is sort of an aside here, as it wasn't meant to be written by hand initially.

If we're talking about code execution I still don't see a problem here. a) A hypothetical asm language for TIC can actually be interpreted no problemo without converting it to binary. It's just a glorified switch-case after all. b) There are already some languages, like Moonscript, that are compiled/translated before run; this asm can be like that. c) For extra cool there can be a special path for saving a cartridge that first compiles asm to binary and then saves it.
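To show what "glorified switch-case" could mean in practice, here is a tiny interpreter sketch for a made-up instruction set with four registers, where memory writes are the only way to affect the outside world. Every opcode, the Instr layout and the step limit are hypothetical; a real design would need far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE 0x18000u /* 96KB */

typedef enum { OP_HALT, OP_LOADI, OP_ADDI, OP_POKE, OP_JNZ } Op;

/* One decoded instruction: meaning of a and b depends on the opcode. */
typedef struct {
    Op  op;
    int a;
    int b;
} Instr;

static uint8_t ram[RAM_SIZE]; /* stand-in for the fantasy RAM */

uint8_t peek(uint32_t addr)
{
    assert(addr < RAM_SIZE);
    return ram[addr];
}

/* Run until HALT, the end of the program, or max_steps instructions.
 * Returns the number of instructions executed (HALT included). */
int run(const Instr *prog, size_t len, int max_steps)
{
    int32_t r[4] = {0};
    size_t pc = 0;
    int steps = 0;

    assert(prog != NULL);
    while (pc < len && steps < max_steps) {
        Instr in = prog[pc++];
        ++steps;
        switch (in.op) {
        case OP_LOADI: r[in.a & 3] = in.b; break;             /* r[a] = imm */
        case OP_ADDI:  r[in.a & 3] += in.b; break;            /* r[a] += imm */
        case OP_POKE:  /* RAM[r[a]] = low byte of r[b] */
            ram[(uint32_t)r[in.a & 3] % RAM_SIZE] = (uint8_t)r[in.b & 3];
            break;
        case OP_JNZ:   /* jump to instruction b if r[a] != 0 */
            if (r[in.a & 3] != 0) pc = (size_t)in.b;
            break;
        case OP_HALT:  return steps;
        }
    }
    return steps;
}

/* Demo program: write 0xAB to addresses 3, 2 and 1, counting down in r0. */
const Instr demo_program[] = {
    { OP_LOADI, 0, 3 },
    { OP_LOADI, 1, 0xAB },
    { OP_POKE,  0, 1 },
    { OP_ADDI,  0, -1 },
    { OP_JNZ,   0, 2 },
    { OP_HALT,  0, 0 },
};
```

The max_steps argument is a small nod to the CPU limit topic mentioned earlier: an interpreter like this can count instructions per frame, which a host-language script engine does not do by default.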

Anyways those are implementation details. I think this is too early and offtopic here.

some sort of assembly language ...many of the languages do this under the covers already.

Um, true but unrelated. Nobody writes in lua/js/wren (and webasm too, heh) bytecode directly. I guess I have to clarify: "some sort of assembly language that user writes directly in".

About memory ops only API. My knowledge of inner parts of TIC is quite low but it seems like every language must implement all API functions listed in the wiki.

If all those APIs are made as "hardware" then minimal API is only peek and poke functions. Everything else can be implemented on top of them.

For example, equivalent of

tic_api_line(tic, x0, y0, x1, y1, color);

is (constants are made up, not the point here)

tic_api_poke(tic, TIC_GPU_ARGS, x0);
tic_api_poke(tic, TIC_GPU_ARGS + 1, y0);
tic_api_poke(tic, TIC_GPU_ARGS + 2, x1);
tic_api_poke(tic, TIC_GPU_ARGS + 3, y1);
tic_api_poke(tic, TIC_GPU_ARGS + 4, color);
tic_api_poke(tic, TIC_GPU_CMD, TIC_GPU_CMD_LINE);

and the tic_api_line implementation can be exactly this sequence of pokes, effectively making it just a convenience wrapper instead of an indivisible primitive op.

For high level languages nothing changes - they still can use tic_api_line as a convenience wrapper but making peek/poke as the only actually required API enables implementation of low level languages.

joshgoebel commented 2 years ago

This is much less relevant for WASM (which has no issue with magic API calls) - and that seems to be the way we're going (for now) vs fantasy CPUs, so I'm going to close this.