iftechfoundation / ifarchive-if-specs

Specification documents for the Glk, Glulx, and Blorb standards

RFC: Glulx byte swap instructions #13

Open dfoxfranke opened 3 weeks ago

dfoxfranke commented 3 weeks ago

This issue is a proposal to add byte-swap instructions to Glulx. Its purpose for now is just to take the temperature of maintainers. If there's interest in moving forward, I'll follow up with a pull request here to add an appropriate section to the specification, and with PRs to glulxe, quixe, and git to implement it.

This addition is motivated by a project I'm working on to translate WebAssembly into Glulx, making it possible to develop IF in any general-purpose language with a compiler that can target WASM, and have it run seamlessly on existing Glulx interpreters. Currently, an impediment to generating efficient code is that WebAssembly is little-endian while Glulx is big-endian, and swapping requires numerous instructions which would have to be executed before or after every main memory access.
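To make the cost concrete, here is a minimal Python sketch of a full 32-bit byte swap built only from shifts, masks, and ORs, the kind of primitives Glulx already offers (`shiftl`, `ushiftr`, `bitand`, `bitor`). The function name is hypothetical and the sketch models the arithmetic only, not actual generated Glulx code.

```python
MASK32 = 0xFFFFFFFF

def swap32_with_shifts(x: int) -> int:
    """Reverse the four bytes of a 32-bit word using only shifts, ANDs and ORs."""
    x &= MASK32
    b0 = (x << 24) & MASK32       # lowest byte moves to the top
    b1 = (x << 8) & 0x00FF0000    # second-lowest byte moves up one place
    b2 = (x >> 8) & 0x0000FF00    # second-highest byte moves down one place
    b3 = x >> 24                  # highest byte moves to the bottom
    return b0 | b1 | b2 | b3
```

Each of the shift, mask, and OR steps above would be a separate instruction in generated code, repeated around every memory access, which is the overhead a single swap operation would remove.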

I propose to add six new instructions to Glulx, and one new gestalt ID to indicate their availability.

swap L1 S1

Swap the bytes of L1 and store the result in S1. 0x01020304 becomes 0x04030201.

swaps L1 S1

Swap the two high bytes of L1 with each other, and the two low bytes with each other, and store the result in S1. 0x01020304 becomes 0x02010403.
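As a reference model for the two register-only operations, the following Python sketch mirrors the descriptions of `swap` and `swaps` above. It is an illustration of the intended semantics, not code from any interpreter.

```python
def swap(x: int) -> int:
    """Reverse all four bytes of a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "big"), "little")

def _swap16(h: int) -> int:
    """Exchange the two bytes of a 16-bit value."""
    return ((h & 0xFF) << 8) | ((h >> 8) & 0xFF)

def swaps(x: int) -> int:
    """Swap the two high bytes with each other and the two low bytes with each other."""
    hi = (x >> 16) & 0xFFFF
    lo = x & 0xFFFF
    return (_swap16(hi) << 16) | _swap16(lo)
```

Both operations are their own inverse, so applying either one twice returns the original word.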

astoreswap L1 L2 L3

Swap the bytes of L3 and then store it into the 32-bit field at main memory address (L1+4*L2). 0x01020304 is stored as 0x04030201.

aloadswap L1 L2 S1

Load a 32-bit value from main memory address (L1+4*L2), and store it in S1 with its bytes swapped. 0x01020304 is stored as 0x04030201.
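The 32-bit memory forms can be modelled in Python with a `bytearray` standing in for Glulx main memory, which stores multi-byte values big-endian. This is a sketch of the described behaviour under that model; the helper `_swap32` and the Python signatures are illustrative assumptions.

```python
import struct

def _swap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "big"), "little")

def astoreswap(mem: bytearray, base: int, index: int, value: int) -> None:
    """Swap the bytes of value, then store it big-endian at base + 4*index."""
    struct.pack_into(">I", mem, base + 4 * index, _swap32(value))

def aloadswap(mem: bytearray, base: int, index: int) -> int:
    """Load a big-endian 32-bit word from base + 4*index and return it byte-swapped."""
    (word,) = struct.unpack_from(">I", mem, base + 4 * index)
    return _swap32(word)
```

The net effect is that `astoreswap` writes the value in little-endian byte order and `aloadswap` reads it back the same way, so a store followed by a load at the same address round-trips.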

astoreswaps L1 L2 L3

Swap the two low bytes of L3 and store them into the 16-bit field at main memory address (L1+2*L2). 0x0102 is stored as 0x0201.

aloadswaps L1 L2 S1

Load a 16-bit value from main memory address (L1+2*L2), and store it in S1 with its bytes swapped. 0x0102 is stored as 0x0201.
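The 16-bit forms follow the same pattern. The Python sketch below again uses a `bytearray` as a stand-in for main memory; it assumes the loaded value is zero-extended to 32 bits, which the proposal text does not state explicitly.

```python
import struct

def _swap16(h: int) -> int:
    """Exchange the two bytes of a 16-bit value."""
    return ((h & 0xFF) << 8) | ((h >> 8) & 0xFF)

def astoreswaps(mem: bytearray, base: int, index: int, value: int) -> None:
    """Swap the two low bytes of value and store them big-endian at base + 2*index."""
    struct.pack_into(">H", mem, base + 2 * index, _swap16(value & 0xFFFF))

def aloadswaps(mem: bytearray, base: int, index: int) -> int:
    """Load a big-endian 16-bit value from base + 2*index and return it byte-swapped.

    Assumption: the result is zero-extended, not sign-extended.
    """
    (half,) = struct.unpack_from(">H", mem, base + 2 * index)
    return _swap16(half)
```

Only the low 16 bits of the stored operand matter here; the upper half of the word is ignored by `astoreswaps`.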

swap, aloadswap, and astoreswap will occur frequently in generated code and should ideally have single-byte opcodes. The 16-bit versions are less important and should have two-byte opcodes to conserve numbering space.

DavidKinder commented 3 weeks ago

The decision on this is ultimately up to @erkyrath, but as you asked me to comment (as Git maintainer) ...

First thought: How sure are you that this is sufficient to support a translation of WebAssembly to Glulx? I dimly recall, years ago, working with a DEC Alpha, which in principle could cope with either endianness. My recollection is that that caused all sorts of awkward corner cases. So are these opcodes based on an analysis, or more of a gut feel for what would be needed?

Second thought: Personally I've no objection to new opcodes, but I'm reluctant to support adding them without a real (as opposed to theoretical) use case, especially as they don't add any functionality; they only allow a particular use case to run faster. But we already have a mechanism to handle speeding up particular functions that are used a lot: the accelfunc opcode. Currently this is used to re-implement some of the most used Inform 6 veneer functions inside interpreters, but there's no reason why it has to be limited to veneer functions.

Suggestion: If it were me, I would implement the swapping opcodes you want as functions in the Glulx code. Once you've got something that works it would be possible to profile it, and then add support to the main interpreters to replace those functions via the accelfunc opcode, if needed. During development it would be easy enough to add local support for accelerated functions to keep performance up, if that was an issue. That would mean that the result would work (if more slowly) on older interpreters and that interpreter updates would only be needed once you know that the project works.

dfoxfranke commented 3 weeks ago

Using accelfunc is an excellent idea, actually! I completely skimmed over that section while I was reading the Glulx spec because I had the impression that it was Inform-specific and not useful otherwise. Now I see that it's extensible and would fit into this use case quite nicely.

dfoxfranke commented 3 weeks ago

To answer your "first thought" paragraph: I'm confident that these operations (regardless of whether they're implemented as opcodes or through accelfunc) would be sufficient to address endianness issues. WebAssembly is a load/store architecture with only a handful of instructions that take memory operands, and those are the only instructions with any defined endianness. I've looked at all of them and concluded that these operations would be sufficient to translate them cleanly. Just like in Glulx, in WebAssembly the stack is separate from memory and words on the stack are just words, with no inherent endianness. Comparing to your experience with the Alpha, translating from one VM which is always little-endian to another VM which is always big-endian is an altogether cleaner problem than having one machine that supports both and trying to get modules to interoperate despite different assumptions.
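The core of that argument can be checked with a small Python model: a little-endian 32-bit load, as WebAssembly's `i32.load` performs, gives the same result as a big-endian load followed by a byte swap. The function names are hypothetical, and mapping WASM linear memory directly onto a `bytearray` is an assumption made for illustration.

```python
import struct

def swap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "big"), "little")

def big_endian_load32(mem: bytearray, addr: int) -> int:
    """Model of a native Glulx 32-bit load: bytes are read big-endian."""
    return struct.unpack_from(">I", mem, addr)[0]

def wasm_i32_load(mem: bytearray, addr: int) -> int:
    """Little-endian load expressed as a big-endian load plus a swap."""
    return swap32(big_endian_load32(mem, addr))
```

Because only loads and stores see byte order, values already on the stack or in locals need no conversion at all.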

Anyway, I'm leaving this issue open for now, but going the accelfunc route I don't think it needs any action from @erkyrath at this time because the extensions to accelfunc can go into implementations first and the spec later. Accelfunc's behavior of "if the function number isn't recognized, do nothing" means implementers don't have to agree on anything in advance; if different implementations use different numbers for the same function, I can just call accelfunc with all of them, and if it's later standardized I can just add the standard number onto the list.
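The "request every candidate number" strategy can be sketched with a toy Python model of an interpreter's accelfunc handling. The class, the function address, and all of the candidate numbers below are hypothetical; only the "unrecognized number does nothing" behaviour is taken from the discussion above.

```python
class ToyTerp:
    """Minimal model of an interpreter's accelerated-function table."""

    def __init__(self, supported_numbers):
        self.supported = set(supported_numbers)
        self.accelerated = {}  # function address -> accelerated function number

    def accelfunc(self, number: int, address: int) -> None:
        # An unrecognized function number is silently ignored.
        if number not in self.supported:
            return
        self.accelerated[address] = number

# Hypothetical numbers that different interpreters might pick for "swap".
CANDIDATE_SWAP_NUMBERS = [0x574100, 0x9000, 0x1234]

def request_swap_acceleration(terp: ToyTerp, swap_addr: int) -> None:
    """Ask for acceleration under every known number; unknown ones are no-ops."""
    for number in CANDIDATE_SWAP_NUMBERS:
        terp.accelfunc(number, swap_addr)
```

On an interpreter that recognizes none of the numbers, the table stays empty and the ordinary Glulx implementation of the function runs unchanged.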

erkyrath commented 3 weeks ago

I don't have a lot to add here.

There's several ideas floating around for "how to integrate Inform code with other systems". I7's C back end; proposal for a C# I7 back end. I think there was a new I6 back end proposal at some point, although I can't find it now.

Compiling WASM to Glulx is obviously a possible path, but it's not obviously the best possible path or the one that will be successful. Given this, it seems premature to start messing with the spec.

I like the accelfunc approach for this.

In the past, we've also handed out ranges of opcodes for private experimentation. Not one-byte opcodes, though.

dfoxfranke commented 3 weeks ago

> There's several ideas floating around for "how to integrate Inform code with other systems". I7's C back end; proposal for a C# I7 back end. I think there was a new I6 back end proposal at some point, although I can't find it now.

I haven't followed these efforts but it sounds like they have a different goal than I do. I'm not trying to translate Inform into other languages or make it interoperate with other languages. I'm trying to create a toolchain for authors who want to develop in something other than Inform, but want to be able to distribute their games as a Glulx or Glulx/Blorb file so that players have a convenient and familiar user experience. I suppose it could also enable you to compile something that Inform can make FFI calls into; that's not a goal of mine, but if it's something that Graham is interested in doing then I'll try to support him. It would require there be some concept of Glulx object files that a linker can combine into a complete story file, but that seems fairly doable and I could probably extend the assembler I just finished writing to support something like that.

> In the past, we've also handed out ranges of opcodes for private experimentation. Not one-byte opcodes, though.

I'm sold on the accelfunc approach, so the opcodes won't be necessary, though would you mind giving me an allocation of accelfunc and accelparm numbers? Of course I don't care about operand size for these since they only need to appear once in the program.

curiousdannii commented 3 weeks ago

Are you aware of the glulx-llvm experiment? https://github.com/dfremont/glulx-llvm

It seems more ideal, skipping the whole wasm stage.

DavidKinder commented 3 weeks ago

> I'm sold on the accelfunc approach, so the opcodes won't be necessary, though would you mind giving me an allocation of accelfunc and accelparm numbers?

This is very much @erkyrath's choice, but I'd favour only allocating numbers like this in the specification once there's a project out there that is at least somewhat useable by the community. No offence is intended, but opcode ranges have been specified for people before and in general nothing much has been done with them, probably unsurprisingly.

Looking at the non-standard opcode ranges listed in the specification (and wandering off topic, sorry), we have

dfoxfranke commented 3 weeks ago

> This is very much @erkyrath's choice, but I'd favour only allocating numbers like this in the specification once there's a project out there that is at least somewhat useable by the community.

I don't need them in the spec immediately, I just need something to use that won't collide with something allocated to someone else later, which can then maybe go into the spec eventually.

dfoxfranke commented 3 weeks ago

> Are you aware of the glulx-llvm experiment? https://github.com/dfremont/glulx-llvm

I wasn't, but the approach is one that I considered and rejected. Maintaining an LLVM backend means keeping up with a fast-moving target, because the LLVM IR is constantly growing and many compilers (I'm thinking Rust here, which is what I'm developing in) demand the absolute bleeding edge. WASM changes more slowly, being a web standard that has to support many interoperable implementations. It has a lot of extensions now, but the extensions are mostly things that the compiler won't generate unless you ask for them and give it code that has a specific need for them. It's also easier to begin with, because WASM is already a lot closer to Glulx than LLVM IR is.

dfoxfranke commented 3 weeks ago

Here's my plan, currently. I've made a list of all the WASM instructions that are going to be translated as function calls into my runtime library. 48 functions are needed; apart from the byte-swapping ones, most are related to 64-bit arithmetic. That list doesn't include the SIMD instructions, which I'm not going to bother supporting in the first release, but will eventually. Some of these, e.g. the 64-bit bitwise operators, are certainly not going to benefit appreciably from acceleration. But I plan to specify accelfuncs for all of the scalar functions (not the vector ops, though; there are another 223 of those and that's just ridiculous) and leave it up to terp maintainers how many to bother with. Just supporting swap will probably yield as much benefit as all the rest combined.
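To illustrate why 64-bit arithmetic ends up as runtime-library calls on a 32-bit VM, here is a minimal Python sketch of a 64-bit add built from 32-bit halves with a manual carry. The function name and the (high, low) calling convention are assumptions for illustration, not the actual wasm2glulx runtime.

```python
MASK32 = 0xFFFFFFFF

def i64_add(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Add two 64-bit values given as (high, low) 32-bit words; wraps modulo 2**64."""
    lo = (a_lo + b_lo) & MASK32
    # The low word wrapped around exactly when the truncated sum is below an operand.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo
```

For example, `i64_add(0, 0xFFFFFFFF, 0, 1)` carries into the high word and returns `(1, 0)`.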

I definitely don't advocate cluttering the Glulx spec with all of these. I'm defining the base of the range of accelfunc and accelparm numbers as a compile-time constant: 0x574100 ("WA") for now unless/until Zarf asks me to use something different. When wasm2glulx is released, I'll post a specification for the behavior of these functions at a stable URL. Then I'll submit a PR here just adding one sentence to the spec: "Functions and parameters numbered $RANGE_START through $(($RANGE_START + 255)) are reserved for use by wasm2glulx; they are specified at $URL".
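For concreteness, the reserved range implied by that base can be worked out in a few lines of Python. The constant and function names are illustrative; the base value 0x574100 and the 256-entry size come from the paragraph above.

```python
RANGE_START = 0x574100           # bytes 0x57 0x41 0x00: ASCII "W", "A", then 0
RANGE_END = RANGE_START + 255    # inclusive upper bound of the block

def in_reserved_range(number: int) -> bool:
    """True if an accelfunc/accelparm number falls inside the proposed block."""
    return RANGE_START <= number <= RANGE_END

# The two high bytes of the base spell the mnemonic tag.
TAG = RANGE_START.to_bytes(3, "big")[:2].decode("ascii")
```

Under these values the block runs from 0x574100 through 0x5741FF, and `TAG` evaluates to `"WA"`.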