wiz-lang / wiz

A high-level assembly language for writing homebrew software and games on retro console platforms.
http://wiz-lang.org/

Figure out a nicer syntax for unaligned data access. #82

Closed Bananattack closed 3 years ago

Bananattack commented 4 years ago

Right now it is cumbersome to access smaller views of larger data types directly, and unaligned indexing modes are annoying to use. Part of this comes from how indexing is defined, and from trying to keep things type-safe. The indexing operator in Wiz (used by arrays and pointers) is defined in the familiar C style: indexing is syntactic sugar for pointer arithmetic, and each element sits sizeof(T) bytes after the previous one. This is useful because if you have an array of some type, you can't accidentally index between elements (though sometimes, if you know what you're doing, that is exactly what you want).

To get an unaligned access past the type-checker, you can write an expression that casts the address to an integer, performs the addition, casts the result back to a pointer, and then dereferences it, but that is ugly and cumbersome. In a higher-level language, or on a more modern CPU architecture where everything is aligned or unaligned access carries a penalty, making this difficult to do in the first place might even be desirable.
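For comparison, here is a minimal C sketch of the two kinds of access described above. It is an illustration in C, not Wiz code, and the function names `read_element` and `read_at_byte_offset` are hypothetical. The second function follows the cast, add, cast back, dereference pattern, using `memcpy` for the final read so the access stays valid C at odd offsets.

```c
#include <stdint.h>
#include <string.h>

/* Element indexing: table[i] lies sizeof(uint16_t) bytes past table[i - 1],
   so the index can never land between two elements. */
uint16_t read_element(const uint16_t *table, size_t i) {
    return table[i];
}

/* Byte-offset access (hypothetical helper): cast the base address to an
   integer, add the byte offset, cast back to a pointer, then read 16 bits.
   memcpy performs the read so that odd offsets are not undefined behavior. */
uint16_t read_at_byte_offset(const uint16_t *table, size_t byte_offset) {
    uintptr_t address = (uintptr_t)table + byte_offset;
    uint16_t value;
    memcpy(&value, (const void *)address, sizeof value);
    return value;
}
```

With this sketch, `read_at_byte_offset(table, 2 * i)` returns the same value as `read_element(table, i)`, while an odd offset reads a 16-bit value that straddles two elements.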

However, many of the older CPU architectures that Wiz targets only support unaligned indexing, even for 16-bit reads/writes. On the 65816, for example, x and y are always byte offsets, so the responsibility for aligning indices falls on the programmer. On the 6502, unaligned indexing also shows up in the indexed-by-x-indirect mode, where the indirected address is 16-bit but x is a byte offset from the address you specify.
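To make the 6502 case concrete, the following C sketch models how the indexed-by-x-indirect mode (`lda (zp,x)`) forms its address. The `memory` array and the function name `load_indexed_indirect` are assumptions made for illustration. The point is that x is added as a raw byte count, so stepping through a table of 16-bit pointers requires x to advance by 2.

```c
#include <stdint.h>

/* Hypothetical model of the 6502's 64 KiB address space. */
uint8_t memory[0x10000];

/* Model of `lda (zp_base,x)`: x is a byte offset from zp_base, and the
   16-bit pointer is fetched little-endian from that zero-page location.
   Both pointer bytes are fetched with wraparound inside the zero page. */
uint8_t load_indexed_indirect(uint8_t zp_base, uint8_t x) {
    uint8_t lo_addr = (uint8_t)(zp_base + x);
    uint8_t hi_addr = (uint8_t)(lo_addr + 1);
    uint16_t pointer = (uint16_t)(memory[lo_addr] | (memory[hi_addr] << 8));
    return memory[pointer];
}
```

If the pointer table starts at zero-page address `0x10`, then x = 0 selects the first pointer and x = 2 the second. With x = 1 the CPU combines the high byte of the first pointer with the low byte of the second, which is the kind of between-elements access that element-scaled indexing rules out.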

The various byte-access operators in Wiz (<:, >:, and #:) solve one particular case of the small-piece-of-bigger-data problem, by simplifying the casts and arithmetic involved in getting a specific byte of a value that has a low byte, high byte, or bank byte. However, the same issue comes up when you want to access only a 16-bit part of a 24-bit value, or in any number of other cases of "smaller sub-element that is part of a bigger element".
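As a rough illustration of what these sub-views mean numerically, here is a C sketch that treats a 24-bit value as the low three bytes of a `uint32_t`. The helper names are hypothetical, and the shifts only model the arithmetic; they say nothing about how Wiz implements its operators. The first three helpers correspond to the low, high, and bank bytes, and `word_at` models the missing case of a 16-bit view at a byte offset.

```c
#include <stdint.h>

/* Arithmetic analogues of taking the low, high, and bank byte of a
   24-bit value stored in the low three bytes of a uint32_t. */
uint8_t low_byte(uint32_t v)  { return (uint8_t)(v & 0xFF); }
uint8_t high_byte(uint32_t v) { return (uint8_t)((v >> 8) & 0xFF); }
uint8_t bank_byte(uint32_t v) { return (uint8_t)((v >> 16) & 0xFF); }

/* 16-bit view of a 24-bit value, starting at byte offset 0 or 1
   (counting from the least significant byte). */
uint16_t word_at(uint32_t v, unsigned byte_offset) {
    return (uint16_t)((v >> (8 * byte_offset)) & 0xFFFF);
}
```

For the value `0x123456`, the low, high, and bank bytes are `0x56`, `0x34`, and `0x12`, while `word_at(v, 0)` is `0x3456` and `word_at(v, 1)` is `0x1234`.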

It would be nice to come up with some other subscripting operator that denotes that the index is in bytes, even though the type being accessed is larger than a byte (e.g. u16, u24, u32, pointer to u8, etc.). This subscript should work both for constant values (for retrieving sub-views of bigger data, which can be either a scalar value or an array of values) and for runtime values (for unaligned indexing modes).

Bananattack commented 4 years ago

One thought I had was maybe something like an unaligned keyword that could be used inside the subscript part of an indexing expression, and you'd use it like this:

(Assuming 6502 code)

var pointer_table : [*u8; 16]; // a table of 16 pointers
var far_ptr : far *u8;

// ...

// Call out the fact that x is an unaligned byte offset, and not a pointer-size-aligned offset.
a = *(pointer_table[unaligned x]);

This doesn't really solve the issue of addressing a 16-bit quantity of a 24-bit value, but it does solve unaligned access for arrays and pointers.