Open kervinck opened 5 years ago
==========================
Part 1: Consistency issues
==========================
Naming is a concept that is orthogonal to that of variables and
constants. gcl0x notation doesn't fully reflect this, and it is
inconsistent in some ways because of that. It got somewhat confused
because there are two namespaces at play: that of automatically
allocated user variables, and that of system defined constants
(some but not all may then refer to system variables).
As a result, gcl0x has shortcomings. For example, there is
no way to PEEK from a fixed known address, and accessing system
variables looks ugly in general.
But it can be improved. If we ignore byte-sized entities at first, we
then can have the following syntax scheme for the basic operations:
+-----------------------------------------------+-----------------------+
| GCL notation | vCPU compilation |
+-----------------------+-----------------------+-----------------------+
| Named variables | Named constants | |
+-----------------------+-----------------------+-----------------------+
| old new | old new | |
| Load X X | Load \C; .C | LDW $DD |
| Store X= X= | Store \C: .C= | STW $DD |
| Read X; X; | Read .C; | LDI $DD + DEEK |
| Write X: X: | Write .C: | DOKE $DD |
| Address &X | Value \C \C *| LDI $DD (or LDWI*) | *See also table 3
+-----------------------+-----------------------+-----------------------+
| Unnamed variables | Unnamed constants | |
+-----------------------+-----------------------+-----------------------+
| old new | old new | |
| Load i; *i | Load | LDW $DD |
| Store i: *i= | Store | STW $DD |
| Read *i; | Read | LDI $DD + DEEK |
| Write *i: | Write | DOKE $DD |
| Address i i | Value i ii i ii | LDI $DD (or LDWI*) | *See also table 3
+-----------------------+-----------------------+-----------------------+
Table 1. Consistency improvements for word operations
Where:
X is a GCL variable's name, automatically allocated in the zero page
i is a small integer constant, eg. 123 or $30
C is a symbol representing a constant on the host platform.
Typically predefined in interface.json when compiling with
compilegcl.py, or set in preceding assembly code with
define() or label() when compiling from ROMvX.py. Typical examples
are `vLR', `rawSerial', `screenMemory', `fontData', `buttonUp'.
(This is how GCL programs link to application-specific SYS functions.)
Change summary
--------------
.C Treats symbol C as a GCL variable with address taken from the symbol table
All operations possible on normal variables are possible (e.g. `.vLR=')
\C Unchanged, but now `\C;' and `\C:' have a nicer alternative syntax
*i Treats a small integer as a GCL variable with zero page address i
&X Takes the address of the automatically assigned GCL variable X.
Added for completeness. Not even sure if we've needed it already.
Code example
------------
Old: [if<=0 do \frameCount; #\DOKE #\vPCH loop]
New: [if<=0 do .frameCount .vPCH: loop]
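The prefix rules from the change summary can be sketched in Python. This is only an illustration, not code from gcl.py: the function names, the symbol values and the start address of the automatic allocation are all made up. It shows that the prefix alone decides which namespace a word refers to and whether it denotes a variable or a plain value.

```python
import re

NUMBER = re.compile(r'\$[0-9A-Fa-f]+|\d+')

def parse_number(text):
    """Parse a GCL-style number: decimal (123) or hexadecimal ($30)."""
    return int(text[1:], 16) if text.startswith('$') else int(text)

def allocate(name, variables, first=0x30):
    """Give `name` a two-byte zero page slot on first use.
    The start address 0x30 is an assumption for this sketch."""
    if name not in variables:
        variables[name] = first + 2 * len(variables)
    return variables[name]

def resolve_operand(word, symbols, variables):
    """Return (kind, address_or_value) for the operand part of a word.
    `symbols` stands in for the host symbol table (hypothetical contents)."""
    prefix, rest = word[:1], word[1:]
    if prefix == '.':                 # .C  symbol used as a variable
        return 'variable', symbols[rest]
    if prefix == '\\':                # \C  symbol used as a value
        return 'value', symbols[rest]
    if prefix == '*':                 # *i  integer used as a zero page variable
        return 'variable', parse_number(rest)
    if prefix == '&':                 # &X  address of GCL variable X
        return 'value', allocate(rest, variables)
    if NUMBER.fullmatch(word):        # i   plain integer constant
        return 'value', parse_number(word)
    return 'variable', allocate(word, variables)   # X  GCL variable
```

The operator suffix (`=', `;', `:' and so on) would be handled separately, which is the point of the proposal: naming and operation become independent.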
===============
Byte operations
===============
GCL variables are always word sized, while many system variables
are single byte, for example `rawSerial' and `romType'.
We can already address the individual bytes of named variables
with prefixes `<' and `>', but again the notation is ugly:
X<, Get low byte of X
X<. Set low byte of X
X<++, Increment byte of X
Only peek and poke look good (`X,' and `X.') but there's no
notation for using an unnamed variable for these. This leads to
cases of inline assembly:
$7400 [do \POKE# \vAC# \vAC>++ if>0loop] {Racer_v1.gcl}
With the prefix operators it is only mildly better:
<X. >X.
<X++ >X++
The instructions of concern are LD, ST, INC, PEEK and POKE (and LDI).
Let's make a table as we did above for the word operations.
+-----------------------------------------------+-----------------------+
| GCL notation | vCPU compilation |
+-----------------------+-----------------------+-----------------------+
| Named variables | Named constants | |
+-----------------------+-----------------------+-----------------------+
| old new | old new | |
| Load X<, <X | Load \C, <.C | LD $DD |
| X>, >X | >.C | LD $DD+1 |
| Store X<. <X= | Store \C. <.C= | ST $DD |
| X>. >X= | >.C= | ST $DD+1 |
| Increment X<++ <X++ | Increment \C<++ <.C++ | INC $DD |
| X>++ >X++ | \C>++ >.C++ | INC $DD+1 |
| Read X, X, | Read .C, | LDI $DD + PEEK |
| Write X. X. | Write .C. | POKE $DD |
| Address &<X | Value <\C | LDI $DD |
| &>X | >\C | LDI $DD |
+-----------------------+-----------------------+-----------------------+
| Unnamed variables | Unnamed constants | |
+-----------------------+-----------------------+-----------------------+
| old new | old new | |
| Load i, <*i | Load | LD $DD |
| >*i | | LD $DD+1 |
| Store i. <*i= | Store | ST $DD |
| >*i= | | ST $DD+1 |
| Increment i<++ <*i++ | Increment | INC $DD |
| i>++ >*i++ | | INC $DD+1 |
| Read *i, | Read | LDI $DD + PEEK |
| Write *i. | Write | POKE $DD |
| Address <i <ii| Value <i <ii| LDI $DD |
| >i >ii| >i >ii| LDI $DD |
+-----------------------+-----------------------+-----------------------+
Table 2. Consistency improvements for byte operations
Our Racer example then becomes
Old: $7400 [do \POKE# \vAC# \vAC>++ if>0loop]
New: $7400 [do .vAC. >.vAC++ if>0loop]
===============
Stack variables
===============
With prefix notation we can improve the LDLW/STLW operations:
old new
--- ---
i% %i Get variable at stack offset i LDLW $DD
i%= %i= Set variable at stack offset i STLW $DD
This hints better that '%i' is the name of a variable we can get and set.
For ALLOC, `i++' and `i--' are plain confusing. Without much prior
knowledge and coming from a C background, one expects it modifies
vAC. But it modifies vSP instead. The notation must therefore also
improve. We have a choice between two concepts:
Option A
old new
--- ---
i++ i%+ Add i to vSP ALLOC $DD
i-- i%- Subtract i from vSP ALLOC -$DD
Option B
old new
--- ---
i++ %i+ Add i to vSP ALLOC $DD
i-- %i- Subtract i from vSP ALLOC -$DD
I don't like option B, because we're not "adding" the stack variable `%i'.
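As a rough sketch of how a compiler could recognise the stack notation with option A, consider the following Python. The function name is hypothetical, and encoding a negative ALLOC operand as a two's complement byte is an assumption of this sketch.

```python
import re

NUM = r'(\$[0-9A-Fa-f]+|\d+)'

def parse_number(text):
    """Parse decimal (12) or hexadecimal ($0c) notation."""
    return int(text[1:], 16) if text.startswith('$') else int(text)

def compile_stack_word(word):
    """Map a stack-related word to (mnemonic, operand byte), or None."""
    m = re.fullmatch('%' + NUM + '(=?)', word)
    if m:                                   # %i and %i=
        op = 'STLW' if m.group(2) else 'LDLW'
        return op, parse_number(m.group(1))
    m = re.fullmatch(NUM + r'%([+-])', word)
    if m:                                   # i%+ and i%- (option A)
        n = parse_number(m.group(1))
        if m.group(2) == '-':
            n = -n
        return 'ALLOC', n & 0xff            # assumed byte encoding
    return None
```

With this, `%4=' gives ('STLW', 4) and `2%-' gives ('ALLOC', 254), while the old `2++' is no longer recognised as a stack operation.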
======================================
Part 2. Constant and label definitions
======================================
Now we can use the same notation to define labels and use them.
In the above terminology these are nothing but named constants.
First, I suggest retiring the `$300:' notation for setting the
compilation address, because in gcl0x the meaning of the postfix
colon depends on the magnitude of the number... With prefix notation
we have a perfectly readable alternative:
*=$300
This then leads to the following notation for defining labels:
label=*
Or defining other constants:
Blue=$20
We can use the normal operators with such constants:
indent=2
Pos \indent+ Pos= { Same as: Pos 2+ Pos= }
We can even define our own zero page variables and bypass the
automatic allocation:
V=$81
1 .V=
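A minimal sketch of how these definitions might be processed, assuming a word counts as a definition only when something follows the `=' (so `Pos=' remains a store). The class and method names are hypothetical and not taken from gcl.py.

```python
def parse_number(text):
    """Parse decimal (2) or hexadecimal ($300) notation."""
    return int(text[1:], 16) if text.startswith('$') else int(text)

class Definitions:
    """Tracks the compilation address `*' and the named constants."""

    def __init__(self):
        self.address = 0      # current compilation address
        self.symbols = {}     # named constants, including labels

    def define(self, word):
        """Handle `name=value'. Return False if the word is not a definition."""
        name, sep, value = word.partition('=')
        if not sep or not value:
            return False                      # e.g. `Pos=' is a store
        number = self.address if value == '*' else parse_number(value)
        if name == '*':
            self.address = number             # *=$300
        else:
            self.symbols[name] = number       # label=*  or  Blue=$20
        return True
```

Feeding it `*=$300', `label=*' and `Blue=$20' in that order leaves `label' at $300 and `Blue' at $20.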
==============
Implementation
==============
If constants or labels are defined later than used, a two-pass
compilation approach is needed. However, gcl.py is single-pass and
for simplicity we would really like to keep it that way. Fortunately,
asm.py is already doing something like this with its own symbol
table. The only limitation is that its mechanism only works for
retrofitting byte values, not for arbitrary word values. In GCL
we can still make use of the same mechanism if we tell it whether we
want a byte or a word.
As a notation, I propose we use single-\ for byte values, and double-\
for word constants. This is the most in line with previous usage:
\C for forcing a LDI instruction
\\C for forcing a LDWI instruction
In code generation the difference is between
emitOp('LDI'); emit(lo('C'))
and
emitOp('LDWI'); emit(lo('C')); emit(hi('C')).
However, in the current implementation of asm.py the following
will NOT give a warning:
\C { ...code... } C=ii
This is dangerous, because for known constants gcl0x automatically
selects the correct instruction sequence when we write `\C', so a value
is never silently truncated. With a forward reference, the high byte
would be dropped without notice. It's not too easy to add '\\' to
gcl0x, so we need
a slightly different approach: let the assembler do the range check.
It currently has _refsL[] and _refsH[] lists that end() uses to
pick the desired word half. We can add a third list that behaves
as _refsL[] but fails for out of range values: let's call it _refsB[]
and define:
  def byte(name):
    _refsB.append((name, _romSize))
    return 0 # placeholder
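To make the idea concrete, here is a simplified, self-contained model of the three reference lists and the range check. It is not the real asm.py: the class, the use of `len(self.rom)` in place of `_romSize`, and the mnemonic strings standing in for opcodes are all assumptions of this sketch.

```python
class Assembler:
    """Toy single-pass assembler with back-patching of forward references."""

    def __init__(self):
        self.rom = []                      # emitted items
        self.symbols = {}                  # name -> value, may be filled late
        self.refsL, self.refsH, self.refsB = [], [], []

    def define(self, name, value):
        self.symbols[name] = value

    def emit(self, item):
        self.rom.append(item)

    def lo(self, name):                    # low byte of a word
        self.refsL.append((name, len(self.rom)))
        return 0                           # placeholder

    def hi(self, name):                    # high byte of a word
        self.refsH.append((name, len(self.rom)))
        return 0                           # placeholder

    def byte(self, name):                  # whole value must fit in a byte
        self.refsB.append((name, len(self.rom)))
        return 0                           # placeholder

    def end(self):
        """Patch all placeholders. Fail if a byte reference is out of range."""
        for name, where in self.refsL:
            self.rom[where] = self.symbols[name] & 255
        for name, where in self.refsH:
            self.rom[where] = (self.symbols[name] >> 8) & 255
        for name, where in self.refsB:
            value = self.symbols[name]
            if not 0 <= value <= 255:
                raise ValueError('%s=%d does not fit in a byte' % (name, value))
            self.rom[where] = value
```

Because each helper records the position before `emit()` appends the placeholder, the recorded index is exactly the slot that `end()` patches later.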
We then get something like this:
GCL Generation
--- ----------
\C emitOp('LDI'); emit(byte('C'))
\\C emitOp('LDWI'); emit(lo('C')); emit(hi('C'))
<C emitOp('LDI'); emit(lo('C'))
>C emitOp('LDI'); emit(hi('C'))
+-----------------------+-----------------------+
| GCL notation | vCPU compilation |
+-----------------------+-----------------------+
| Named constants | |
+-----------------------+-----------------------+
| old new | |
| Value \C \C | LDI $DD |
| \\C | LDWI $DDDD |
+-----------------------+-----------------------+
| Unnamed constants | |
+-----------------------+-----------------------+
| old new | |
| Value i i | LDI $DD |
| ii ii | LDWI $DDDD |
+-----------------------+-----------------------+
Table 3. Consistency improvements for constant values
Edit 2019-06-21:
\\C for forcing a LDWI instruction
I figure the \\-notation isn't necessary. We should simply always emit LDWI
for labels that are still unresolved. Therefore we can always keep typing single-\ as before.
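The selection rule from this edit could look roughly as follows in Python. The function name and the tuple placeholders for unresolved halves are hypothetical, chosen only to show the three cases.

```python
def select_load(name, symbols):
    """Choose the instruction for `\\C' given what is known so far.

    Returns a list: mnemonic followed by operand bytes. For unresolved
    symbols the operands are ('lo', name) / ('hi', name) placeholders
    that a later fix-up step would fill in.
    """
    value = symbols.get(name)
    if value is None:
        # Not defined yet: always reserve room for a full word.
        return ['LDWI', ('lo', name), ('hi', name)]
    if 0 <= value <= 255:
        return ['LDI', value]              # known and small
    return ['LDWI', value & 255, (value >> 8) & 255]
```

The cost is an extra byte for forward references to small values, in exchange for never truncating a value and keeping the single-\ notation.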
==============
Migration path
==============
gcl0x: Add warnings for notations that will be removed or change in gcl1
Accept the new notations already, to help migration
Warn i= -> i:
Warn \ii -> \\ii
Warn ii: -> *=ii
Support #i `text
Support <X++ >X++
Support <X, >X.
Support \\ii
gcl1: Make incompatible changes
Remove i=
Remove ii:
Remove \ii
Add warnings for old notations that have a nicer new alternative
Warn i# -> #i or `text
Warn X<++ -> <X++
Warn X>++ -> >X++
Warn X<, -> <X,
Warn X>. -> >X.
gclN: Hypothetical future version
Remove i#
Remove X<++ X>++
Remove X<, X>.
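Some of the warnings above can be generated mechanically. The sketch below covers only the rewrites that move a postfix to a prefix, plus the compilation address. The function name, the identifier pattern and the "larger than 255" test for `ii:' are assumptions of this sketch.

```python
import re

NUM = r'(\$[0-9A-Fa-f]+|\d+)'

def parse_number(text):
    """Parse decimal (10) or hexadecimal ($300) notation."""
    return int(text[1:], 16) if text.startswith('$') else int(text)

def suggest(word):
    """Return the new spelling for an old-style word, or None if unchanged."""
    # X<++  X>++  X<,  X>.   ->   <X++  >X++  <X,  >X.
    m = re.fullmatch(r'([A-Za-z_]\w*)([<>])(\+\+|[,.])', word)
    if m:
        return m.group(2) + m.group(1) + m.group(3)
    # i#  ->  #i
    m = re.fullmatch(NUM + '#', word)
    if m:
        return '#' + m.group(1)
    # ii:  ->  *=ii   (only for values that don't fit in a byte)
    m = re.fullmatch(NUM + ':', word)
    if m and parse_number(m.group(1)) > 255:
        return '*=' + m.group(1)
    return None
```

A compiler in the gcl0x stage could print `suggest(word)` as part of its warning and otherwise compile the word as before.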
Labels now supported with this commit: https://github.com/kervinck/gigatron-rom/commit/734791508034b42cdb5bede33edfd4610310d367
I plan another change:
The SYS call operator should change from `i!' to `i!!', with i still the maximum number of needed cycles. For example `134!!' when calling _SYS_VDrawBits134. The reason is the new CALLI operator needs a notation, and `i!' is by far the most logical (or actually `ii!' for an immediate call to address ii).
This way `!' refers to RAM calls (e.g. `F!' and `$2600!'), and `!!' refers to ROM calls. Different address spaces, different instruction set, different meaning of the operand.
In the transition period, the compiler can emit SYS for i<256 and not complain. With gcl1 this usage should be flagged as deprecated.
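A small sketch of the transition rule, with a hypothetical function name and a `gcl1` flag invented for illustration. It returns the instruction, the operand and whether the spelling should be flagged as deprecated.

```python
import re

NUMBER = re.compile(r'\$[0-9A-Fa-f]+|\d+')

def parse_number(text):
    """Parse decimal (134) or hexadecimal ($2600) notation."""
    return int(text[1:], 16) if text.startswith('$') else int(text)

def classify_call(word, gcl1=False):
    """Return (mnemonic, operand, deprecated) for a call word, or None."""
    if word.endswith('!!'):                    # ROM call: i!!
        operand = word[:-2]
        if NUMBER.fullmatch(operand):
            return 'SYS', parse_number(operand), False
        return None
    if word.endswith('!'):
        operand = word[:-1]
        if not NUMBER.fullmatch(operand):      # RAM call via variable: F!
            return 'CALL', operand, False
        value = parse_number(operand)
        if value < 256:                        # old-style SYS call: i!
            return 'SYS', value, gcl1          # tolerated, flagged in gcl1
        return 'CALLI', value, False           # RAM call to address: ii!
    return None
```

For example, `134!' still compiles to SYS during the transition, and only with `gcl1=True` does the third element become True.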
From Docs/GCL-language.txt:
We foresee three versions of GCL: gcl0x, gcl1 and gcl2.
gcl0x is what we used to make the built-in applications of ROM v1. It is still evolving, sometimes in backward incompatible ways.
gcl1 will be the final update in notation once we've settled on what GCL should really look like. gcl0x has some inconsistencies in notation that are confusing. Some aren't easy to resolve while maintaining its spirit. We won't take this step easily.
gcl2 will add a macro system. The parenthesis are reserved for that.