Closed Daniel-Cortez closed 5 years ago
Would it be possible to have compile time evaluation of __emit
where it is possible?
Instead of __emit dump sym
, what about static const global_arr[] = { __emit const.pri func1, __emit const.pri func2 }
?
The number would be evaluated at compile time and return. If the symbol is global, no instructions are written. If it was a local, maybe output the last constant value with const.pri
(this is what someone would expect to happen?).
Would it be possible to have compile time evaluation of
__emit
where it is possible?
If by "where it is possible" you mean in constant expressions, then yes, I think this should be possible. But functions would be required to be implemented before the point of their use in __emit dump
, otherwise their addresses wouldn't be known.
Instead of
__emit dump sym
, what aboutstatic const global_arr[] = { __emit const.pri func1, __emit const.pri func2 }
? The number would be evaluated at compile time and return. If the symbol is global, no instructions are written. If it was a local, maybe output the last constant value withconst.pri
(this is what someone would expect to happen?).
__emit const.pri
would imply the value being loaded into the primary register and a const.pri
instruction being put into the data section, both would be misleading in constant expressions. Pretty much the same problem as with dump
.
Also making const.pri
serving two purposes would be too complicated IMO, which is why I suggested dump
instead, because this name is not used by any of __emit
instructions (initially I was towards __emit const <sym>
, but unfortunately this name is too already taken by one of macroinstructions introduced in Pawn 3.2).
I'dmuch prefer this as something like addressof(x)
(or rather __addressof
). An actual keyword instead of extending an unrelated feature in a strange way (I've written an addressof
library, but obviously an intrinsic is better.
Actually, there is this, #348, and possibly other required meta data like specifierof(x)
etc, all of which would be nice to have, but would all clutter the global namespace (even with proper __
prefix). Maybe a more generic solution like:
#pragma meta name MyFunc
Which can then be combined with __Pragma
from #267, then we have an easily extendable solution of:
#define nameof(%0) __Pragma meta name %0
Yes, the original becomes a bit more verbose, but less intrusive and definable. Or, since pragmas are instructions:
__Meta(name, func)
__Meta(address, func)
#if __Meta(address)
#define addressof(%0) __Meta(address, %0)
#endif
Or something else, I don't like the symbols used for "name" and "address" in that code, but it's a rough idea.
While addressof
seems to be a good idea, this question should probably be put aside for the time being as we need a consensus on the naming scheme (and if addressof
should even be a separate operator).
Any help with phrasing a proper error description instead of invalid range
still would be appreciated. I'll re-quote the question for convenience:
- Add a new error for invalid IDs in lctrl, sctrl and other instructions.
__emit lctrl -1;
Currently this would make the compiler issue an "invalid range" error, which clearly isn't the right description. But again, I'm not sure how to phrase that error properly.
There is error 50 - "invalid range", 32 - "array index is out of bounds", and 28 - "invalid subscript (not an array or too many subscripts)". None are quite right, but good examples, giving something like "parameter is out of bounds". Would also need a way to specify lctrl 7 and 8 as valid (for JIT and crashdetect), maybe #pragma lctrl 7
.
There is error 50 - "invalid range", 32 - "array index is out of bounds", and 28 - "invalid subscript (not an array or too many subscripts)". None are quite right, but good examples, giving something like "parameter is out of bounds".
That's my whole point. None of the existing errors are quite correct for this situation, so we probably need to add a new one. Speaking of which, how about ID %d is invalid for instruction \"%s\"
?
Would also need a way to specify lctrl 7 and 8 as valid (for JIT and crashdetect), maybe #pragma lctrl 7.
That's already been taken care of, __emit
recognizes IDs 0 through 9 as valid for lctrl
(and for sctrl
the valid IDs are 2, 4, 5, 6, 8 and 9). Or you mean we should rather disable IDs 7, 8 and 9 by default and enable them from user code with something like #pragma lctrl 7 on
?
"Parameter" instead of "ID" to be more generic to any instruction, but otherwise yes.
And my point was more that plugins can add more. Yes 7-9 are covered, but there could be more in the future that should be specifiable.
Since lctrl 123
is a no-op why check the parameter anyway? could as well just leave it be. Maybe some guy will invent a new meaning for lctrl 100
in some plugin. BTW crashdetect uses lctrl 255
.
"Parameter" instead of "ID" to be more generic to any instruction, but otherwise yes.
I wanted to use "parameter" at first too, but it sounds too generic, it can mean anything (data address, stack offset, function address, label) while the error in question is supposed to be issued only for instructions lctrl
, sctrl
, lodb.i
, strb.i
, idxaddr.b
, lidx.b
, shl.c.pri/alt
and shr.c.pri/alt
, all of which take integer values as arguments. Maybe something more specific like "numeric parameter" would fit better (though it would probably also imply rational numbers)?
And my point was more that plugins can add more. Yes 7-9 are covered, but there could be more in the future that should be specifiable.
Since lctrl 123 is a no-op why check the parameter anyway? could as well just leave it be. Maybe some guy will invent a new meaning for lctrl 100 in some plugin. BTW crashdetect uses lctrl 255.
Probably would be better to keep everything simple and just disable that check for lctrl
and sctrl
(the only use of it was to make sure the user didn't specify an invalid ID due to a typo or something else) than to clutter the compiler with another #pragma
.
It is only used for those instructions currently - with "parameter" it could be applied to others as well. Or if not parameter, not ID either - they aren't identifiers in most of those cases. I suppose the correct term is "operand".
It is only used for those instructions currently - with "parameter" it could be applied to others as well.
Not sure what you mean. There's no need to apply this error to other instructions as they already have this:
new var;
emit stor.pri var; // error 001: expected token: "-data offset-", but found "-local variable-"
which is much more informative IMO, as it also tells about operand types.
Oh, ok.
In retrospect I can see the use of having some sort of address psudo-op, assuming that it can do locals and arrays as well. Currently there's no good way to do this in inline assembly:
stock _YSI_ConstMod(const &var, const val)
{
#pragma unused var, val
#emit LOAD.S.pri val
#emit SREF.S.pri var
}
The compiler generates all the correct code for calling the function regardless of the type of var
:
_YSI_ConstMod(localVar, 5);
_YSI_ConstMod(globalVar, 6);
_YSI_ConstMod(array[5][6], 7);
All those require different addressing OPs - PUSH.C
, PUSH.adr
, and just way more code for the array. There's just no way to convert that to a macro without:
#define _YSI_ConstMod(%0,%1) \
__emit PSUDO.ADDR.alt %0 \
__emit CONST.alt %1 \
__emit STOR.I
Kinda funny seeing that such "pseudo"-instructions could be useful, since I already experimented with a similar idea last year, but left it since I thought those instructions would never be useful to anyone.
The idea about pseudo-instructions as a whole is not really new - you can already use __emit
with macro-instructions in SA-MP scripts, without -O2
, and such instructions will compile into combinations of non-macro ones!
e.g.
__emit sysreq.n printf 16
=> push.c 16 \ sysreq.c printf \ stack 20
__emit load.s.both varA varB
=> load.s.pri varA \ load.s.alt varB
This is one of the features that was implemented in #211.
And, as I already said, after implementing that feature I experimented with the idea of pseudo-instructions which could compile into different instructions depending on the types of their arguments, and ended up implementing two instructions called "load.u.pri/alt" and "stor.u.pri/alt" (the "u" stands for "unified"; I'm not a native English speaker though, so I'm pretty sure there might be better fitting alternatives to that word).
I haven't seen the feature with macro-to-non-macro instructions I mentioned earlier being used anywhere, so I figured pseudo-instructions wouldn't be of much use as well and left them in a local repo with only the two abovementioned instructions implemented. But if such instructions could really be useful, I can always pick up the old implementation and finish it.
Here's a full list of instructions I was going to implement.
pseudo-instruction | data type | clobbers PRI/ALT | compiles into |
---|---|---|---|
load.u.pri/alt |
data address | - | load.pri/alt <address> |
stack offset | - | load.s.pri/alt <offset> |
|
constant value | - | const.pri/alt <value> |
|
reference | - | lref.s.pri/alt <refvar> |
|
stor.u.pri/alt |
data address | - | stor.pri/alt <address> |
stack offset | - | stor.s.pri/alt <offset> |
|
reference | - | sref.s.pri/alt <refvar> |
|
addr.u.pri/alt |
data address | - | const.pri/alt <address> |
stack offset | - | addr.pri/alt <offset> |
|
reference | - | load.s.pri/alt <refvar> |
|
push.u |
data address | - | push <address> |
stack offset | - | push.s <offset> |
|
constant value | - | push.c <value> |
|
reference | PRI | lref.s.pri <refvar> \ push.pri |
|
push.adr.u |
data address | - | push.c <address> |
stack offset | - | push.adr <offset> |
|
reference | - | push.s <refvar> |
|
zero.u |
data address | - | zero <address> |
stack offset | - | zero.s <offset> |
|
reference | PRI | zero.pri \ sref.s.pri <refvar> |
|
inc.u |
data address | - | inc <address> |
stack offset | - | inc.s <offset> |
|
reference | PRI | load.s.pri \ inc.i |
|
dec.u |
data address | - | dec <address> |
stack offset | - | dec.s <offset> |
|
reference | PRI | load.s.pri \ dec.i |
@Y-Less The arguments of type "array" are not listed in the table above since I don't know how to handle them exactly for the following reasons:
It would require extra effort because of indexing. Are there any parts of code within the compiler I could reuse for this task?
More extra effort and more generated instructions to save/restore registers (otherwise PRI and/or ALT would get clobbered after some of the pseudo-instructions).
You can already see how complicated the generated code gets when handling pointers (instructions push.u
and push.adr.u
).
Should we add index checks, as in regular non-__emit
code (opcode bounds
)?
I see what you're trying to do, but I'm not convinced by the excessive code involved in the "reference" variants:
1) Half the code is undefined - relying on unallocated memory like here: push.alt \ push.alt \ stack <cellsize * 2> \ lref.s.alt <refvar> \ push.alt \ stack <-cellsize> \ pop.alt
. For most of that code the storage for alt
is out of the defined stack scope, so invalid memory. This particular example can be done much more simply anyway with push.alt \ lref.s.alt <refvar> \ swap.alt
, but that's beside the point.
2) They would be simpler if they just noted that a register might be clobbered - that's what registers are for. A chain of 4 push.u
in a row will waste a lot of time saving and restoring alt
for maybe no good reason. a) it could be done much more simply once by the user at the start of the code, and b) might not even be needed if they know they won't reuse the current value of alt
. It is far better to say "may clobber alt
" and let the user deal with that.
I'm not sure where the compiler code for generating array access is, but if there's an easily reusable section, I'd say use that and let it add BOUNDS
or not, depending on debug levels.
For most of that code the storage for alt is out of the defined stack scope, so invalid memory.
Aren't you mistaking stack for heap? You don't have to pre-allocate space in the stack to push values there.
Or you mean that push*
instructions don't check for stack/heap collision? If so, there's a 16-cell sized stack/heap margin (see the STKMARGIN
macro in amx.c), so I'm pretty sure saving registers or temporary values to the stack should be alright as long as we don't save more than 16 values.
This particular example can be done much more simply anyway with push.alt \ lref.s.alt
\ swap.alt
As I already said, I only implemented load.u.pri/alt
and stor.u.pri/alt
(which were probably the most simple to do), and didn't spend a lot of time on thinking how to optimize generated code for the other instructions. But thanks anyway, I'll correct the code in the table above for this and the other similar cases.
They would be simpler if they just noted that a register might be clobbered - that's what registers are for.
Ok then. Too bad we don't have the source for the documentation (pawn-lang.pdf, pawn-imp.pdf) to add new info to the opcode table. For now I'll add notes to the table in a post above under instructions that modify the registers and remove the code for register saving/restoring.
No, I mean:
push.alt // stack now -4
push.alt // stack now -8
stack <cellsize * 2> // stack now -0, data just pushed invalidated
lref.s.alt <refvar> // irrelevant
push.alt // stack now -4
stack <-cellsize> // stack again -8, but no guarantee of contents
pop.alt // popping garbage memory
data just pushed invalidated no guarantee of contents
... Where exactly does this come from? Pawn Implementer's Guide clearly states that "the heap and the stack share a memory block", so the memory between the heap and the stack won't go anywhere.
But the memory between the stack and the heap is not part of either the stack or the heap - it is garbage memory. I know for a fact that the JIT implementation stores data in there while calling natives at least, because it has no reason not to.
Is this a BUG REPORT, FEATURE REQUEST or QUESTION?:
What happened: I have a few ideas for the
__emit
operator, but before implementing them and making a PR I guess it would be better to discuss them here first.Add
__emit dump <identifier>
. This would allow the compiler to retrieve addresses of functions, variables etc. as constant values, which could be useful for initialization of constants and immutable arrays. Example:Though I'm not sure if
dump
would be a proper pseudo-instruction name because in compiler assembly outputdump
means "put value(s) into the data section", not "retrieve symbol value/address".Allow jumps to naked functions. Since the next release is going to have naked functions without prologue and epilogue code, maybe we should allow such functions as arguments for jump instructions?
Issue an error/warning if data/stack offset is not a multiple of cell size.
This would require addition of new error in
sc5.c
though (would "not a multiple of cell size" be a good description for it?)Add a new error for invalid IDs in
lctrl
,sctrl
and other instructions.Currently this would make the compiler issue an "invalid range" error, which clearly isn't the right description. But again, I'm not sure how to phrase that error properly.
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: Special thanks to @VVWVV for giving me the idea about the "not a multiple of cell size" error.
Environment: