Improve performance of stack type checking in opexec

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. execution of operator objects
2.
3. algorithmic improvement -> possible order-of-magnitude speedup

What is the expected output? What do you see instead?

Operator objects are executed by a series of simple steps,

  * get pointer to OPTAB, the operator table
  * get struct oper object from optab
  * get signature block (a character string of type* ids)
  * iterate through signatures
    * check signature against stack
    * copy args from stack to holding-stack
    * call op-function associated with *this* matched signature
      accessing hold-stack contents via pointers

The "check signature against stack" is the problem here (and it was pieces of 
this "optimization" that led to the unnecessary UINT128 which isn't portable). 
*The type ids in the signature are extended with "pattern types": anytype which 
matches any object (count has already been checked previously), numbertype 
which matches either int or real, floattype which matches reals and promotes 
ints to reals, and proctype which is shorthand for executable array. Now the 
executable attribute is already in the tag field where the type is stored, but 
to pack these signature blocks tightly, these are stored as characters and 
masked to just the bits of the type sub-field, so the extended-type-id patterns 
do not clash with the flags which may use the higher bits. Hm... That detail is 
not particularly important here, but is for descriptive purposes. :)

The current implementation is what I usually call "the nasty loop". It's in 
xpost_operator.c:opexec()//

  * for each signature
    * if stack_count < sig.args
       - err = stackunderflow, next signature
    * for each type-pattern in sig
       * if anytype, 
           next type-pattern
       * if type-pattern matches the type of the stack object in that position,
           next type-pattern
       * if pattern is numbertype and corresponding stack object is int or float,
           next type-pattern
       * if pattern is floattype
           * if stack object is int
              promote to float
           * if stack object is real
              next type-pattern
       * if pattern is proctype
             and stack object is array
             and stack object is executable
              next type-pattern
       * else
           err = typecheck
           next signature

What I'd really like to do is match this type object as a large integer with 
bitwise logic. 8 bytes is 64bits, or UINT64 or unsigned long long. So the 
program logic could be simpler:

    * take "digest" of stack types
    * for each signature
        * compare type-pattern-block word-wise against digest
        * if good, 
           call op-function and return success
    * return error

The word-wise type comparison then becomes a nasty logic puzzle, but look how 
clean the code becomes!!! It's sooo pretty, it's *got* to be fast. Look at how 
many conditional branches are eliminated! That's got to zip through the 
pipeline faster, right? right? amiright??!

Please use labels and text to provide additional information.

Original issue reported on code.google.com by luser.droog on 28 Feb 2014 at 8:27

GoogleCodeExporter commented 9 years ago

The way I understand pipelining, we can essentially do several simple binary 
operations "for free" in the comparison. That is, relative to the stalls 
eliminated by removing the conditional jumps.

So we've got an enum of object types

    enum { invalid, null, mark, integer, real, array, dict, file, operator, save,
           name, Boolean, context, extended, glob, magic, string };

And we want to be able to match these and also easily match 

    anything not "invalid"  (ie. non-zero)
    integer or real  (:: numbertype)
    convertible to real  (:: floattype)
    executable and arraytype  (:: proctype)

One idea is to push the task of convertion back onto the operator functions and 
just worry about the numbertype. If these are placed on a bit-boundary, then we 
can mask the two bits, mask+and & ~mask+xor maybe. Hmm... or we could push that 
into the signature loop by installing multiple sigs for each specific type 
(even with the same op-function), then we don't need a numbertype pattern at 
all. ... Hm....

For proctype matches, we need the executable flag to still be present, which 
means we need to explicitly mask this off for other patterns which aren't 
interested. So the digest would have bytes containing the type + executable 
flag.

Let's see. Currently we have:

    XPOST_OBJECT_TAG_DATA_TYPE_MASK          = 0x001F, /**< mask to yield Xpost_Object_Type */
    XPOST_OBJECT_TAG_DATA_FLAG_VALID_OFFSET = 5, /* first flag, bit offset to a point above the type mask */ 
    XPOST_OBJECT_TAG_DATA_FLAG_ACCESS_OFFSET,    /**< bitwise offset of the ACCESS field */
    XPOST_OBJECT_TAG_DATA_FLAG_LIT_OFFSET =
        XPOST_OBJECT_TAG_DATA_FLAG_ACCESS_OFFSET + 2, /* access is a 2-bit field, lit must make room */

Hm.. FLAG_VALID is another artifact of trying to do this.

So we've got 5 bits of type identifier and LIT is in bit 7, right on the edge 
of the byte. We can make a digest with 2 loops since the stack is segmented and 
our arguments may span a segment boundary.

    * for top-most segment of upto 8 args from stack
       * get object tag
       * mask executable-flag and type-field  ( tag & 0x9F )
       * shift and or-in to digest word
    * for remainder of 8 args on second-topmost segment
       * ^same body as above^

Original comment by luser.droog on 28 Feb 2014 at 8:48

GoogleCodeExporter commented 9 years ago

Previous discussion:
https://groups.google.com/d/topic/comp.lang.postscript/5jljhIT0-lY/discussion

Original comment by luser.droog on 28 Feb 2014 at 8:53

GoogleCodeExporter commented 9 years ago

Hm. Maybe we don't need the mask when creating the digest. We can store a 
specific mask value with the signature. This way the signature's mask can 
specify whether we're interested in the executable flag.

Hm. Since the BANK Flag (which indicates whether a composite object's data lies 
in local memory or global memory) is in the first byte, we could implement 
specialized get and put operators for global objects which bypass the 
Local-in-Global access violation (PS /invalidaccess error). .... Doing this, we 
could remove the global variable "ignoreinvalidaccess" and call non-checking 
op-functions directly in the short initialization function that needs to bypass 
this error.

Original comment by luser.droog on 28 Feb 2014 at 9:26

GoogleCodeExporter commented 9 years ago

For the VALID Flag. This should be set by all object constructors, and a 
type-id byte of zero would obvious not have the flag set in the type-digest. So 
a stackunderflow can also be detected from the digest.

Original comment by luser.droog on 28 Feb 2014 at 9:29

GoogleCodeExporter commented 9 years ago

This page presents the original "prototype" code for the current version of the 
type-checking. It is perhaps easier to read than the current version of the 
code.

https://groups.google.com/forum/#!topic/comp.lang.postscript/WFRrChvXCpU/discuss
ion

Original comment by luser.droog on 1 Mar 2014 at 9:05

GoogleCodeExporter commented 9 years ago

Oh. I counted wrong above, the ACCESS bitfield is 2 bits, 6-7, so the LIT flag 
is bit 8!. But, we don't really need a VALID flag since we have an 
"invalidtype" object.
So, we can shift all the flags down, and Double the MAX_ENT number!

Original comment by luser.droog on 1 Mar 2014 at 9:13

GoogleCodeExporter commented 9 years ago

Added hard-coded stack-checking functions for common simple patterns like 

int
bool
real
float
any
int int
bool bool
real real
float float
any any

which covers math, comparisons, moveto and lineto. So even though this solution 
isn't really documented here, it is what came out of thinking over all of this. 
So, despite any possible appearances to the contrary, this issue truly is 
resolved, until/until further testing suggests otherwise.

Original comment by luser.droog on 15 Sep 2014 at 10:06

Changed state: Fixed

luser-dr00g / xpost

Improve performance of stack type checking in opexec #27