Further Draw optimization ideas

hiiamboris commented 3 years ago

According to @henrikmk, Draw can be sped up significantly. Discussion here

Ideas:

The Draw block is cached but allows one to use words to refer to colors, coordinates, blocks, etc. Then, on redraw, only the parts referred to by those words are refreshed. Right now I only recall words being used as colors and as image names.
Intermediate forms of computation, e.g. lists of coordinates in OS-native format obtained from consecutive pairs, are cached instead of being recomputed every time.
An intermediate, simpler-to-process form of draw commands is built from the original block and cached.
The whole draw process is JIT-compiled from the given data, eliminating most of the conditionals.

This implies some form of tracking of which cached item corresponds to which position in a draw block, so that changes invalidate (free) the affected items and trigger a rebuild.
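The tracking described above can be sketched as a small model. It is written in Python rather than Red/System, and everything in it is hypothetical: a draw block is modeled as a list of (command, *args) tuples, and compile_command stands in for whatever intermediate form an implementation would cache per position. Here that form is simply the pair arguments flattened into one coordinate list.

```python
class DrawCache:
    """Caches an intermediate form of each draw command, keyed by its position."""

    def __init__(self, block):
        self.block = list(block)            # e.g. [("line", (0, 0), (10, 10)), ...]
        self.cache = [None] * len(self.block)
        self.compiles = 0                   # how many commands had to be (re)compiled

    def compile_command(self, cmd):
        # Hypothetical intermediate form: flatten all pair arguments into one
        # coordinate list, standing in for an OS-native point array.
        self.compiles += 1
        name, *args = cmd
        coords = [c for a in args if isinstance(a, tuple) for c in a]
        return (name, coords)

    def set_command(self, index, cmd):
        # A change at one position invalidates (frees) only that cached item.
        self.block[index] = cmd
        self.cache[index] = None

    def render(self):
        # Each redraw walks the whole queue; only invalidated entries are rebuilt.
        out = []
        for i, cmd in enumerate(self.block):
            if self.cache[i] is None:
                self.cache[i] = self.compile_command(cmd)
            out.append(self.cache[i])
        return out


cache = DrawCache([("line", (0, 0), (10, 10)), ("box", (5, 5), (20, 20))])
cache.render()
cache.render()                                  # second redraw reuses both entries
cache.set_command(1, ("box", (6, 6), (21, 21)))
frame = cache.render()                          # recompiles position 1 only
```

Plain indexes are enough for in-place edits. Inserting or deleting commands would shift positions, so a real design would likely need stable identifiers instead.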

qtxie commented 3 years ago

> Draw block is cached, but allows one to use words to refer to colors, coordinates, blocks, etc. Then on redraw only part referred to by those words is refreshed.

This will save the parsing time. The total time is parsing time + drawing time. Redrawing only the changed part would be very complicated, and simply caching all the draw primitives would blow up memory quickly.

> Intermediate, simpler to process, form of draw commands is built from the original block, and cached.

Smells like Flash.

hiiamboris commented 3 years ago

Sorry for the confusion. I don't think it should redraw only the changed part. It should refresh only those parts of the cache that were affected by the change (in this scenario, the parts referred to by words), but redraw the full queue every time.
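A minimal model of that distinction, again in Python and with hypothetical names. String arguments are treated as words looked up in a context, which stands in for an object the block is bound to. A word-to-position index records which commands use which word. Changing a word re-resolves only those commands, while every redraw still emits the full queue.

```python
class BoundQueue:
    """Draw commands whose string arguments are words resolved from a context."""

    def __init__(self, block, context):
        self.block = block                  # e.g. [("circle", "pos", 20), ...]
        self.context = context              # word -> value
        self.resolved = [self._resolve(cmd) for cmd in block]
        self.deps = {}                      # word -> positions that reference it
        for i, (_, *args) in enumerate(block):
            for a in args:
                if isinstance(a, str):
                    self.deps.setdefault(a, set()).add(i)
        self.refreshed = []                 # positions touched by the last change

    def _resolve(self, cmd):
        name, *args = cmd
        return (name, *[self.context[a] if isinstance(a, str) else a for a in args])

    def set_word(self, word, value):
        self.context[word] = value
        self.refreshed = sorted(self.deps.get(word, ()))
        for i in self.refreshed:            # refresh only the affected cache entries
            self.resolved[i] = self._resolve(self.block[i])

    def redraw(self):
        return list(self.resolved)          # the full queue is drawn every time


q = BoundQueue([("pen", "color"), ("circle", "pos", 20), ("line", (0, 0), (5, 5))],
               {"color": (255, 0, 0), "pos": (10, 10)})
q.set_word("pos", (25, 25))
frame = q.redraw()
```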

Oldes commented 2 years ago

> Smells like Flash

Macromedia Flash could display nice vector animations only a few kB in size in the early 2000s, when most of us were on 56 kbps modems (if we had a connection at all). Adobe did a good job of killing it after the acquisition.

Compiling the Draw dialect into an intermediate representation is the way to go, with the possibility of named input arguments. Being able to cache components in bitmaps would also be useful.
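To make "compile once, run with named inputs" concrete, here is a Python sketch. compile_draw and the ":name" notation for a named input are invented for the illustration. The block is walked once to build a program of slots, and each later call only fills the slots.

```python
def compile_draw(block):
    """Compile a draw block once into a reusable program with named inputs.

    An argument written as ":name" is a named input supplied at run time;
    this notation is invented for the sketch.
    """
    program = []
    for name, *args in block:
        slots = [(a[1:], None) if isinstance(a, str) and a.startswith(":") else (None, a)
                 for a in args]
        program.append((name, slots))

    def run(**inputs):
        # No re-parsing here: just fill the prepared slots.
        return [(name, *[inputs[key] if key is not None else value for key, value in slots])
                for name, slots in program]

    return run


bar = compile_draw([("fill-pen", ":color"), ("box", (0, 0), ":size")])
frame1 = bar(color=(255, 0, 0), size=(10, 40))
frame2 = bar(color=(0, 0, 255), size=(10, 55))
```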

dockimbel commented 2 years ago

JIT compilation is the way to go there, but such a process is notoriously complex to do accurately and efficiently.

Just thinking out loud: maybe we could also introduce a system of layers in Draw that separates the parts that are fixed, or only modified globally (rotating/translating the whole layer), from the parts that change dynamically on each frame. The former could simply be compiled, while the latter would keep being interpreted.
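A rough Python sketch of the layer split, with hypothetical names throughout. The static layer is assumed to be just a list of points prepared once, and its only per-frame change is a global rotation plus translation. The dynamic layer is a block that is re-interpreted on every frame.

```python
import math


def transform(point, angle, offset):
    """Rotate a point about the origin, then translate it: the layer-global change."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s + offset[0], x * s + y * c + offset[1])


class StaticLayer:
    """Geometry prepared ("compiled") once; per frame only the transform changes."""

    def __init__(self, points):
        self.points = tuple(points)

    def render(self, angle, offset):
        return [transform(p, angle, offset) for p in self.points]


def interpret(block):
    """Dynamic layer: re-walked command by command on every frame."""
    return [args[0] for name, *args in block if name == "point"]


def render_frame(static, angle, offset, dynamic_block):
    return static.render(angle, offset) + interpret(dynamic_block)


background = StaticLayer([(1, 0), (0, 1)])
frame = render_frame(background, math.pi / 2, (10, 10), [("point", (3, 3))])
```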

henrikmk commented 2 years ago

Some of the ideas in the original post may have been misinterpreted.

It should be noted that R2 DRAW can be far faster than Red, because it can parse a DRAW block once and afterwards change the draw values on their own, thanks to being able to bind values to other contexts.

So, if you say:

params: make object! [pos: 10x10 circumference: 20]

draw my-image bind [circle circumference pos] params

Then use my-image somewhere in a View UI.

Then you can set params/pos: 25x25 and do a show, and the image updates without needing a reparse.

I don't know how it works internally, but the performance increase speaks for itself.

This feature leads to some thoughts:

There is good reason to separate draw commands from their parameters. When you draw something, 95% of the time you want to change the parameters rather than the shape. This can mean keeping the commands and the parameters in two separate blocks with identical addressing, so if you want to change the 47th draw command, you change the 47th set of draw parameters.
DRAW blocks take time to build, and then must be parsed. Some DRAW blocks can contain thousands of commands. This is direct overhead that could be eliminated by building the list of draw calls directly on the Red/System side, as a queue of draw commands with parameters.
Draw queues just paint over a bitmap, that's all. That means they can be reused elsewhere, and this is what is meant by "intermediate" draw commands. Draw a box in the upper-left corner of the image; change the parameters and execute the queue again, and you get a new box on the same image. You could also build draw queues out of multiple sub-queues.
Changing thousands of draw parameters at once is a challenge I haven't quite figured out yet. In R2, you can bind draw parameters to specific locations inside other blocks, and "sew" draw parameters into blocks that way. That might be the way to go: to change the image, you poke new values into the right locations in your big block, and R2 picks them up.

This is theoretical, but I'm always trying to see whether it's possible to build high-performance draw systems without resorting to tricks, and I think building draw queues is a way to go.
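The command/parameter separation in the first point can be modeled as two parallel sequences. This Python sketch is illustrative only. SplitQueue, poke and the (command, params) shape are made up, and the "canvas" is just a list that records what would be painted.

```python
class SplitQueue:
    """Commands and their parameters kept in two parallel lists with identical indexing."""

    def __init__(self, commands, params):
        if len(commands) != len(params):
            raise ValueError("commands and params must have the same length")
        self.commands = list(commands)      # the shape: rarely changes
        self.params = list(params)          # the values: change most of the time

    def poke(self, index, value):
        # Changing the parameters of command N means writing entry N of params.
        self.params[index] = value

    def execute(self, canvas):
        # Paint every command with its current parameters onto the (modeled) canvas.
        for command, param in zip(self.commands, self.params):
            canvas.append((command, param))
        return canvas


queue = SplitQueue(["circle"] * 100, [((i, i), 5) for i in range(100)])
queue.poke(46, ((0, 0), 9))                 # the 47th command (0-based index 46)
frame = queue.execute([])
```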

dockimbel commented 2 years ago

> It should be noted that R2 DRAW is able to be far faster than Red, because it can do a one-time DRAW block parse, where draw values can be changed alone afterwards, due to being able to bind values to other contexts.

It should be trivial to add word evaluation to the Red/Draw interpreter in a local branch and then run a comparative benchmark to see what the gains would really be. My guess is that the gain would not be significant in the general case, and only moderately significant in some specific use cases.

> draw my-image bind [circle circumference pos] params

Is draw there referring to the draw command in VID or to the draw native function?

> I don't know how it works internally, but the performance increase speaks for itself.

If you don't know how it works internally, how do you know how much of the gain comes from fewer Draw (re-)parses?

From our benchmarks on the SVG Tiger animation, which is pretty heavy on Draw commands, the Draw parsing part slows down the rendering by only 12%. We measured it on Android using two versions: one written entirely in Java, and one using Red, with Draw using the Java2D API as its backend (the same as the pure Java version). The Red version also suffers from the JNI + bridge overhead needed to access the Java/Android API, so the Draw parsing overhead is in fact even less than 12%.
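As a back-of-the-envelope check, here is Amdahl's law applied to the figure above (not an additional measurement). Suppose parsing takes a fraction p of each frame and were removed entirely. The best possible speedup for that benchmark would then be

$$
S_{\max} = \frac{1}{1 - p} = \frac{1}{1 - 0.12} = \frac{1}{0.88} \approx 1.14
$$

That is roughly 14% more frames per second at most. The same formula gives S_max = 2 if parsing plus block rebuilding took half of the frame time. That figure is hypothetical and only shows how quickly the bound grows with p.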

henrikmk commented 2 years ago

@dockimbel you're spinning and zooming a static drawing, so you can get by with reparsing the same draw block over and over, but performance is still capped by how fast R/S can process that draw block.

If you rebuild the entire draw block from scratch every time and then parse it (which is necessary in Red for things like graphs, unless you want to sit and poke new values into hundreds of absolute locations), you're going to lose a lot more than 12% performance.

The draw there is the native function, but it also works with the draw word in the effect block.

dockimbel commented 2 years ago

> The draw is the native function, but it also works with the draw word in the effect block.

In both cases, I don't understand how your code works and how you can draw (pun not intended) such conclusions. Can you provide a fully working (minimal) script for Rebol2?

BTW, I remember having this same conversation with you on Gitter a few years ago, and it was not conclusive. So please be clear and accurate, and provide working code with measurements.

hiiamboris commented 2 years ago

Just a note: a Draw block can include other blocks, so there's no need to rebuild the whole block every time if only a local part changes (that part can live in a separate block).
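A small Python model of that suggestion, with lists standing in for Red blocks and all names hypothetical. The changing part lives in its own sub-block that is rebuilt in place, while the outer block and the static part are left alone. An interpreter still walks the whole tree on every redraw; only the rebuilding is avoided.

```python
def walk(block):
    """Yield draw commands in order, recursing into nested sub-blocks."""
    for item in block:
        if isinstance(item, list):
            yield from walk(item)
        else:
            yield item


static_part = [("pen", "black"), ("box", (0, 0), (100, 100))]
live_part = []                          # the only part rebuilt on each update
scene = [static_part, live_part]


def update(values):
    # Rebuild the sub-block in place so that scene keeps referring to it.
    live_part[:] = [("line", (i * 10, 0), (i * 10, v)) for i, v in enumerate(values)]


update([5, 7])
commands = list(walk(scene))
```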

Oldes commented 2 years ago

I have no idea how it was done in Rebol2, but in Rebol3 (Atronix version) there was a delect native, which converted a draw dialect block into a block of ordered draw commands that were then processed by the draw dispatcher.

I don't consider it to be optimal either.
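To show what a "delect"-style pass produces, here is a simplified Python model. It is not the Atronix implementation. The command specs are made up, and matching loosely ordered arguments to parameter slots by datatype is an assumption about how such a normalizing step can work.

```python
# Hypothetical specs: each command's parameter types, in dispatcher order.
SPECS = {
    "pen": [str],               # color name
    "circle": [tuple, int],     # center pair, radius
}


def delect(block):
    """Normalize a flat dialect block into (command, ordered-args) entries."""
    out, i = [], 0
    while i < len(block):
        name = block[i]
        spec = SPECS[name]
        args = [None] * len(spec)
        i += 1
        # Consume following values, placing each in the first empty slot of its type.
        while i < len(block) and not (isinstance(block[i], str) and block[i] in SPECS):
            value = block[i]
            slot = next((k for k, t in enumerate(spec)
                         if args[k] is None and isinstance(value, t)), None)
            if slot is None:
                raise ValueError(f"unexpected argument {value!r} for {name}")
            args[slot] = value
            i += 1
        out.append((name, args))
    return out


ordered = delect(["pen", "red", "circle", 20, (50, 50)])
```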

Oldes commented 2 years ago

@hiiamboris In Red, when there is a block inside a draw block, it is just parsed recursively. Or do you mean that this could be part of the optimizations?

hiiamboris commented 2 years ago

It was my reply to Henrik's comment:

> If you rebuild the entire draw block from scratch every time and then parse it, which is necessary in Red to do things like graphs

henrikmk commented 2 years ago

OK, I guess I need to retract the statement that it's faster. I cannot reproduce this case in a test program. That's on me.

What I may be seeing in my more complex tests is:

that one parameter can be shared between an arbitrary number of draw commands (changing one color parameter changes the colors of 10000 points in an arbitrary, unsorted grouping)
building the draw block produces significant overhead due to calculations needed per command
when changing parameters, you can change an arbitrarily small number of parameters and then redraw

These things are available thanks to binding, but they do not produce faster raw DRAW performance.

So R2 DRAW has a constant PARSE overhead, just like Red, which means that draw block parsing is a constant performance cap for both R2 and Red (if you focus only on, say, drawing N circles per second).

This makes better sense than my original claim.

But I would still advocate binding values to draw blocks, for the reasons above, to make it easier to change very large draw blocks.

I also advocate draw queues as a way to bypass the parsing performance cap, so you can execute prepared draw calls quickly.
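The first point in the list above, one parameter shared by many commands, can be modeled with a mutable cell referenced from every command that uses it. In this Python sketch, Param and a point command taking a color argument are hypothetical. One assignment recolors every point sharing the cell, but resolve still walks all 10000 commands per frame. This matches the conclusion that binding eases editing but does not speed up raw drawing.

```python
class Param:
    """A mutable cell that many commands can share, much like a bound word."""

    def __init__(self, value):
        self.value = value


def resolve(queue):
    # Per-frame walk: every command is visited, shared or not.
    return [(name, *[a.value if isinstance(a, Param) else a for a in args])
            for name, *args in queue]


highlight = Param((255, 0, 0))
# 10000 points; every third one shares the cell here, but any grouping would do.
queue = [("point", (i % 100, i // 100), highlight if i % 3 == 0 else (0, 0, 0))
         for i in range(10000)]

highlight.value = (0, 255, 0)               # one change recolors 3334 points
frame = resolve(queue)
```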

Oldes commented 2 years ago

But Henrik is right that, currently, Red must parse the block every time you want to redraw it. I really think that converting the draw block to commands/arguments would be a good optimization.

If I understand Henrik correctly, what he wants, and what I would like to see too, is that one can make a widget, like a graph with 1000 columns, and then just use a vector of 1000 values to draw it, optionally setting the widget's size and colors.

hiiamboris commented 2 years ago

If the line command accepted a block of pairs, that would work...

Oldes commented 2 years ago

What if you want to draw columns? Or something more interesting with the same data input?

hiiamboris commented 2 years ago

Then you write a routine that fills the columns commands from that input :)
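Here is a Python sketch of such a routine; in Red it would be a function or routine producing a draw block. columns_to_draw and its defaults are hypothetical. It also assumes that values are fractions of the widget height in the range 0..1.

```python
def columns_to_draw(values, size=(400, 200), color="blue", gap=1):
    """Turn a vector of values (fractions of the height, 0..1) into column-chart commands."""
    width, height = size
    col_w = width / len(values)
    block = [("fill-pen", color)]
    for i, v in enumerate(values):
        x0 = i * col_w
        # Each column is a box from its top edge down to the baseline.
        block.append(("box", (x0, height - v * height), (x0 + col_w - gap, height)))
    return block


chart = columns_to_draw([i / 999 for i in range(1000)])   # 1000 columns
small = columns_to_draw([0.5, 1.0], size=(100, 50))
```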

greggirwin commented 2 years ago

https://gitter.im/red/red?at=61f2ea32d41a5853f9704295 has a couple of small examples. Given what @GalenIvanov is doing with animation, what I'd love to see is a small collection of samples we can use for discussion. @henrikmk has pushed list-view limits, and @Oldes knows in depth how animation is used, especially in games. If we have examples, we will learn from them and even get hard numbers on different hardware. They will also let us identify the current limits of where Red can be applied, how many draw elements it can handle, etc. Henrik's comment on live charts also resonates with me, and having watched some really nice math and other videos done with Unity, I think data viz is a good use case that spans a number of areas. If we can do games like Angry Birds, that's another flag we can plant in the ground.

greggirwin commented 2 years ago

The reason I'll push for code examples is that we can talk around each other a lot, but working code is an important tool we know how to use for communication.

greggirwin commented 2 years ago

I still miss R2's effect pipeline at times. Those grid demos I just posted used effect [grid ..., and I had to cut that out for Red. The combination of that and gradients made really nice UI effects blindingly simple, e.g. in AltME, or in how extend worked to stretch images for buttons and skins.

pekr commented 2 years ago

@greggirwin IIRC, an effect pipeline was planned for Red too, so isn't it just a case of not being implemented... yet? Or am I wrong?

@Oldes I thought the DELECT pipeline was used for R3 components/plugins, and was later even scrapped as a concept? I only vaguely remember it being related to the gfx pipeline, or at least not to VID itself. Not sure about Draw, but as R3 is open source, it should be possible to find out.

henrikmk commented 2 years ago

> If I understand Henrik well, and what I would like to see, is, that one make a widget... like a graph with 1000 columns, and than is able to just use a vector with 1000 values to draw it... with optionally setting widget's size and colors.

Sorry for this long blurb.

What I wanted to see with draw queues is that they serve as drawing subroutines, which can form complex draw primitives with adjustable parameters.

Examples of that:

Textbox with frame
The classic 8 edit knobs around a draw item
Flowing text (fixed number of words, but flow changes with resize/font size or style)
Spinning and zooming 3D geometry and point clouds
Point clouds with ID items or using specific symbols
Fixed size grids with changing values, like tables
Complex vector drawings (tiger.svg)
9-segment image drawing, such as for buttons
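For the last item, the geometry of 9-segment drawing can be sketched in Python. The function name is hypothetical. Rectangles are (x, y, w, h), and border is (left, top, right, bottom). Corners keep their size, edges stretch along one axis, and the center stretches along both.

```python
def nine_slice(src_size, dst_size, border):
    """Return 9 (source_rect, dest_rect) pairs, row by row, for 9-segment scaling."""
    sw, sh = src_size
    dw, dh = dst_size
    left, top, right, bottom = border

    def spans(total, first, last):
        # Start/length of the three segments along one axis.
        return [(0, first), (first, total - first - last), (total - last, last)]

    pairs = []
    for (sy, s_h), (dy, d_h) in zip(spans(sh, top, bottom), spans(dh, top, bottom)):
        for (sx, s_w), (dx, d_w) in zip(spans(sw, left, right), spans(dw, left, right)):
            pairs.append(((sx, sy, s_w, s_h), (dx, dy, d_w, d_h)))
    return pairs


slices = nine_slice((30, 30), (100, 40), (10, 10, 10, 10))
```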

You could for example build a draw queue for a simple, resizable text box (something that is actually really hard to do properly in R2):

1. Set text as "foobar"
2. Translate by 30x30
3. Set draw color
4. Set fill color
5. Set line thickness
6. Set line pattern
7. Find the size of the text "foobar" and store it in text-size
8. Paint a box using text size and fill color
9. Paint an edge around the box using text size, draw color, line thickness and line pattern
10. Paint the text "foobar" from the text storage

You could make it more complex with shadows, or do cursor tracking so you can tell where to draw the next item, etc.

Then you execute the queue at maximum speed, and keep it around for reuse until you delete it.

To make that generic, the parameters could be picked out, so that the draw queue consists only of steps 7 to 10 and can be used repeatedly:

Set text as "foobar"
Translate by 30x30
Execute text-box-draw
Set text as "foobar2"
Translate by 50x50
Execute text-box-draw

Implemented, it could be:

text-box-draw: make-draw-queue [
  text-size: size-text string
  box text-size ; position and fill color are already set
  frame text-size ; position and draw color are already set
  text string
]

text-boxes: make-draw-queue [
  font: my-font-object ; externally defined font object
  string: "foobar"
  translate 30x30
  execute text-box-draw
  string: "foobar2"
  translate 50x50
  execute text-box-draw
]

final: make-draw-queue [
  fill 0.0.0 ; clears image with black
  fill-color: 40.50.60
  draw-color: 0.0.0
  translate 10x10
  execute text-boxes ; executes a sub-queue, which in turn runs text-box-draw twice
  translate 50x50
  execute text-boxes
]

execute-draw-queue image final
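The semantics of the nested queues above can be modeled in a few lines of Python. This is not an implementation of the proposed make-draw-queue. Steps are tuples, and a shared state dict plays the role of the current parameters. Translations accumulate, which is an assumption, as with a shared transformation matrix. Painting just records what would be drawn. Step 7 (measuring the text) is left out because it needs a font backend.

```python
class DrawQueue:
    """A prepared list of steps that can be executed repeatedly."""

    def __init__(self, *steps):
        self.steps = steps

    def run(self, state, canvas):
        for op, *args in self.steps:
            if op == "set":                       # set a parameter, e.g. the string
                state[args[0]] = args[1]
            elif op == "translate":               # offsets accumulate
                x, y = state.get("origin", (0, 0))
                state["origin"] = (x + args[0][0], y + args[0][1])
            elif op == "execute":                 # run a sub-queue on the same state
                args[0].run(state, canvas)
            else:                                 # any other op paints
                canvas.append((op, state.get("origin", (0, 0)), state.get("string")))
        return canvas


text_box_draw = DrawQueue(("box",), ("frame",), ("text",))
text_boxes = DrawQueue(
    ("set", "string", "foobar"), ("translate", (30, 30)), ("execute", text_box_draw),
    ("set", "string", "foobar2"), ("translate", (50, 50)), ("execute", text_box_draw),
)
final = DrawQueue(("translate", (10, 10)), ("execute", text_boxes),
                  ("translate", (50, 50)), ("execute", text_boxes))
canvas = final.run({}, [])
```

Because the state is shared and never reset, the second run of text_boxes starts from the accumulated origin. A real design might save and restore state around each execute instead.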

There could also be commands to set which bitmap to paint on, set clip regions, copy/paste regions, and apply effects like blur, convolve, grayscale, flip or 90-degree rotation to regions, etc. This is all destructive painting on bitmaps. If you want to build a non-destructive, procedural effects pipeline on top of all this, it could be done this way, as it should be very fast.

Loops are unrolled when the queues are built, prior to use, but I don't see the queues getting all that big, maybe in the megabyte range. The biggest memory issue I see is with complex 3D geometry, which can be gigabytes in size.
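Unrolling at build time trades memory for a branch-free execution loop. In this hypothetical Python sketch, the queue length is simply the template length times the repeat count. Memory therefore grows linearly with the number of unrolled iterations.

```python
def build_queue(template, repeat, step=(10, 0)):
    """Unroll a loop when building: copy the template repeat times with offsets baked in."""
    queue = []
    for i in range(repeat):
        dx, dy = step[0] * i, step[1] * i
        for op, (x, y) in template:
            queue.append((op, (x + dx, y + dy)))
    return queue


def execute(queue, canvas):
    # No loop counters or conditionals left: a straight walk over prepared steps.
    canvas.extend(queue)
    return canvas


queue = build_queue([("point", (0, 0)), ("point", (0, 5))], repeat=3)
frame = execute(queue, [])
```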

A queue could stop and return to the previous queue if some parameter is exceeded, e.g. when a text cursor passes the bottom-right corner, since there is no reason to keep going.

Commands could be omitted when building the draw queue if they are known to have no effect (setting the draw color to fully transparent, for example).
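Both ideas can be sketched in Python. flow_text stops and returns to its caller as soon as the text cursor would pass the bottom edge. build drops lines drawn with a fully transparent pen. The names, the fixed character metrics and the (r, g, b, a) color shape are assumptions. Following Red's tuple convention, an alpha of 255 is taken to mean fully transparent.

```python
def flow_text(words, width, height, char_w=6, line_h=10):
    """Lay out words left to right, wrapping; stop once the cursor leaves the area."""
    canvas, x, y = [], 0, 0
    for word in words:
        w = len(word) * char_w
        if x + w > width:                   # wrap to the next line
            x, y = 0, y + line_h
        if y + line_h > height:             # past the bottom-right: no reason to go on
            break
        canvas.append(("text", (x, y), word))
        x += w + char_w                     # advance the cursor past the word and a space
    return canvas


def build(block):
    """Build a queue, omitting lines that a fully transparent pen would not show."""
    queue, pen = [], (0, 0, 0, 0)
    for cmd in block:
        if cmd[0] == "pen":
            pen = cmd[1]
        elif cmd[0] == "line" and pen[3] == 255:
            continue                        # no visible effect: skip at build time
        queue.append(cmd)
    return queue


laid_out = flow_text(["aa", "bb", "cc"], width=30, height=15)
queue = build([("pen", (0, 0, 0, 255)), ("line", (0, 0), (9, 9)),
               ("pen", (0, 0, 0, 0)), ("line", (0, 0), (5, 5))])
```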

You could switch bitmaps inside the draw queue and use multiple bitmaps for compositing without leaving the draw queue.

In a high-level DRAW dialect, draw queues could serve as building blocks that are executed per word, alongside direct draw queue commands. You can build your non-destructive effects pipeline this way:

draw image [
  fill black ; direct commands
  pen white
  fill-pen red
  font my-font
  translate 100x100
  text-box "foobar" ; existing draw queue
  region 50x50 bottom-right ; REGION draw queue would figure out how big the image is
  blur 0.2 ; let's ruin the image with some effects
  contrast 50
  region all
  sharpen 0.6
  colorize 50.255.50 ; yuck
]

This would have a performance cap, but you would be free to move performance-critical parts into queues. In general, parsed DRAW blocks would be small and simple, whereas draw queues could be arbitrarily complex.
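Here is a Python sketch of that split, with made-up names. A small high-level block is walked per word. Direct commands pass straight through, and words that name a prepared queue expand into that queue's steps, however many there are.

```python
DIRECT = {"fill", "pen", "fill-pen", "font", "translate"}


def run_dialect(block, queues, canvas):
    """Walk a small high-level block, expanding words that name prepared queues."""
    for name, *args in block:
        if name in DIRECT:
            canvas.append((name, *args))
        elif name in queues:
            canvas.extend(queues[name](*args))    # prepared queue, parameterized
        else:
            raise ValueError(f"unknown command: {name}")
    return canvas


queues = {"text-box": lambda text: [("box", text), ("frame", text), ("text", text)]}
frame = run_dialect([("fill", "black"), ("translate", (100, 100)), ("text-box", "foobar")],
                    queues, [])
```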

I have a lot more on the topic, such as Houdini-style node-based pipelines, but it's time to end here.

greggirwin commented 2 years ago

Thanks for the detailed post @henrikmk!

Further Draw optimization ideas #103