red / REP

Red Enhancement Process
BSD 3-Clause "New" or "Revised" License

Further Draw optimization ideas #103

Open hiiamboris opened 3 years ago

hiiamboris commented 3 years ago

According to @henrikmk Draw can be sped up significantly. Discussion here

Ideas:

  1. The Draw block is cached, but one can use words to refer to colors, coordinates, blocks, etc. On redraw, only the part referred to by those words is refreshed. Right now I only recall words being used as colors and as image names.
  2. Intermediate forms of computation, e.g. lists of coordinates in OS native format obtained from consecutive pairs, are cached instead of being recomputed every time.
  3. An intermediate, simpler-to-process form of the draw commands is built from the original block and cached.
  4. The whole draw process is JIT compiled from the given data, eliminating most of the conditionals.

This implies some form of tracking of which cached item corresponds to which position in the draw block, so that a change invalidates or frees that item and rebuilds it.

qtxie commented 3 years ago

Draw block is cached, but allows one to use words to refer to colors, coordinates, blocks, etc. Then on redraw only part referred to by those words is refreshed.

This will save the parsing time. The total time is parsing time + drawing time. If we want to redraw just the changed part, it's very complicated. Simply caching all the draw primitives will blow up the memory quickly.

Intermediate, simpler to process, form of draw commands is built from the original block, and cached.

Smells like Flash.

hiiamboris commented 3 years ago

Sorry for the confusion. I don't think that it should redraw only the changed part. It should only refresh those parts of the cache that were affected by the change (in this scenario, the parts referred to by words), but redraw the full queue every time.
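A rough illustration of "refresh only the affected cache entries, but replay the full queue", written in Python rather than Red so it stays backend-neutral. Everything here is hypothetical: the `CachedDraw` class, the `$`-prefixed word convention and the `(command, args)` layout are assumptions for the sketch, not how Red/Draw works.

```python
class CachedDraw:
    """Draw block with word references and a per-command cache (sketch)."""

    def __init__(self, block, context):
        self.block = block        # list of (command, args); args may hold "$words"
        self.context = context    # word -> current value
        self.refreshed = 0        # number of cache slots rebuilt after changes
        self.cache = [self._resolve(cmd) for cmd in block]
        # Track which cache slots depend on which word.
        self.deps = {}
        for i, (_, args) in enumerate(block):
            for arg in args:
                if self._is_word(arg):
                    self.deps.setdefault(arg, set()).add(i)

    @staticmethod
    def _is_word(arg):
        return isinstance(arg, str) and arg.startswith("$")

    def _resolve(self, cmd):
        name, args = cmd
        return (name, tuple(self.context[a] if self._is_word(a) else a for a in args))

    def set_word(self, word, value):
        """Change a word and rebuild only the cache slots that refer to it."""
        self.context[word] = value
        for i in self.deps.get(word, ()):
            self.cache[i] = self._resolve(self.block[i])
            self.refreshed += 1

    def redraw(self):
        """Replay the whole cached queue; nothing is re-parsed here."""
        return list(self.cache)


scene = CachedDraw(
    [("pen", ("$color",)), ("circle", ("$pos", 20)), ("box", ((0, 0), (50, 50)))],
    {"$color": (255, 0, 0), "$pos": (10, 10)},
)
scene.set_word("$pos", (25, 25))
frame = scene.redraw()
```

Only the `circle` slot is rebuilt when `$pos` changes, while `redraw` still returns all three commands.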

Oldes commented 2 years ago

Smells like Flash

Macromedia Flash was able to display nice vector animations in just a few kB in the early 2000s, when we were mostly using 56 kbps modems (when we had any connection at all). Adobe did a good job of making it die after their acquisition.

Compiling the draw dialect into an intermediate representation is the way to go, with the possibility of having named input arguments. Being able to cache components in bitmaps would also be useful.

dockimbel commented 2 years ago

JIT-compilation is the way to go there, but such a process is notoriously complex to do accurately and efficiently.

Just thinking out loud: maybe we could also introduce a system of layers in Draw that would separate parts that are fixed or can be globally modified (rotating/translating the whole layer) from parts that are dynamically changed on each frame. The former could be simply compiled, while the latter could keep being interpreted.
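A small Python sketch of how such a split might look (not Red, and not an existing Draw feature): the fixed layer is converted once, and only the dynamic part is processed per frame. The names `compile_layer`, `render` and the `push`/`pop` wrapping are assumptions made for illustration.

```python
def compile_layer(commands):
    """Convert a fixed layer once into an immutable tuple of (op, args)."""
    return tuple((op, tuple(args)) for op, *args in commands)


def render(static_layer, dynamic_source, frame_no, layer_offset=(0, 0)):
    """Build the command list for one frame."""
    out = [("push", ()), ("translate", (layer_offset,))]  # global layer transform
    out.extend(static_layer)                              # replayed as compiled
    out.append(("pop", ()))
    # The dynamic part is still interpreted on every frame.
    out.extend((op, tuple(args)) for op, *args in dynamic_source(frame_no))
    return out


background = compile_layer([["pen", "black"], ["box", (0, 0), (200, 100)]])


def sprites(frame_no):
    # Hypothetical per-frame content: a circle moving to the right.
    return [["circle", (10 + frame_no, 50), 5]]


frame0 = render(background, sprites, 0)
frame3 = render(background, sprites, 3, layer_offset=(5, 0))
```

Moving the whole static layer only changes the single `translate` entry; the compiled commands themselves are reused unchanged.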

henrikmk commented 2 years ago

There are some things that might have been poorly interpreted in the ideas in the original post.

It should be noted that R2 DRAW is able to be far faster than Red, because it can do a one-time DRAW block parse, where draw values can be changed alone afterwards, due to being able to bind values to other contexts.

So, if you say:

params: make object! [pos: 10x10 circumference: 20]

draw my-image bind [circle circumference pos] params

Then use my-image somewhere in a View UI.

Then you can change params/pos: 25x25 and then do a show, and the image updates without needing a reparse.

I don't know how it works internally, but the performance increase speaks for itself.

This feature leads to some thoughts:

This is theoretical, but I'm always trying to see if it's possible to build high performance draw systems without having to resort to tricks, and I think building draw queues is a way to go.

dockimbel commented 2 years ago

It should be noted that R2 DRAW is able to be far faster than Red, because it can do a one-time DRAW block parse, where draw values can be changed alone afterwards, due to being able to bind values to other contexts.

It should be trivial to add word evaluation to the Red/Draw interpreter in a local branch and then do a comparative benchmark to see what the gains would really be. My guess is that the gain would not be significant in the general case and only moderately significant in some specific use-cases.

draw my-image bind [circle circumference pos] params

Is draw there referring to the draw command in VID or the draw native function?

I don't know how it works internally, but the performance increase speaks for itself.

If you don't know how it works internally, how do you know how much of the gains are caused by the lower number of Draw (re-)parsing?

From our benchmarks on the SVG Tiger animation, which is pretty heavy on Draw commands, the Draw parsing part only slows down the rendering by 12%. We measured it on Android using two versions, one entirely written in Java and one using Red and Draw using Java2D API as backend (same as for the pure Java version). The Red version also suffers from the JNI + bridge overhead in order to access the Java/Android API, so the Draw parsing overhead is in fact even less than 12%.
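To put the 12% figure in perspective, here is a small worked calculation in Python. It rests on an assumption about how to read the number: that the Red version takes 1.12 times as long per frame as the pure Java version, and that, as a worst case, the whole gap is Draw parsing.

```python
java_time = 1.0              # normalized frame time of the pure Java version
red_time = java_time * 1.12  # assumption: "12% slower" means 1.12x the time

# Worst case: the entire difference is Draw parsing (ignores JNI/bridge cost).
max_parse_share = (red_time - java_time) / red_time

# Upper bound on the speedup from removing parsing entirely.
max_speedup = red_time / java_time

print(f"parsing is at most {max_parse_share:.1%} of the frame time")
print(f"removing it gives at most a {max_speedup:.2f}x speedup")
```

Under that reading, caching the parsed block could not buy more than about 1.12x on this benchmark, and less once the JNI and bridge overhead is subtracted.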

henrikmk commented 2 years ago

@dockimbel you're spinning and zooming a static drawing, so you can get by with reparsing the same draw block over and over, but you're still going to cap performance to how fast R/S can process that draw block.

If you rebuild the entire draw block from scratch every time and then parse it, which is necessary in Red to do things like graphs, unless you want to sit and poke new values into hundreds of absolute locations, you're going to lose a lot more than 12% performance.

The draw is the native function, but it also works with the draw word in the effect block.

dockimbel commented 2 years ago

The draw is the native function, but it also works with the draw word in the effect block.

In both cases, I don't understand how your code works and how you can draw (pun not intended) such conclusions. Can you provide a fully working (minimal) script for Rebol2?

BTW, I remember already having that conversation on Gitter with you a few years ago and it was not conclusive. So please be clear, accurate and provide working code with measurements.

hiiamboris commented 2 years ago

Just a note: a Draw block can include other blocks, so there is no need to rebuild the whole block every time if only a local part changes (that part can live in a separate block).
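The idea can be sketched with nested lists in Python (a stand-in for nested Red blocks, not actual Draw code): the outer block is built once and holds a reference to an inner block, and only the inner block is modified in place. The command names and the `flatten` walker are illustrative assumptions.

```python
bars = []  # the only part that changes between frames
scene = ["pen", "black", "line-width", 2, bars, "text", (5, 5), "Load"]


def flatten(block):
    """Walk the block the way a recursive draw parser would."""
    for item in block:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


bars.extend(["box", (0, 0), (10, 40)])
first = list(flatten(scene))

# Update only the inner block; `scene` itself is never rebuilt.
bars.clear()
bars.extend(["box", (0, 0), (10, 70)])
second = list(flatten(scene))
```

The outer block keeps its length and identity across updates; only the contents of `bars` differ between `first` and `second`.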

Oldes commented 2 years ago

I have no idea how it was done in Rebol2, but in Rebol3 (Atronix version) there was a delect native, which converted the draw dialect block to a block of ordered draw commands, which were then processed by the draw dispatcher.

I don't consider it to be optimal either.
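For readers unfamiliar with that approach, a minimal Python sketch of the general shape follows: a flat dialect block is converted once into ordered `(command, args)` entries, which a dispatcher then replays. This is not the actual delect implementation; the fixed-arity table and all names are assumptions, and real Draw commands have optional and variable arguments that this ignores.

```python
# Hypothetical fixed arities for a tiny subset of Draw-like commands.
ARITY = {"pen": 1, "fill-pen": 1, "line-width": 1, "circle": 2, "box": 2}


def compile_block(block):
    """Turn a flat dialect block into an ordered list of (command, args)."""
    out, i = [], 0
    while i < len(block):
        name = block[i]
        if name not in ARITY:
            raise ValueError(f"unknown command at index {i}: {name!r}")
        n = ARITY[name]
        args = tuple(block[i + 1:i + 1 + n])
        if len(args) != n:
            raise ValueError(f"{name} expects {n} argument(s)")
        out.append((name, args))
        i += 1 + n
    return out


def dispatch(commands, handlers):
    """Replay compiled commands; no dialect parsing happens here."""
    for name, args in commands:
        handlers[name](*args)


compiled = compile_block(
    ["pen", "red", "circle", (50, 50), 20, "box", (0, 0), (10, 10)]
)

# Stand-in backend that just records the calls it receives.
calls = []
handlers = {name: (lambda *a, _n=name: calls.append((_n, a))) for name in ARITY}
dispatch(compiled, handlers)
```

The compile step runs once per block change, while `dispatch` can run on every redraw.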

Oldes commented 2 years ago

@hiiamboris When there is a block inside a draw block in Red, it just recursively parses itself. Or do you mean that it could be part of the optimizations?

hiiamboris commented 2 years ago

It was my comment to Henrik's:

If you rebuild the entire draw block from scratch every time and then parse it, which is necessary in Red to do things like graphs

henrikmk commented 2 years ago

OK, I guess I need to retract the statement that it's faster. I cannot reproduce this case in a test program. That's on me.

What I may be seeing in my more complex tests is:

These things are available due to binding, but do not produce faster raw DRAW performance.

So, that means that there is inside R2 draw a constant PARSE overhead, which is the same as Red, which means that draw block parsing is a constant performance cap for both R2 and Red (if you only focus on, say, draw N circles per second).

This makes better sense than my original claim.

But, I would still advocate binding values to draw blocks for the reasons above to ease changing very large draw blocks.

I will then also advocate for draw queues as a method for bypassing parsing performance caps, so you can execute prepared draw calls quickly.

Oldes commented 2 years ago

But Henrik is right that Red currently must parse the block every time you want to redraw it. I really think that having the draw block converted to commands/arguments would be a good optimization.

If I understand Henrik well, and this is what I would like to see too, one makes a widget, like a graph with 1000 columns, and then is able to just use a vector with 1000 values to draw it, optionally setting the widget's size and colors.

hiiamboris commented 2 years ago

If the line command accepted a block of pairs, that would work...

Oldes commented 2 years ago

What if you want to draw columns? Or something more interesting with the same data input?

hiiamboris commented 2 years ago

Then you write a routine that fills in the column commands from that input :)
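Such a routine could look roughly like this, sketched in Python instead of Red: it takes a vector of values and emits one `box` command per column. The function name, the parameters (`width`, `gap`, `base_y`, `color`) and the command layout are all assumptions for illustration.

```python
def column_commands(values, width=4, gap=1, base_y=100, color="blue"):
    """Generate Draw-like commands for a column chart from plain values."""
    cmds = ["fill-pen", color]
    for i, value in enumerate(values):
        x = i * (width + gap)
        # Each column is a box from its top-left to its bottom-right corner.
        cmds += ["box", (x, base_y - value), (x + width, base_y)]
    return cmds


cmds = column_commands([10, 25, 40])
big = column_commands(range(1000))  # the 1000-column case from the discussion
```

The same input vector could feed a different generator (lines, dots, areas) without touching the data.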

greggirwin commented 2 years ago

https://gitter.im/red/red?at=61f2ea32d41a5853f9704295 A couple small examples here. Given what @GalenIvanov is doing with animation, what I'd love to see is a small collection of samples we can use for discussion. @henrikmk has pushed list-view limits, and @Oldes knows deeply how animation is used, especially in games. If we have examples we will learn from them, and even get hard numbers on different hardware. It will also let us identify current limits of where Red can be applied, how many draw elements, etc. Henrik's comment on live charts also resonates with me, and having watched some really nice math and other videos done with Unity, I think data viz is a good use case that spans a number of areas. If we can do games like Angry Birds, that's another flag we can plant in the ground.

greggirwin commented 2 years ago

The reason I'll push for code examples is that we can talk around each other a lot, but working code is an important tool we know how to use for communication.

greggirwin commented 2 years ago

I still miss R2's effect pipeline at times. Those grid demos I just posted used effect [grid ..., and I had to cut it out for Red. The combination of that and gradients makes really nice UI effects blindingly simple, e.g. in altme, or in how extend worked to stretch images for buttons and skins.

pekr commented 2 years ago

@greggirwin IIRC, the Effect pipeline was planned for Red too, so isn't it just a case of not being implemented ... yet? Or am I wrong?

@Oldes I thought that the DELECT pipeline was used for R3 components / plugins, and was later even scrapped as a concept? I only vaguely remember it being related to the gfx pipeline, at least not to VID itself. Not sure about Draw, but as R3 is open-sourced, it should be possible to find out.

henrikmk commented 2 years ago

If I understand Henrik well, and what I would like to see, is, that one make a widget... like a graph with 1000 columns, and than is able to just use a vector with 1000 values to draw it... with optionally setting widget's size and colors.

Sorry for this long blurb.

What I wanted to see with draw queues is that they serve as drawing subroutines, which can form complex draw primitives with adjustable parameters.

Examples of that:

You could for example build a draw queue for a simple, resizable text box (something that is actually really hard to do properly in R2):

  1. Set text as "foobar"
  2. Translate by 30x30.
  3. Set draw color
  4. Set fill color.
  5. Set line thickness.
  6. Set line pattern.
  7. Find the size of the text "foobar" and store it in text-size
  8. Paint a box using text size and fill color.
  9. Paint an edge around the box using text size, draw color, line thickness and line pattern
  10. Paint the text "foobar" from the text storage.

You could make it more complex with shadows, do cursor tracking, so you could tell where to draw the next item, etc.

Then you execute the queue at maximum speed. Then you keep it around for another use unless you delete the queue.

To make that generic, the parameters could be pulled out, so that the draw queue consists only of steps 7 to 10 and can be used repeatedly.

  1. Set text as "foobar"
  2. Translate by 30x30
  3. Execute text-box-draw
  4. Set text as "foobar2"
  5. Translate by 50x50
  6. Execute text-box-draw

Implemented, it could be:

text-box-draw: make-draw-queue [
  text-size: size-text string
  box text-size ; position and fill color are already set
  frame text-size ; position and draw color are already set
  text string
]

text-boxes: make-draw-queue [
  font: my-font-object ; externally defined font object
  string: "foobar"
  translate 30x30
  execute text-box-draw
  string: "foobar2"
  translate 50x50
  execute text-box-draw
]

final: make-draw-queue [
  fill 0.0.0 ; clears image with black
  fill-color: 40.50.60
  draw-color: 0.0.0
  translate 10x10
  execute text-boxes ; executing sub-queue, which executes 2 other sub-queues
  translate 50x50
  execute text-boxes
]

execute-draw-queue image final
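The nesting of queues above can be modelled in a few lines of Python (again a sketch, not Red): each queue is a list of prepared callables that act on shared state, and `execute` runs a sub-queue in place. The `State` class and the helper names are hypothetical, and the "drawing" is just a record of what a real backend would paint.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    offset: tuple = (0, 0)
    string: str = ""
    drawn: list = field(default_factory=list)  # stand-in for a bitmap


def translate(dx, dy):
    def op(s):
        s.offset = (s.offset[0] + dx, s.offset[1] + dy)
    return op


def set_string(text):
    def op(s):
        s.string = text
    return op


def paint_text_box():
    def op(s):
        s.drawn.append((s.string, s.offset))
    return op


def execute(queue):
    """Wrap a queue so it can be used as a single step of another queue."""
    def op(s):
        for step in queue:
            step(s)
    return op


text_box_draw = [paint_text_box()]
text_boxes = [
    set_string("foobar"), translate(30, 30), execute(text_box_draw),
    set_string("foobar2"), translate(50, 50), execute(text_box_draw),
]
final = [
    translate(10, 10), execute(text_boxes),
    translate(50, 50), execute(text_boxes),
]

state = State()
execute(final)(state)
```

All parsing and argument checking happens while the lists are built; running `final` is only a chain of calls. Translations accumulate in this sketch, which is one possible reading of the example above.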

There could also be commands to set which bitmap to paint on, clip regions, copy/paste regions, doing effects like blur, convolve, grayscale, flip or 90-deg rotate on regions, etc. This is all destructive painting on bitmaps. If you want to build a non-destructive, procedural effects pipeline on top of all this, it could be done this way, as it should be very fast.

Loops are unwound when the queues are built, prior to use, but I don't see the queues being all that big: in the megabytes range or so. The greatest memory issue I see is with complex 3D geometry, which can be gigabyte-sized.

Queues could be stopped and return to a previous queue if some parameter is exceeded, such as a text cursor going past the bottom right corner, so that there is no reason to keep going.

Commands could be omitted when building the draw queue if it is known they won't do anything (setting the draw color to fully transparent, for example).
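One possible reading of this, sketched in Python: a build-time pass drops commands that cannot affect the output. The rules here are invented for the example (a stroke under a fully transparent pen and a zero translation are treated as no-ops), and the color convention, `(r, g, b, a)` with `a == 0` meaning fully transparent, is an assumption of the sketch rather than Red's convention.

```python
def prune(commands):
    """Drop commands that would have no visible effect (hypothetical rules)."""
    out = []
    pen_alpha = 255  # assume an opaque pen until a pen command says otherwise
    for name, args in commands:
        if name == "pen":
            pen_alpha = args[0][3]
            out.append((name, args))
        elif name == "line" and pen_alpha == 0:
            continue  # stroking with an invisible pen paints nothing
        elif name == "translate" and args == ((0, 0),):
            continue  # moving by zero changes nothing
        else:
            out.append((name, args))
    return out


pruned = prune([
    ("pen", ((0, 0, 0, 0),)),
    ("line", ((0, 0), (5, 5))),
    ("translate", ((0, 0),)),
    ("pen", ((0, 0, 0, 255),)),
    ("line", ((0, 0), (9, 9))),
])
```

Because the pass runs once at build time, its cost is not paid on every execution of the queue.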

You could switch bitmaps inside the draw queue and use multiple bitmaps for compositing without leaving the draw queue.

In a high-level DRAW dialect, each word could either execute a draw queue as a building block or be a direct draw queue command. You can build your non-destructive effects pipeline this way:

draw image [
  fill black ; direct commands
  pen white
  fill-pen red
  font my-font
  translate 100x100
  text-box "foobar" ; existing draw queue
  region 50x50 bottom-right ; REGION draw queue would figure out how big the image is
  blur 0.2 ; let's ruin the image with some effects
  contrast 50
  region all
  sharpen 0.6
  colorize 50.255.50 ; yuck
]

This would have a performance cap, but you would be free to move performance-critical parts into queues. In general, parsed DRAW blocks would be small and simple, whereas draw queues could be arbitrarily complex.

I have a lot more on the topic, such as Houdini style node-based pipelines, but time to end here.

greggirwin commented 2 years ago

Thanks for the detailed post @henrikmk!