parasyte commented 12 years ago

This will be really nice to have! Also I'd like to do a quick audit to ensure the dirty rects are being grouped properly. Specifically, any overlapping rectangles should be merged into a single rectangle. So that worst case will have about the same performance characteristics as dirtyRegion = false.

I'll also experiment with layer clipping (culling pieces that are hidden behind higher-priority layers) ... I'm unsure if it will be faster to detect where to clip (it is faster in theory), or faster to just draw everything in order (as implemented now). Clipping detection should be done on a per-tile basis; tile id "0" should never clip anything below it. Also tiles with transparent pixels should never clip. It will be interesting to see how this will all work out with parallax layers.

A few additional thoughts, while in mind:

Debugging tools for this will be important. There's a me.debug.renderDirty flag which will be helpful. Also consider rendering tile grid lines and highlighting tiles that intersect during clipping detections.
Ensure objects and backgrounds are only drawn on the first frame, unless they want an update; e.g. object animation or level scrolling.

parasyte commented 12 years ago

Even with the current patch, using me.sys.preRender is still way faster than me.sys.dirtyRegion. Kind of disappointing.

I believe it has to do with the number of rects + number of objects that are marked "dirty" though -- it's way too high, and drawManager.draw() has a time complexity of O(n) where n = dirtyRects.length + dirtyObjects.length, and dirtyObjects contains every object within the viewport.

A significant improvement will come from grouping rects, and then associating objects with each rect that intersects. no intersection means no object in dirtyObjects. This will reduce n quite a bit on average. Worst case will be same as current.

melonjs commented 12 years ago

Currently the engine just merge the rect corresponding to the previous position, with the rect corresponding to the new position. I've been thinking as well about merging other rect, but though it would bring performances down (though I never gave it a try).

Else, for scrolling level there is I believe two way to implement it :

the lazy one : when the engine detect a camera change (e.g. when the viewport update function return true ? or through the viewport calling makeAlldirty() when changing), the engine ignore all rect and just use the viewport rect
the good one : when the camera is updated, the current canvas content is translated correspondingly to the camera update, and a dirty rect is defined for the missing part.

parasyte commented 12 years ago

Second option ("good one") won't work for parallax scrolling layers. :( That's why I'm thinking about layer clipping (culling). I imagine the general case will have about 25-50% of a parallax layer completely obscured by some layer that draws over top of it. In this case, it's not worth drawing the hidden parts at all. And with multiple parallax layers, the time savings from culling should really add up nicely.

melonjs commented 12 years ago

Following my previous post on the topic, I was actually wondering if clipping could not just be simply achieved using the context global composing option. This would need to reverse the draw order, but with an option like "destination-over", it shoud work? I would think that the browser then actually do some clipping? This deservers some benchmarks :)

Else did u see the clip() function as well? I never tried it myself.

parasyte commented 12 years ago

I experimented with the compositing options in my game, actually, to do the drop-shadow text in the coin counter. I found that browsers don't always agree about how certain operations should be performed. So a lot of my tests would be broken in Chrome, but working in Firefox.

However, it's still worth exploring because there is a standard that specifies exactly how each operation works, and what the results will look like.

Using the composite operation for many tasks will require creating mask shapes, so you'll need at least one hidden canvas context to work with for creating masked shapes.

clip() seems only useable with paths (vector shapes) so I don't know how well it would work on pixels. It might form visible gaps between layers as the colors at the edges get antialiased and mixed separately.

melonjs commented 12 years ago

yep, except from source-over, destination-over, and a couple of other ones, the rest of them are not really supported, specially for the shadow stuff :)

else, what do you have in mind to implement clipping ? because to be honest, I also had this as a potential improvement, but I never managed to come out with a nice idea to implement it.

The only idea I had was to do some post-processing once the map is loaded to create additional layer with clipping information, but I always found this idea quite "heavy".

Other issue I found with this was when a layer was using opacity, and in theory this is even dynamically modifiable with the new renderer.

parasyte commented 12 years ago

The only info needed is true or false; whether lower layers can be seen behind the tile or not. When opacity < 1, lower layers will always be seen behind the tile (so disable clipping).

I envision it will require n-1 arrays, where n is the number of layers. And each lower layer "inherits" the 'true' elements in all higher layers. In other words, each clipping map acts as a large binary number, and te binary numbers are OR'd together as the z-level lowers.

parasyte commented 12 years ago

Trying to come up with a sane method of making this "clipping" work, I discovered that the current canvas spec on WHATWG supports "Path primitives", e.g. it's possible to create path objects that can be saved into a variable and reused later! According to this post, it was added around May of this year: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2012-March/035239.html so user agent support is probably very sparse at the moment.

This could be the perfect use case: a Path object is created for each layer. Rectangles are added to this Path for every tile that can be "seen through". This path is then given to the layer below as its clipPath. When it's time to draw the layer, context.clip(layer.clipPath) is called to set the clipping region to the layer's clipping path. The only parts of the layer that will be drawn are the parts with "transparent pixels" in the layer above.

I think it can also be "emulated" by generating a dynamic function that builds the path using the normal context.rect method. That function is then used in place of the Path primitive. This idea will need some more work.

Also of note is the "Hit Regions" support, which may be useful as shortcuts for mouse and touch events.

parasyte commented 12 years ago

Clipping is still secondary to fixing the rectangle merging. Here are some notes I wrote while I was initially working on the patch branch. I'll try to clarify the shorthand below.

An algorithm to union a large number of rectangles

 _____   _____   _____
|     | |     | |     |
|  1 _|_|_ 2  | |  3  |
|___|X| |_|___| |_____|
    |  4  |
    |_____|

 _____________   _____
|       |X|   | |     |
|       |X|2  | |  3  |
|    1  |X|___| |_____|
|         |
|_________|

 _____________   _____
|             | |     |
|             | |  3  |
|      1      | |_____|
|             |
|_____________|

I believe this is the most efficient way to merge an assortment of rectangles with unique sizes and positions. The three graphs represent three different stages of merging four small rectangles (numbered 1 - 4). The X indicates where an overlap is detected in two rectangles. This is because only two rectangles are compared at a time. The comparison steps taken are:

Rects 1 and 2 are compared. There is no intersection
Rects 1 and 3 are compared. There is no intersection.
Rects 1 and 4 are compared. There is an intersection: Rect 1 is updated with the union of rectangles 1 and 4. Rect 4 is removed. The process is repeated again starting with the new unioned rectangle (Rect 1).
Rects 1 and 2 are compared. There is an intersection: Rect 1 is updated with the union of rectangles 1 and 2. Rect 2 is removed. The process is repeated again starting with the new unioned rectangle (Rect 1).
Rects 1 and 3 are compared. There is no intersection.

You might say the worst case time complexity on this is O(n**2). Practically, it will be a lot lower, because each intersection removes one rectangle from the remaining set. Also, any rectangles which have been processed once should never be checked again. In this example, Rect 1 is fully processed at the end of step 5: If there were other rectangles (assume they do not overlap, for simplicity) the next iteration would start with a comparison between Rect 3 and Rect 5, then Rect 3 and Rect 6. The final iteration would compare just Rect 5 to Rect 6.

So in reality the absolute worst case is in fact O((n**2-n)/2 )! Our example with 4 rectangles gives a worst case of 6 operations, and with 6 rectangles: 15 operations. But because of the reductions early on, we actually only perform 5 and 8 operations, respectively.

After this rectangle merging stage, we are left with only two rectangles that need to be updated, and no overlaps (this is exactly what we want).

The next phase is adjusting how dirty rectangles are associated with dirty objects. The current method can be outlined as follows:

This is bad

dirtyRects = [
    {32, 32, 64, 64},
    {140, 233, 16, 16},
    {24, 455, 24, 16}
]

dirtyObjects = [
    background,
    level,
    player,
    coin_1,
    coin_2,
    coin_3,
    coin_4,
    coin_5,
    coin_6,
    coin_7,
    foreground,
    debug
]

O(n*m) .. n = dirtyRects.length, m = dirtyObjects.length = 3 * 12 = 36

The dirtyRects shorthand is a list of {x, y, w, h} numbers. The dirtyObjects only list an object's name as shorthand. In this case, "background" is a background image, and "level" and "foreground" are TMX tile layers, but all three can be considered the same thing: very large static objects. There's also a player character, 7 animated coins, and a debug object, which can be considered a HUD.

The game engine will iterate through all dirtyObjects for each dirtyRect. Which is how we end up with 36 operations down at the bottom with our O(n*m) notation. In practice, there is only one dirtyRect: the viewport rectangle (full screen). But we don't want to draw large portions of the static background/foreground/HUD/everything on every frame. So we use smaller rectangles, and only update the portions of the screen that change (objects move position/animate, etc.) Then we merge all of the overlapping rectangles (see above) until we are left with a minimal set of rectangles to iterate. Now we are left in the state shown above in the dirtyRects and dirtyObjects arrays; Three rectangles (a minimal/non-overlapping set) and 12 objects.

The real killer here is that n*m. We can fix that by associating our rectangles and objects directly with one another.

If we combine the two arrays into a sort of hash table, we might end up with a structure like this:

This is good

dirtyObjects = {
    background : [
        {32, 32, 64, 64},
        {140, 233, 16, 16},
        {24, 455, 24, 16}
    ],
    level : [
        {32, 32, 64, 64},
        {140, 233, 16, 16},
        {24, 455, 24, 16}
    ],
    player : [
        {24, 455, 24, 16}
    ],
    coin_3 : [
        {140, 233, 16, 16}
    ],
    foreground : [
        {32, 32, 64, 64},
        {140, 233, 16, 16},
        {24, 455, 24, 16}
    ],
    debug : [
        {32, 32, 64, 64}
    ]
}

O(n) .. n = sum(each dirtyObject) = 12

O(n) is MUCH better!

Actually, I cheated just a bit. This dirtyObjects would also have coin_1, coin_2, ... coin_7. So add 6 more operations for a total of 18. It's still a lot better than 36!

This structure gives us each object, plus the rectangles that they fall within. The three "level" layers will intersect all dirty rectangles in the typical case. Which leads right in to the next phase:

Clipping dirty rectangles

The lowest (first drawn) of the three level layers "background" can very likely be covered by parts of the upper layers, especially where the platforms and interesting parts of the level are drawn. Likewise, the highest of the layers can very likely have large empty spaces where other parts of the background (and objects) can be drawn. For each of these cases, we do not want to waste time drawing: a) A piece of the level which will be covered up by a different piece of the level. b) A completely empty (transparent) piece of the level.

The rectangles at this phase will be fairly big, so we can't effectively do anything like breaking them into smaller rectangles (each the size of a map tile) and throw away the ones we don't want. That would probably negate any potential savings. Instead, we can use the canvas clipping magic to specify areas of each layer that we KNOW will not be covered. We will KNOW about these areas because they will be pre-computed when the tiles are added to the layer(?). Or after the stage has fully loaded, and recomputed when the tile maps are changed at runtime.

I have not worked out the best way of defining these clipping regions, and I don't know for certain if they will improve performance. The best way to test that is just with plain ol' benchmarks. If the benchmarks look promising, then it will be worth exploring the idea (clipping) more thoroughly.

Current state of affairs

This sums up the current line of thinking for the dirtyRegion updates:

Merge dirty rectangles
Associate dirty rectangles with the dirty objects they intersect.
Maybe perform tile-based clipping on all TMX layers (except the upper-most layer). Pending investigation and benchmarks.
This is all just theoretical, and may not help with performance at all.

parasyte commented 12 years ago

I just ran a simple benchmark on Chrome that used context.drawImage on a 640x640 image in a tight loop with 500,000 iterations. I used three loops:

A "warm up" loop to stabilize any JIT fluctuations.
A test loop without a clipping region
A test loop with a clipping region: a single 64x64 rectangle

For each loop, I recorded the time (in ms) that each takes to complete. the first two loops were always fairly constant, around 640ms. The third loop was also fairly constant, but took around 630ms. So the performance gain from using just a clipping region, even a small one, is negligible in Chrome.

It then crashed my browser, which was spending a great deal of time trying to update the window server with all of those draws.

This is sad news, but kind of expected; even when only 1/100th of the image is visible through the clipping region, it doesn't have any measurable performance impact. Canvas context clipping is out!

The other way to do clipping will only be used when the layer/global preRender flag is false: Each tile in the layer needs to know whether it will be covered by tiles in any upper layer, and if so, skip the tile drawing. This will be easiest to implement at first with non-parallax layers, where the tile grids are always aligned. For parallax scrolling layers, some rectangle maths might need to be employed.

parasyte commented 12 years ago

Merged master into the ticket-101 branch. I'm going to scale the requirements for this ticket down quite a bit to get it ready for release by the end of November. I'll tackle the scrolling issue here, and then break off the tough algorithmic changes into separate tickets.

melonjs commented 11 years ago

@parasyte

same for this one : 0.9.6 ?

parasyte commented 11 years ago

Totally broken now by #176. :)

melonjs commented 11 years ago

lol :)

melonjs commented 11 years ago

@parasyte though it's fixed level example, I tried the dirtyRect example provided with melonJS, and it's still working :)

however there is bug linked to the image Layer, but since I have not tested this one since all the changes made in the current release, i"m not sure it's actually linked to the today changes.

parasyte commented 11 years ago

@obiot I'll move this one to Future. It's going to be more work than I can put into it for 0.9.7. Will focus on #103 instead (and take out #16, #50, and #169 in the process!)

melonjs commented 11 years ago

@parasyte that's perfectly fine for me ! :) I will however see on my side to at least fix it for non scrolling level, so that we do not add any regression in the process.

melonjs commented 11 years ago

@parasyte my god, today I had in mind to get rid of all this code where we add the viewport position to the give rect, as it would be a very nice optimization (since we do this everytime, everywhere), like here for example : https://github.com/obiot/melonJS/blob/master/src/level/TMXRenderer.js#L77

but changing it is like opening a pandora box, and between the different things that can change the current viewport, and the broken direct rect implementation, nothing is working... I basically just spent 3 hours on this thing, with no concrete results, I'm frustrated :) :) :)

parasyte commented 11 years ago

@obiot Ooooh yes. It is quite a fragile nightmare. :(

obiot commented 11 years ago

@parasyte shall we actually kill the corresponding branch ? but keep the ticket of course

parasyte commented 11 years ago

Deleted!

obiot commented 11 years ago

awesome, I love to clean stuff ;)

melonjs / melonJS

Performance: scrolling support with dirtyRegion #101

An algorithm to union a large number of rectangles

This is bad

This is good

Clipping dirty rectangles

Current state of affairs