octotep / bevy_crossterm

Crossterm plugin for the bevy game engine
MIT License
99 stars 17 forks source link

Improve calulation_entities_to_redraw system #1

Open octotep opened 3 years ago

octotep commented 3 years ago

The asset change detection is rough. We have to loop through all asset added/changed events, then loop through ALL entities to see which ones have assets that need to be changed. Not very graceful. There are a few improvements to be made here:

wbrickner commented 3 years ago

Whenever a sprite moves you know in O(1) what rect it occupied before and after, and can redraw that region only. But what sprites are in those regions that need to be redrawn now? How to avoid an O(n) search for intersection?

I believe you can accomplish this by maintaining a "pointerless octree" (quadtree for this project as it is 2D), which leverages Morton keys for each subquadrant.

I think there is a way to very quickly convert a region to the morton keys that make it up, which then immediately provides the sprites to re-render to composite back into the frame, and if you're clever even which subrect inside a sprite you need, but that may be worse for performance on most sprites.

Here is a link to the paper that gave me information about this technique, albeit for a totally different application: Fast Generation of Pointerless Octree Duals.pdf. Most of the specifics in that paper can be ignored, but some of the ideas there could allow bevy_crossterm to re-render insane numbers of sprites per second even on low end hardware because the operation would be (almost) free.

wbrickner commented 3 years ago

Something adjacent to the technique I proposed can be found in the isosurface crate, used to optimize the marching cubes algorithm:

https://docs.rs/isosurface/0.0.4/isosurface/linear_hashed_marching_cubes/index.html

If I have the free time I may implement a proof of concept for bevy_crossterm / a reusable crate which can be leveraged for this task if my intuition turns out to be correct and it is possible to get large performance gains.

octotep commented 3 years ago

Wow this is good info thanks! I would certainly appreciate a crate which implements pointerless quadtree duals. After reading through the paper, I'm not sure I could pull it off as this domain (and computer graphics in general) is not one I'm very familiar with.

bevy_crossterm could definitely use a few performance boosts. As a library for terminal games, the less resources we can take up the better, as people generally expect terminal applications to be very lightweight. Examples typically run at 10+%CPU which is higher than i would want. Every bit of performance we can save, the better!

wbrickner commented 3 years ago

Awesome! Haven't had time, but probably soon! Do we have any profiling to determine which subsystems are most expensive? If every frame is redrawn from scratch I imagine that's your biggest problem, but I may be wrong.

I've played with high performance string manipulation before and found some methods are sneakily very slow (compared to a manual implementation) like .split(delim) for instance. I'll take a look and see if there's room for easy improvement elsewhere 😄

wbrickner commented 3 years ago

I'm realizing now that you and I may have been talking about different things! I am not actually involved or familiar with bevy_crossterm, I think you were talking about asset substitution or something, while I was talking about partial frame rendering etc. I'll check the source and see how bevy_crossterm works and if it makes any sense what I was saying 😅

octotep commented 3 years ago

You’re actually right on track! This system needs some improvement with collision detection for sure. Currently I use the broccoli crate with implements a hybrid structure of a k-d tree and mark and sweep algorithm to find collisions. Since the tree created by broccoli cannot have entities inserted after construction, I recreate the tree every frame, which seems expensive to me.

I didn’t lead with improvements to collision detection because it’s a domain I’m very unfamiliar with, but I welcome any input/contributions in this area!

WRT to perf. I think there’s a lot of perf on the table that I haven’t been able to find yet. I’m not sure what your platform of choice is to develop on rust is but if you know how to work perf on Linux, I’d appreciate some pointers. When I followed some guides to make a flame graph, all the symbols only had the crate name that they were from, not the function name, which made understanding what I was looking at next to impossible