Closed mreinstein closed 7 years ago
So, first of all, I have no objections against performance-related improvements (of the shadowcasting in particular, as this code is indeed pretty resource-intensive).
But ideally such work shall be done without modifying the public API. This is not always possible, though.
I see two alternative approaches for optimizing _getCircle
memory (de)allocation complexity:
caching results; I would guess that in your case, the FOV is computed many times per second but for a fixed center point. _getCircle
is (almost) a pure function, so its results can be memoized for quite some time -- at least until the [cx, cy]
values change.
caching results without a particular [cx, cy]
. The _getCircle
can be rewritten so it (always) returns results for [0, 0]
; these can be cached indefinitely, making all further _getCircle
calls just a trivial cache lookup. Resulting coordinates will have a diff-like semantics, so all callsites of _getCircle
will have to manually add [cx, cy]
to neighbor values -- I suppose this will not bring any measurable performance regressions. The downside here is that all algorithms using _getCircle
would need to get adjusted accordingly.
What do you think?
@ondras
But ideally such work shall be done without modifying the public API.
Agreed. And having looked at the public API, I don't think it needs to change. Given that you're using the visitor pattern to provide a callback which fires for every visible cell in the result set, this is already fairly suitable.We could also introduce another argument at the end (some result array) and if it's present, fill that with results instead of firing a function for every cell in the result set. PreciseShadowcasting.compute(...)
could return how many items were filled in the result array. This would cut down on function call overhead and not break existing api.
caching results; (_getCircle)
yeah I've considered memoization on that particular function. As you mentioned it's a function with few side effects and a ripe candidate. Having done some prelim cpu profiling, the bulk of time is actually spent in _checkVisibility
though. I just mentioned this particular function because it's fairly simple compared to some of the others.
one thing that really surprised me is that I wrote a far less algorithmically optimal implementation, and it seems to dramatically outperform this precise shadow casting running in half the time. I think these could be some contributing reasons:
_checkVisibility
makes heavy use of recursionSHADOWS
is an array of arrays so there's a lot of array creation going on)splice
is notorious for leaking memoryI'm wondering if the above items are preventing V8 from doing more intense optimizations. Anyway, I'm not making a comparison to brag or anything, just hoping to brain storm some ideas on why the perf characteristics might be the way they are and think if some things that might improve the situation.
I'd be happy to share my implementation if that would be enlightening.
The Precise Shadowcasting algo is rather old and I do not recall every implementation detail. But:
_checkVisibility makes heavy use of recursion
IIRC, the recursion happens only during an edge case of a segment that wraps around the zero/360 angle.
heavy use of array construction (e.g., SHADOWS is an array of arrays so there's a lot of array creation going on)
Right, this main data structure is a set of angles. An array of two-item tuples seemed like a natural fit.
splice is notorious for leaking memory
Could be a reason, yes. This code was written without any particular performance focus. Do you have any comparison with Mozilla's JS?
no typed array usage
True. Note that rot.js
itself supports browsers down to IE9, so this is why there are no advanced JS features, such as typed arrays.
The problem with rot.js
here is that it is built primarily for turn-based games, so the focus was not on performance, but rather on simplicity, ease of usage (hence the Visitor pattern, which is prevalent basically in every part of the toolkit), straightforward algorithms etc. Making these things fast would, in many cases, equal in a complete rewrite. Something I really do not have time nor motivation for right now.
I would be, however, interested in your more performant implementation of the Precise Shadowcasting, as I believe that a reasonably fast real-time FOV is a valuable asset. (Note that fast FOVs typically use scene geometry to do their stuff; this cell-based evaluation is highly specific to Roguelike genre and is probably a source of a great performance degradation.)
@ondras btw I just want to back up and say I hope that these points haven't been taken as harsh criticism. The reason I reached out to you here is you've got one of the most elegant and well documented js shadowcasting routines I've found on the web to date, and figured it'd be better to try to improve the best implementation I found rather than go off and add my own. Great work on this code, and rot.js
overall! 👍
An array of two-item tuples seemed like a natural fit.
It's definitely a natural fit for modeling the problem elegantly, but not as good a fit for actually making it run efficiently in the VM. I guess that's a tradeoff that could be made. not sure if it should be made. : )
Do you have any comparison with Mozilla's JS?
This is a great resource related to javascript array performance https://gamealchemist.wordpress.com/2013/05/01/lets-get-those-javascript-arrays-to-work-fast/
Note that
rot.js
itself supports browsers down to IE9, so this is why there are no advanced JS features, such as typed arrays.
Makes a lot of sense. In my case I'm building a game on newest stable chrome release so I can be much more aggressive about using advanced features. Definitely not advocating doing this, just a brain dump of all the perf reasons I could think of.
The problem with
rot.js
here is that it is built primarily for turn-based games, so the focus was not on performance, but rather on simplicity, ease of usage (hence the Visitor pattern, which is prevalent basically in every part of the toolkit), straightforward algorithms etc. Making these things fast would, in many cases, equal in a complete rewrite. Something I really do not have time nor motivation for right now.
I think this all makes a lot of sense. I'm aware that my use case is definitely different from the typical use cases for rot.js
. My hope was that I could make changes to your implementation rather than publish yet another similar thing on web. :) But I'm starting to think my use case is different enough to warrant it. Maybe I'll spin up a module that is forked off of your codebase, explain the divergence, and point people at rot.js
/ this repository for turn based use cases.
If you are interested in revisiting performance in this code, I could send you a PR that does 2 things:
_getCircle
operate on a unit circle. That would make the function fully pure and enable memoization/cacheing more readily.anyway thanks again for chatting with me on this. I enjoyed the discussion. Hope it didn't waste too much of your time!
btw I just want to back up and say I hope that these points haven't been taken as harsh criticism.
No offense done, not at all! I like to discuss stuff and I am fully aware that the FOV is indeed a resource-intensive routine which would benefit from perf improvements.
This is a great resource related to javascript array performance https://gamealchemist.wordpress.com/2013/05/01/lets-get-those-javascript-arrays-to-work-fast/
Good read! It should be noted, though, that JS VM's are changing fast, so there might be some new/changed insights into this, as the article is now more than three years old.
convert SHADOWS to be a flat array of values. Would make the implementation less elegant but probably faster
Why not. I am curious if flattening the structure brings some real perf improvements.
make _getCircle operate on a unit circle. That would make the function fully pure and enable memoization/cacheing more readily.
I hope I understand the terminology correctly -- do we still consider the r
argument, only removing x
and y
, replacing them with zeros?
anyway thanks again for chatting with me on this. I enjoyed the discussion. Hope it didn't waste too much of your time!
No problem. I like perf improvements :) Also, if your are into JS stuff with respect to performance, you might be interested in this unrelated project of mine, that builds on high-speed stuff: https://ondras.github.io/primitive.js/
@ondras my use case is a little atypical in that I'm doing a real-time (not turn based) roguelike. The rot.js shadow casting fov implementation is great in that it's very short and fairly readable, but it churns a lot of objects through the garbage collector.
I'm wondering if you'd be open to accepting PRs that would refactor this code to be gc friendly.
for example,
_getCircle = function(cx, cy, r)
returns a new array of x,y arrays. I would send a PR that changes it's API to be more like:This would be a lot more appropriate for a tight run time loop because
_getCircle
wouldn't allocate objects or arrays. I also suspect this would make the code run a little faster because chrome and other VMs can heavily optimize routines that do simple array based arithmitic.I realize this would have non-trivial API implications so I understand if you don't want to do this.