tangrams / tangram

WebGL map rendering engine for creative cartography
https://tangram.city
MIT License
2.22k stars 290 forks source link

sprites failing to draw intermittently #646

Closed meetar closed 6 years ago

meetar commented 6 years ago

[heavily edited as my understanding of the problem evolved]

Since 0.15.0, sprite drawing intermittently fails in a set of Differ tests, I believe because they are very simple scenes containing only a single sprite, which apparently allows view_complete to be triggered before the final collision pass finishes drawing.

According to git-bisect the trouble started with https://github.com/tangrams/tangram/commit/882d4827d48be68c4796be13d89609acaea6b077 – although the Differ can't run the tests at all between that commit and https://github.com/tangrams/tangram/commit/6480e1f41622553f542f71c27e856546054172ef, when this issue becomes visible.

TO REPRODUCE THE ISSUE, FOLLOW THESE STEPS:

https://tangrams.github.io/differ/?1=https://tangrams.github.io/differ-tests/unit-tests/textures/textures.json&2=&lib1=0.14.1&lib2=0.15.1&go

RESULT:

About 5/6 of the time (tested by hitting a single test's "refresh" button 300 times and counting 247 failures), where sprites in the left column are drawn, the sprites in the right column will fail. (Other tests with no visible sprites are for a situation fixed by another branch.)

I have a hacky fix for this issue here: https://github.com/tangrams/tangram/tree/debug-view_complete which runs a check on updateViewComplete() for a specific object and rebuilds rather than trigger view_complete.

More specifically: the hack looks for a non-empty object such as {labels: Array(1), containers: Array(1)}, returned by mainThreadLabelCollisionPass() and passed through updateLabels().

Another workaround is to comment out the check for !this.collision.task at https://github.com/tangrams/tangram/blob/7faa575b2dbe934d5287628878fe5460c1dac3e3/src/tile_manager.js#L159 and run the code inside that block whether there's an existing task or not.

meetar commented 6 years ago

I've narrowed down and recreated the issue using a single scene file, to reproduce, load http://tangrams.github.io/tangram-frame/?url=http://tangrams.github.io/differ-tests/unit-tests/textures/sprite-defined-auto.yaml&lib=0.15.1#1/0/0

Then in the console, execute: scene.load(scene.config_source)

The point appears about half the time, and in a debugging branch I determined that in those cases the collision task never starts: https://github.com/tangrams/tangram/compare/debug-view_complete

This line fixes the problem, but probably also breaks the queue: https://github.com/tangrams/tangram/compare/debug-view_complete#diff-82a25b8d3694c18472203bd17136f5a2R197

In the debug output you can see that the time_elapsed property is NaN when the points don't draw, that can be fixed by this line https://github.com/tangrams/tangram/compare/debug-view_complete#diff-925c363847cebdee039f87c3924f8de1R20 but that doesn't seem to be related to the larger issue.

bcamper commented 6 years ago

OK, I'm fairly confident I've solved this in this series of commits: https://github.com/tangrams/tangram/compare/6972e57c13bde0269ce0373c9d0aa1e1555ce2d9...c29a7cbe213f5b78b5639ec75e1ec1d519b6be9d. I believe the root cause was a race condition, due to the main thread label collision code not including "pending" meshes in the collision pass. Pending meshes are newly created meshes sent from a worker to the main thread, but are "pending" because they shouldn't be displayed until they are collided with other labels (to make sure there are no visible overlaps); so, I think this is a logic error that they weren't incorporated in the first place. in "normal" use, they get picked up by the next collision call (whenever a collision pass ends, we check to see if another one is needed for any meshes added while the last pass was running, and these formerly pending meshes would be found and collided in this second collision call). Not directly related, but these commits also simplify the logic for checking to see if a new collision pass is needed, by using a string representing all the meshes in all the currently visible tiles, based on their create timestamps.

Thanks for digging into this issue! While the task system was not the root cause, it was very helpful for debugging and provided some crucial clues. I believe the task was not firing in this case because the race condition meant the second collision pass, which would cover the missed "pending" meshes, was never fired before the scene thought it had finished rendering. From what we've seen this only happens in very simple scenes, like Differ test cases, which can complete rendering so quickly.

I will pull in the code comments you added on the debug branch, as they are useful for comprehension and debugging beyond this specific issue.

bcamper commented 6 years ago

Released in v0.16.0

meetar commented 6 years ago

Confirmed with the Differ!! Nice work @bcamper 🏆