Closed meetar closed 6 years ago
I've narrowed down and recreated the issue using a single scene file, to reproduce, load http://tangrams.github.io/tangram-frame/?url=http://tangrams.github.io/differ-tests/unit-tests/textures/sprite-defined-auto.yaml&lib=0.15.1#1/0/0
Then in the console, execute: scene.load(scene.config_source)
The point appears about half the time, and in a debugging branch I determined that in those cases the collision task never starts: https://github.com/tangrams/tangram/compare/debug-view_complete
This line fixes the problem, but probably also breaks the queue: https://github.com/tangrams/tangram/compare/debug-view_complete#diff-82a25b8d3694c18472203bd17136f5a2R197
In the debug output you can see that the time_elapsed
property is NaN when the points don't draw, that can be fixed by this line https://github.com/tangrams/tangram/compare/debug-view_complete#diff-925c363847cebdee039f87c3924f8de1R20 but that doesn't seem to be related to the larger issue.
OK, I'm fairly confident I've solved this in this series of commits: https://github.com/tangrams/tangram/compare/6972e57c13bde0269ce0373c9d0aa1e1555ce2d9...c29a7cbe213f5b78b5639ec75e1ec1d519b6be9d. I believe the root cause was a race condition, due to the main thread label collision code not including "pending" meshes in the collision pass. Pending meshes are newly created meshes sent from a worker to the main thread, but are "pending" because they shouldn't be displayed until they are collided with other labels (to make sure there are no visible overlaps); so, I think this is a logic error that they weren't incorporated in the first place. in "normal" use, they get picked up by the next collision call (whenever a collision pass ends, we check to see if another one is needed for any meshes added while the last pass was running, and these formerly pending meshes would be found and collided in this second collision call). Not directly related, but these commits also simplify the logic for checking to see if a new collision pass is needed, by using a string representing all the meshes in all the currently visible tiles, based on their create timestamps.
Thanks for digging into this issue! While the task system was not the root cause, it was very helpful for debugging and provided some crucial clues. I believe the task was not firing in this case because the race condition meant the second collision pass, which would cover the missed "pending" meshes, was never fired before the scene thought it had finished rendering. From what we've seen this only happens in very simple scenes, like Differ test cases, which can complete rendering so quickly.
I will pull in the code comments you added on the debug branch, as they are useful for comprehension and debugging beyond this specific issue.
Released in v0.16.0
Confirmed with the Differ!! Nice work @bcamper 🏆
[heavily edited as my understanding of the problem evolved]
Since 0.15.0, sprite drawing intermittently fails in a set of Differ tests, I believe because they are very simple scenes containing only a single sprite, which apparently allows
view_complete
to be triggered before the final collision pass finishes drawing.According to git-bisect the trouble started with https://github.com/tangrams/tangram/commit/882d4827d48be68c4796be13d89609acaea6b077 – although the Differ can't run the tests at all between that commit and https://github.com/tangrams/tangram/commit/6480e1f41622553f542f71c27e856546054172ef, when this issue becomes visible.
TO REPRODUCE THE ISSUE, FOLLOW THESE STEPS:
https://tangrams.github.io/differ/?1=https://tangrams.github.io/differ-tests/unit-tests/textures/textures.json&2=&lib1=0.14.1&lib2=0.15.1&go
RESULT:
About 5/6 of the time (tested by hitting a single test's "refresh" button 300 times and counting 247 failures), where sprites in the left column are drawn, the sprites in the right column will fail. (Other tests with no visible sprites are for a situation fixed by another branch.)
I have a hacky fix for this issue here: https://github.com/tangrams/tangram/tree/debug-view_complete which runs a check on updateViewComplete() for a specific object and rebuilds rather than trigger
view_complete
.More specifically: the hack looks for a non-empty object such as
{labels: Array(1), containers: Array(1)}
, returned by mainThreadLabelCollisionPass() and passed through updateLabels().Another workaround is to comment out the check for
!this.collision.task
at https://github.com/tangrams/tangram/blob/7faa575b2dbe934d5287628878fe5460c1dac3e3/src/tile_manager.js#L159 and run the code inside that block whether there's an existing task or not.