secondlife / jira-archive

[BUG-233107] Objects failing to render is happening more frequently of late #10255

Open sl-service-account opened 1 year ago

sl-service-account commented 1 year ago

Steps To Reproduce

Attachments

Links

Related

Original Jira Fields

| Field | Value |
| ------------- | ------------- |
| Issue | BUG-233107 |
| Summary | Objects failing to render is happening more frequently of late |
| Type | Bug |
| Priority | Unset |
| Status | Accepted |
| Resolution | Triaged |
| Reporter | Whirly Fizzle (whirly.fizzle) |
| Created at | 2022-12-20T23:55:06Z |
| Updated at | 2023-03-08T21:57:30Z |

```
{
  'Build Id': 'unset',
  'Business Unit': ['Platform'],
  'Date of First Response': '2022-12-22T13:58:06.374-0600',
  'ReOpened Count': 0.0,
  'Severity': 'Unset',
  'System': 'SL Simulator',
  'Target Viewer Version': 'viewer-development',
  'What just happened?': 'Filling in...',
  'What were you doing when it happened?': '....',
  'What were you expecting to happen instead?': '....',
  'Where': 'Any Region',
}
```

sl-service-account commented 1 year ago

Maestro Linden commented at 2022-12-22T19:58:06Z

When you notice objects failing to render, sometimes doing one of the following will force the object to render, but fairly often nothing works and you need to relog to fix it.

  1. Right-clicking where the missing object should be
  2. Zooming the camera out and back
  3. Toggling wireframe - see BUG-232625
  4. Rule out if it's a case of BUG-9439 - technically not a bug.

sl-service-account commented 1 year ago

animats commented at 2022-12-27T05:45:59Z

I've been able to reproduce this. It's intermittent, but the same objects disappear. See "hyperionfirestormbad06.png", attached. There are supposed to be solid panels in that building wall, which is the big GT building in Hyperion. I've had this happen several times in Firestorm for this specific object.

Notes:

sl-service-account commented 1 year ago

animats commented at 2022-12-27T07:21:19Z

Some large linksets to test with at Mainland 3 would be appreciated.

I've been looking at the logging in my own system, and I see tens of "orphans", objects where the child first appeared before the parent or the parent is missing. I need to look at that more closely. It's legitimate for that to happen; we're not guaranteed in-order delivery.  Orphans have to be parented when the parent shows up. You don't know their location in world until the parent shows up, because child prims have coordinates relative to the parent. Has anything changed that would make out of order object updates more likely? Such as introducing more parallelism server side?

This may be related to this issue, or not. More later.
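
The orphan handling described above can be sketched roughly as follows. This is a minimal Python illustration, not code from any viewer: the names `Scene` and `SceneObject`, and the convention that `parent_id == 0` marks a root prim, are assumptions made for the example, and only translation (not rotation) is applied when resolving a child's position.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    local_id: int
    parent_id: int                      # 0 = root prim (assumed convention)
    position: Vec3                      # region coords for roots, parent-relative for children
    world_position: Optional[Vec3] = None


class Scene:
    """Holds placed objects plus orphans waiting for their parent to arrive."""

    def __init__(self) -> None:
        self.placed: Dict[int, SceneObject] = {}
        self.orphans: Dict[int, List[SceneObject]] = {}   # keyed by the missing parent's id

    def on_object_update(self, obj: SceneObject) -> None:
        if obj.parent_id == 0:
            obj.world_position = obj.position
            self._place(obj)
        elif obj.parent_id in self.placed:
            self._attach(obj, self.placed[obj.parent_id])
        else:
            # Parent not seen yet: park the child. Its world position is unknown.
            self.orphans.setdefault(obj.parent_id, []).append(obj)

    def _attach(self, child: SceneObject, parent: SceneObject) -> None:
        # Translation only; a real viewer would also apply the parent's rotation.
        child.world_position = tuple(
            p + c for p, c in zip(parent.world_position, child.position)
        )
        self._place(child)

    def _place(self, obj: SceneObject) -> None:
        self.placed[obj.local_id] = obj
        # Adopt any children that arrived before this object did.
        for child in self.orphans.pop(obj.local_id, []):
            self._attach(child, obj)
```

With this structure, anything still left in `orphans` after a region has settled is a child whose parent update never arrived, which is the kind of count the logging above would surface.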

sl-service-account commented 1 year ago

animats commented at 2022-12-30T19:24:49Z, updated at 2022-12-30T19:44:58Z

More info. I've been testing this further with my own experimental viewer. I've been spending too much time on this, trying to make sure it's not an error on my side. My experimental viewer is 100% safe Rust, has no common code with the LL-based viewers, has a rather different architecture, and has completely different bugs. So if we see the same problem in both systems, it's probably not viewer side.

sl-service-account commented 1 year ago

animats commented at 2022-12-30T20:29:12Z

More info: See bad version and good version, below, with the same prim count. Object updates are not being lost, but some of them may be bad.

![Screenshot from 2022-12-30 12-19-13.png](Screenshot from 2022-12-30 12-19-13.png)

Bad version above. Good version below. Note 244 prims in the linkset for both.

 

![Screenshot from 2022-12-30 12-20-36.png](Screenshot from 2022-12-30 12-20-36.png)

sl-service-account commented 1 year ago

animats commented at 2023-01-02T21:10:13Z

More notes:

Far from the viewpoint, like 250m away, but within a large draw distance, the interest list seems to omit small objects. (1m diameter omitted, 50m included.) That's reasonable. It also tends to omit objects that have small scripted rotational movement, such as clocks. Maybe even if in a child prim. That's wrong.

This may be a bug in my viewer. More later. But I'd suggest looking at code that decides some objects don't need to be in the interest list.

Mainland 3 is still offline. When it comes back up, I'd suggest having some big linksets and some simple objects with slow llSetRot rotation. Preferably with copies in the corner far from the origin.

sl-service-account commented 1 year ago

animats commented at 2023-01-10T19:23:25Z

![Screenshot from 2023-01-10 11-21-15.png](Screenshot from 2023-01-10 11-21-15.png)

sl-service-account commented 1 year ago

animats commented at 2023-01-10T19:24:46Z

Mainland 3 (Aditi), the test area for this bug, is still offline.

sl-service-account commented 1 year ago

Erik Mondrian commented at 2023-01-11T13:23:33Z

I've been encountering this same issue with some objects failing to render and requiring a relog, most recently when I was visiting the New Kadath Lighthouse Art Gallery yesterday (New Kadath/34/40/23). I noticed that some of the walls were bare, missing maps or collections thereof that I knew had previously been there. The "Early Grid" object was one of those that did not show up, even when I was standing right next to it (and even after right-clicking and moving camera out and back). It's a linkset with 27 objects. Elsewhere in the gallery, another one that did not show up until after a relog was "The Virtual University of Edinburgh" exhibit, a linkset with only 17 objects. This is, unfortunately, not just happening with linksets that are 200+.

I've included the Inspect Objects info in the snapshots, in case that helps. And I'm on Arch Linux, running Firestorm 6.6.3 (67470) Aug 27 2022 16:07:25 (64bit / SSE2) (Firestorm-Releasex64).

Snapshot_187.png Snapshot_188.png

sl-service-account commented 1 year ago

rick.daylight commented at 2023-01-13T20:46:22Z

I've seen this issue a lot too; far more missing objects than I'm used to seeing, and very often the objects will not reappear with any of the known 'tricks' like zooming out and back in, or even TP'ing out of the region and back.

This is the latest: I was testing a build with an alt, and the alt could not see the doors in the wall. My main account, also logged in, could see them.

The alt could interact with objects through the (closed) doors as if the doors did not exist, and could not click the doors to open or inspect them. However the alt could not walk through the doors until they were opened. Nothing, not even TPing out and back, made the doors appear for the alt except relogging.

Alt's view of the (missing) doors:

Snapshot_1706a.png

What was really there, view from main account:

Snapshot_1705a.png

sl-service-account commented 1 year ago

animats commented at 2023-01-13T23:38:21Z, updated at 2023-01-13T23:39:47Z

Here's a fail which is a bit more informative. This is one of my NPCs at Bruissac. The NPC seemed to have disappeared, but the hovertext on the root prim was still showing. So I could find it and open Edit. This NPC is an animesh, with an invisible but solid root prim that's a simple cube as the collision model. The animesh mesh and other links (clothing) are gone, but the root prim remains. The root prim is still selectable. Edit says it's 22LI, which is the full total with all links, but Edit only shows the root cube. As far as Edit is concerned, the other links are just not there. Trying to advance through the links shows this.

Bruissac seems to get hit by this a lot. I've seen half the buildings missing.

Open image in a new browser tab to see it clearly.

 

sl-service-account commented 1 year ago

animats commented at 2023-01-25T03:51:26Z, updated at 2023-01-25T05:46:33Z

Some progress:

I've found a workaround for my own experimental viewer. I delay sending RegionHandshakeReply for 2 seconds at startup, so that AgentUpdate arrives first. For any viewer which sends AgentUpdate unsolicited but waits for RegionHandshake to send RegionHandshakeReply, that's a race.

Without the delay, my test at Hyperion for large objects not appearing failed 4 out of 10 tries.

With the delay, it failed 0 out of 25 tries.

The sim server doesn't start sending object updates until both RegionHandshakeReply and AgentUpdate have been received. That's correct behavior, because it needs info from both of those messages to decide what to send. But the order in which they are sent does seem to make a difference. It shouldn't.

Since the SL UDP network protocol does not guarantee in-order delivery, it's the receiver's job to cope with out of order messages. So this would appear to be a sim-side bug. Working around it from the viewer side is possible, but to do it right, the viewer would have to ensure that the AgentUpdate message was acknowledged, not just sent, before sending RegionHandshakeReply. This requires message delivery tracking, which the network level does not currently have.

So, it looks like an out of order receive condition is being incorrectly resolved by the sim servers.
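
For illustration only, the "acknowledged, not just sent" ordering described above could be sketched like this in Python. The class `HandshakeGate`, the `send` callback, and the use of sequence numbers to match acks are all hypothetical; this is not how any existing viewer's message layer is structured, and it is a testing aid rather than a fix.

```python
from typing import Callable, Set


class HandshakeGate:
    """Holds RegionHandshakeReply until some AgentUpdate has been acknowledged."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send                          # hypothetical transport hook
        self._pending_agent_updates: Set[int] = set()
        self._agent_update_acked = False
        self._handshake_seen = False
        self._reply_sent = False

    def send_agent_update(self, seq: int) -> None:
        # Remember the sequence number so a later ack can be matched to it.
        self._pending_agent_updates.add(seq)
        self._send("AgentUpdate")

    def on_ack(self, seq: int) -> None:
        if seq in self._pending_agent_updates:
            self._agent_update_acked = True
            self._maybe_reply()

    def on_region_handshake(self) -> None:
        # Repeated RegionHandshake messages are treated as idempotent.
        self._handshake_seen = True
        self._maybe_reply()

    def _maybe_reply(self) -> None:
        ready = self._handshake_seen and self._agent_update_acked
        if ready and not self._reply_sent:
            self._reply_sent = True
            self._send("RegionHandshakeReply")
```

The reply goes out exactly once, and only after both conditions hold, regardless of whether the handshake or the ack arrives first.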

 

sl-service-account commented 1 year ago

Henri Beauchamp commented at 2023-01-25T21:25:59Z, updated at 2023-01-25T21:45:58Z

@animats

The problem is that, when I tried to thread the LLVOCache file reads in my viewer (to prevent the FPS hiccups seen when these reads happen), I also delayed the RegionHandshakeReply until the read had completed. That way the cache is primed and the viewer can set the proper flags in the reply to tell the server whether it actually has a cache for that region, which determines whether the server will send cache probes or just anything in the interest list. I then observed a weird second (and sometimes a third) RegionHandshake coming in from the server, even before the viewer had sent the first RegionHandshakeReply. Also, it looks like the server does not even wait for the first RegionHandshakeReply before sending object data. In my case this caused totally empty neighbour regions on some occasions, probably because their data was sent before my threaded read completed, so the received data was wiped out on completion.

So, your workaround may work for your Rust viewer since you are queuing messages in it (which is not the case for other viewers, which process them in real time, as they come in), but it will not work for other viewers (the delay in the RegionHandshakeReply seems to confuse the server; maybe it makes the assumption that its first RegionHandshake message got lost and is resending one?).

Note also that AgentUpdate is a message sent by the viewer in both process_agent_movement_complete() (which is a callback run as a reply to the AgentMovementComplete message from the server, the latter arriving after a completed TP or login), as a "reliable message", and every second (or when something about the agent changed) from the idle loop of the viewer, as a normal message; on login and far TPs, the AgentUpdate reliable message might contain a very approximate FOV (camera axis) info, which might explain the bogus interest list with "missing" objects in it...

sl-service-account commented 1 year ago

animats commented at 2023-01-25T22:21:51Z

> (the delay in the RegionHandshakeReply seems to confuse the server; maybe it makes the assumption that its first RegionHandshake message got lost and is resending one?).

So that confirms that the sim server's behavior can be made to change depending on the timing of those messages. That's what I see, too.

Viewer side workarounds for this are just for testing. This has to be fixed sim-server side, so that behavior doesn't change just because one message came in early or late. Otherwise, random network delays will continue to cause intermittent failures.
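
As a sketch of what "behavior doesn't change just because one message came in early or late" means in practice, here is a minimal order-independent gate in Python. It is purely illustrative: `ObjectUpdateGate` and the payload shapes are invented for the example and say nothing about how the simulator is actually implemented.

```python
from typing import Any, Optional


class ObjectUpdateGate:
    """Starts object streaming once both messages are in, in either order."""

    def __init__(self) -> None:
        self.handshake_reply: Optional[Any] = None
        self.agent_update: Optional[Any] = None
        self.started = False

    def on_message(self, name: str, payload: Any) -> bool:
        """Record a message; return True exactly once, when streaming should start."""
        if name == "RegionHandshakeReply":
            self.handshake_reply = payload
        elif name == "AgentUpdate":
            self.agent_update = payload      # keep the most recent camera info
        ready = self.handshake_reply is not None and self.agent_update is not None
        if ready and not self.started:
            self.started = True
            return True
        return False
```

The decision to start depends only on the set of messages received so far, so both arrival orders end in the same state with the same inputs available for building the interest list.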

sl-service-account commented 1 year ago

Beq Janus commented at 2023-01-26T01:09:07Z

@henri

I think I concur with the FOV/culling having some role in this.

I went to a location suggested by @animats today to see if I could reproduce the missing items, and while I could certainly see missing parts, it seemed to me not so much a race condition type effect but poor frustum culling, as you suggested. In particular, a tall building that, on first login/arrival, towers above your vertical FOV loses wall panels; these panels are large prims, possibly mega, but certainly large enough that the centre of the object is offscreen. This led me to wonder whether there was some overly aggressive culling going on. As soon as I cam up a little to get a "clear view" of the missing items, they appear.
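
To make the suspected effect concrete, the following Python sketch contrasts a center-point visibility test with a bounding-sphere test. It is a hypothetical illustration of the hypothesis above, not code from the viewer or the simulator: the view frustum is simplified to a cone, and the function names, camera position, and panel dimensions are all invented.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _angle_and_distance(camera: Vec3, forward: Vec3, target: Vec3) -> Tuple[float, float]:
    """Angle (radians) between the view axis and the target, plus the distance to it."""
    to_target = tuple(t - c for t, c in zip(target, camera))
    dist = math.sqrt(sum(v * v for v in to_target))
    if dist == 0.0:
        return 0.0, 0.0
    fwd_len = math.sqrt(sum(v * v for v in forward))
    cos_a = sum(f * v for f, v in zip(forward, to_target)) / (fwd_len * dist)
    return math.acos(max(-1.0, min(1.0, cos_a))), dist


def center_in_view(camera: Vec3, forward: Vec3, half_angle: float, center: Vec3) -> bool:
    """Visible only if the object's center lies inside the view cone."""
    angle, _ = _angle_and_distance(camera, forward, center)
    return angle <= half_angle


def sphere_in_view(camera: Vec3, forward: Vec3, half_angle: float,
                   center: Vec3, radius: float) -> bool:
    """Visible if any part of the object's bounding sphere touches the view cone."""
    angle, dist = _angle_and_distance(camera, forward, center)
    if dist <= radius:
        return True                      # camera is inside the bounding sphere
    return angle - math.asin(radius / dist) <= half_angle


# A tall wall panel 20 m ahead whose center is 30 m above eye level.
camera, forward, half_angle = (0.0, 0.0, 2.0), (1.0, 0.0, 0.0), math.radians(30)
panel_center, panel_radius = (20.0, 0.0, 32.0), 30.0
print(center_in_view(camera, forward, half_angle, panel_center))                # False
print(sphere_in_view(camera, forward, half_angle, panel_center, panel_radius))  # True
```

A test based on the center alone rejects the panel even though its lower edge is well within view, which matches the symptom of large panels vanishing until the camera is raised.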

sl-service-account commented 1 year ago

animats commented at 2023-01-26T02:31:17Z

I'm getting exactly what Beq describes today. Note the missing wall panels of the large building.

It's not a LOD problem; I can change the LOD factor and it has no effect.

As Beq says, it's as if the center of those large objects has to be within the viewing frustum.  This particular bug is quite consistent.

But, as you can see from the pictures above, there are other fails.  There are cases where you can approach the building closely and still have pieces missing. Tonight, I can't reproduce those.

We may be looking at two different bugs here.

![Screenshot from 2023-01-25 18-07-57.png](Screenshot from 2023-01-25 18-07-57.png)

sl-service-account commented 1 year ago

animats commented at 2023-01-26T02:54:55Z

Now, here's the fail that does NOT seem to be related to the viewing frustum. Here, I can cam around and get the entire area of interest in the viewing frustum and still not get some objects loaded. Those specific building panels are one of the objects that disappear reasonably often.

All these tests are right after a login.

![Screenshot from 2023-01-25 18-39-35.png](Screenshot from 2023-01-25 18-39-35.png)

sl-service-account commented 1 year ago

Henri Beauchamp commented at 2023-01-26T09:00:41Z

@Beq

> I went to a location suggested by @animats today to see if I could reproduce the missing items, and while I could certainly see missing parts, .../... As soon as I cam up a little to get a "clear view" of the missing items, they appear.

Beware, you might be seeing "the other bug" here with missing objects being due to a race condition between incoming interest list data from the server and the pipeline objects rebuilding in the viewer; that other bug is worked around, viewer-side, by any of the following actions: camming around the missing object, right-clicking in the "empty place" where they are located, switching wireframe on/off, or using the TPV built-in "Refresh object visibility" feature (initially implemented by Marine Kelley in her RLV viewer, and that I extended/improved when backporting it to my viewer, by adding an auto-trigger feature that kicks in a couple seconds after each login, far TP or sim crossing). If you can see the objects reappear when camming around (not far), then this is not the bug dealt with by this JIRA issue.

The only workaround for the bug in this JIRA issue (short of relogging or TPing far away and back in) is to zoom out far, far away (beyond draw distance), wait a few seconds, and reset the camera to its normal FOV.

> This led me to wonder whether there was some overly aggressive culling going on.

The culling is done exclusively viewer side, based on the info sent by the server via the interest list. The server does not do any culling of its own for the objects in the latter. So, no, it's not a culling issue; the objects are simply not in the viewer's object list. But yes, their absence might be due to the fact that their center was not in the approximated FOV initially sent by the viewer in the first AgentUpdate message...

@animats

> As Beq says, it's as if the center of those large objects has to be within the viewing frustum. This particular bug is quite consistent.
>
> But, as you can see from the pictures above, there are other fails. There are cases where you can approach the building closely and still have pieces missing. Tonight, I can't reproduce those.
>
> We may be looking at two different bugs here.

Yes, there are several bugs that appear to have the same effect for the end user (missing objects), and even the "new" (introduced early in 2022) bug may have a couple of causes (race condition and/or FOV issue)...