secondlife / jira-archive


[BUG-227668] 9/25/19 server update caused scripting or asset load issues #5899

Closed sl-service-account closed 7 months ago

sl-service-account commented 4 years ago

What just happened?

After the Tuesday rolling restart, my product AnyPose failed to work on my home region. This pose system has 4000+ animations in it, and many of them were failing to load. After a region restart, everything started working again.

As a test, you can purchase the AnyPose pose stand here: https://marketplace.secondlife.com/p/AnyPose-Basic-Pose-Stand/774338

Rez a pose stand from the HUD, and try moving all joints around using HUD controls. On affected regions, some joints will not move.

Another issue that may present itself is when you sit, instead of standing over the pose stand, you may be offset to the side by about a meter.

You can also use the free demo version, but keep in mind that the Left Arm and Left Leg are intentionally disabled in the demo version: https://marketplace.secondlife.com/p/AnyPose-Demo-Pose-Stand/774337

On a side note, my CasperVend dropbox showed a URL error. There may also be issues with scripts obtaining URLs for LSL-based web services.

What were you doing when it happened?

Using the AnyPose pose system.

What were you expecting to happen instead?

For animations to play from my scripted object, and for the avatar to be centered over the pose stand.

Other information

This appears to point to either scripts shutting down, or a problem with the regions contacting the asset servers. I have had many complaints from customers about the same issue. I am asking them to restart their regions, or ask their land owners to do so.

Regions affected: Kilauea - Restart resolved the issue. (Happens again each time after the region has been empty for a while.) OSMIA - Restart resolved the issue. Hidden Falls - Broken (Land owner has not restarted it yet.)

As noted below: recompiling scripts sometimes fixes the issue. What is really odd is that recompiling the scripts in one rezzed pose stand causes the others to work too, as if the act of iterating through the object's inventory were caching something.

Then, if I log out for a while, and log back in, it is broken again.

Original Jira Fields

| Field | Value |
| ------------- | ------------- |
| Issue | BUG-227668 |
| Summary | 9/25/19 server update caused scripting or asset load issues |
| Type | Bug |
| Priority | Unset |
| Status | Closed |
| Resolution | Cannot Reproduce |
| Reporter | Phate Shepherd (phate.shepherd) |
| Created at | 2019-09-26T13:54:22Z |
| Updated at | 2019-09-30T20:38:46Z |

```
{
  'Build Id': 'unset',
  'Business Unit': ['Platform'],
  'Date of First Response': '2019-09-26T11:41:14.208-0500',
  "Is there anything you'd like to add?": 'This appears to point to an issue with the regions contacting the asset servers. I have had many complaints from customers about the same issue. I am asking them to restart their regions, or ask their land owners to do so.',
  'ReOpened Count': 0.0,
  'Severity': 'Unset',
  'System': 'SL Simulator',
  'Target Viewer Version': 'viewer-development',
  'What just happened?': 'After the Tuesday rolling restart, my product AnyPose failed to work on my home region. This pose system has 4000+ animations in it, and many of them were failing to load. After a region restart, everything started working again.',
  'What were you doing when it happened?': 'Using my pose system that has many animations in it.',
  'What were you expecting to happen instead?': 'For animations to load',
}
```

sl-service-account commented 4 years ago

yt.recreant commented at 2019-09-26T16:41:14Z

this is a duplicate of https://jira.secondlife.com/browse/BUG-227669

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-26T16:52:12Z

I may have jumped the gun on claiming this was an asset server issue. It could be a scripting issue that causes scripts to simply stop working at random.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-26T17:26:30Z

Also note that the Kittycats group put out a statement that their product has gone haywire with this update.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-26T21:18:22Z

Happening at pretty regular intervals. Still trying to nail down what gets it working again short of region restarts. Script reset didn't work immediately, although it did start working shortly after.

The really odd bit is, my pose stands are rezzed from a HUD. Once a rezzed stand was "fixed", following stands rezzed worked.

sl-service-account commented 4 years ago

NeoBokrug Elytis commented at 2019-09-26T22:43:36Z, updated at 2019-09-26T22:44:17Z

This affects the main channel roll of Second Life Server 2019-09-13T20:04:44.530946

I have been doing as much research on this issue as I can, and I've whittled it down to what I believe is one of three things happening:

The issues are intermittent. I believe that scripts still work for affected objects, because my error reporting still yells at me.

Objects that are being rezzed are having issues in general. Sometimes they don't rez, sometimes they do.

If objects do rez, llKey2Name() fails for me to resolve object names after a fresh object rez when there are problems. This means it has taken longer than 5 seconds from when the object_rez() event is triggered. The UUID is passed to the object_rez() event. I am currently researching whether llGetObjectDetails(id, [OBJECT_NAME]) is affected.

llRegionSayTo() messages might also be intermittently dropped.

I believe based on my observations, that the core of the issue is the objects are not fully "on the region" after they're rezzed.  A UUID may be assigned, but either the object doesn't rez, or is too slow to rez.
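The timing problem described here (a name lookup that comes back empty when it is made too soon after a rez) can be illustrated outside of LSL. What follows is a minimal Python sketch, not LSL and not taken from anyone's scripts in this thread: `wait_for_name`, `FakeClock`, and `make_slow_lookup` are hypothetical names, and the 5-second budget is simply the figure mentioned above. It polls a lookup until a non-empty name comes back or the budget runs out.

```python
from typing import Callable, Optional


def wait_for_name(
    lookup: Callable[[str], str],
    key: str,
    now: Callable[[], float],
    sleep: Callable[[float], None],
    budget: float = 5.0,
    interval: float = 0.5,
) -> Optional[str]:
    """Poll lookup(key) until it returns a non-empty name or the budget expires."""
    deadline = now() + budget
    while True:
        name = lookup(key)
        if name:
            return name
        if now() >= deadline:
            return None  # gave up: the object never became resolvable in time
        sleep(interval)


class FakeClock:
    """Deterministic stand-in for real time so the sketch runs instantly."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def make_slow_lookup(clock: FakeClock, ready_at: float, name: str) -> Callable[[str], str]:
    """Hypothetical lookup that returns '' until the simulated object is ready."""
    def lookup(key: str) -> str:
        return name if clock.now() >= ready_at else ""
    return lookup


# Object becomes resolvable after 2 simulated seconds: within the budget.
clock_fast = FakeClock()
fast = wait_for_name(make_slow_lookup(clock_fast, 2.0, "Pose Stand"),
                     "uuid-1", clock_fast.now, clock_fast.sleep)

# Object becomes resolvable only after 8 simulated seconds: the wait gives up.
clock_slow = FakeClock()
slow = wait_for_name(make_slow_lookup(clock_slow, 8.0, "Pose Stand"),
                     "uuid-2", clock_slow.now, clock_slow.sleep)

print(fast, slow)
```

The second case is the failure mode under discussion: the identifier exists, but nothing resolvable sits behind it before the caller's patience runs out.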

Region restarts do not resolve the issue; it persists.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-27T00:10:36Z

Some further notes from my tinkering:

Resetting scripts doesn't work. As others have found, recompiling scripts does appear to work (at least in my short testing.) Of course customers can't do that.

Now here is the really weird bit: If I recompile the scripts in one pose stand, others that are rezzed start working. The one that was recompiled, and the ones that weren't.

Also of note: if I log out and wait a while, when I log back in, it is likely to be broken again.

sl-service-account commented 4 years ago

Callie Cline commented at 2019-09-27T00:36:42Z

Our cats are having odd issues, including overhead stats not showing, cats not rezzing correctly, cats not eating, cats showing hungry when they are not, etc.

It’s not grid-wide; it’s on certain regions, but it has been a support nightmare.

Last week vendors stopped working.

All year each time they change something we have big problems.

sl-service-account commented 4 years ago

Ray Silent commented at 2019-09-27T02:06:52Z, updated at 2019-09-27T02:28:07Z

Just my 2 cents: http://wiki.secondlife.com/wiki/LlGetScriptState does not seem to report that scripts which fail to rez or run are stuck. In my objects I have a designated script that runs through the contents to make sure that all the scripts are still running. The same script also pings them to see whether they respond via local messages within a reasonable amount of time, and attempts to restart only those whose llGetScriptState returns FALSE. So, when a rezzed or attached object goes into this "limbo" state, I notice that 2-3 scripts not only fail to respond to local messaging pings, but llGetScriptState still reports them as running and functional.

Region where this happens: Aeros Island. Starts happening in about 6-8 hours after rebooting. Changing the host and resetting back to version Second Life Server 2019-09-13T20:04:44.530946 hasn't helped.
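To make the gap in this watchdog logic concrete, here is a small Python model of it (not LSL, and only a sketch of the rule as described in this comment). The script names and the `Script` and `classify` helpers are hypothetical. The model assumes a script has two independent properties: what a state query reports, and whether it actually answers a ping.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Script:
    name: str
    running_flag: bool   # what a state query (llGetScriptState-like) would report
    answers_ping: bool   # whether the script actually replies to a ping in time


def classify(scripts: List[Script]) -> Dict[str, List[str]]:
    """Sort scripts into healthy, restartable, and 'limbo' buckets."""
    result: Dict[str, List[str]] = {"healthy": [], "restart": [], "limbo": []}
    for s in scripts:
        if s.answers_ping:
            result["healthy"].append(s.name)
        elif not s.running_flag:
            # Silent and reported as stopped: the watchdog rule restarts it.
            result["restart"].append(s.name)
        else:
            # Silent but reported as running: the restart rule never fires.
            result["limbo"].append(s.name)
    return result


# Hypothetical object contents, one script per case.
scripts = [
    Script("positioner", running_flag=True, answers_ping=True),
    Script("animator", running_flag=False, answers_ping=False),
    Script("comms", running_flag=True, answers_ping=False),
]
report = classify(scripts)
print(report)
```

Under these assumptions, a watchdog that restarts only scripts reported as stopped leaves the "limbo" bucket untouched, which matches the behavior described in this comment.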

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-27T02:11:24Z

Ray, that sounds consistent with my tests. I have several debug commands in my system. One of them forcibly restarts all scripts and sets them to running, however, the scripts still don't behave as they should.

I would be curious: if you have two identical objects on the region, and both are not behaving properly, do they both start working if you use the Build -> Scripts menu to recompile all scripts to Mono on just one of them? That is what really confused me: how recompiling scripts in one object fixed the others.

sl-service-account commented 4 years ago

Ray Silent commented at 2019-09-27T02:19:00Z, updated at 2019-09-27T02:23:08Z

I'll check that @Phate Shepherd.

I also forgot to mention that the problem actually started about 15th of September if not earlier.

sl-service-account commented 4 years ago

NeoBokrug Elytis commented at 2019-09-27T06:11:29Z

I can confirm that recompiling scripts does not fix the issue.

sl-service-account commented 4 years ago

Ayame Musashi commented at 2019-09-27T10:02:32Z

TL;DR: I believe the latest server update is causing serious script lag.

I am seeing similar issues with my Katana. I make and sell scripted weapons that people fight each other with, and when the weapon is first attached, it starts a process of verifying permissions and various security checks involving link message communication, object parameter reading, etc. to verify the authenticity of both the scripts and the objects that make up the weapon's linkset. If these checks fail, the weapon goes into a dead state and does not function. If they pass, the weapon goes into a wait state and waits to be "unsheathed", as it is transparent up until that point.

My customers are seeing an issue with the weapons: once they attach and "unsheathe" the weapon, it simply does not work. When it is unsheathed, one script handles turning all of the prims visible, excluding certain prims that the user can configure to not be seen (a settings menu toggles parts of the weapon on/off, like removing the hand guard). This is functioning properly. The weapon appears. But the weapon is dead and does not function, presumably because the main script is still running the security check. I have yet to start adding debugging through all of the scripts to see where it gets to before it stops.

I have had varying success by waiting an extra long time after attaching and before unsheathing, giving the scripts time to complete the security checks that normally complete in less than a second or two. It has worked for me almost 100% consistently. I had one failure, but it could have been a false negative (my fault).

But of course my customers do not know this, so they all attach the weapons and draw them right away. Then, when it does not work, they try taking it off, putting it back on, and drawing immediately again. The majority have been using these weapons for years and know from experience that they normally work.

So whatever you did to script processing, please undo it. :) Thank you!

sl-service-account commented 4 years ago

Ayame Musashi commented at 2019-09-27T10:06:57Z

@Phate Shepherd Isn't that how Mono works? I could swear the whole idea of Mono was that multiple items using a Mono script share the one script in memory, instead of each object having its own instance. So recompiling one copy of it in the sim would recompile the working copy of it for all active objects using that script. I could be wrong, but that would make perfect sense to me, since that is my understanding of the Mono implementation.

sl-service-account commented 4 years ago

sabina.gully commented at 2019-09-27T10:55:31Z

HUD-to-object communication broke for me, but only on a random selection of scripts (I can't figure out why just some). The issue started happening after the rolling restarts the other day, on the region "Magika". The communication starts working as soon as you teleport anywhere else, but stops working as soon as you teleport back. The issue is fixed for a short period after restarting the region, but the breakage recurs on random scripts only 1-2 hours later.

What worked for me: the issue was "resolved" by forcing the object script to reset on attach. But I know this isn't an option for everyone, as some scripted items need to remember configurations, not to mention that it might not even resolve the issue for everyone.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-27T11:37:30Z

@Ayame Musashi I knew about the shared scripting of mono, but I had thought that only worked if it was the same script copied from inventory to object, and that recompiling broke that link. I guess there is no reason they couldn't do an MD5 on the compiled code and if it matches an existing compile, just share it. Hadn't thought of that.
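The idea floated here (hash the compiled code and share it whenever the hash matches an existing compile) is content-addressed sharing. A minimal Python sketch of that general technique follows. It is only an illustration of the speculation in this comment, not a description of how Second Life's Mono implementation actually works, and `CompiledCodeCache` and `intern` are hypothetical names.

```python
import hashlib
from typing import Dict


class CompiledCodeCache:
    """Hypothetical cache that keeps one copy of each distinct compiled blob."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, bytes] = {}

    def intern(self, compiled: bytes) -> bytes:
        """Return the shared copy for this content, storing it on first sight."""
        digest = hashlib.md5(compiled).hexdigest()
        # setdefault stores the blob only if this digest has not been seen yet.
        return self._by_digest.setdefault(digest, compiled)

    def __len__(self) -> int:
        return len(self._by_digest)


cache = CompiledCodeCache()
a = cache.intern(bytes([1, 2, 3]))
b = cache.intern(bytes([1, 2, 3]))   # built separately, identical content
c = cache.intern(bytes([9, 9]))      # different content, stored separately
print(a is b, len(cache))
```

In such a scheme, two separately produced compiles with identical output would end up pointing at the same stored copy, which is the kind of cross-object effect being wondered about. Whether the simulator does anything like this is not established anywhere in this thread.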

sl-service-account commented 4 years ago

Whirly Fizzle commented at 2019-09-27T15:39:21Z

Grid status post: https://status.secondlifegrid.net/incidents/263n6z6w1sp0

sl-service-account commented 4 years ago

NeoBokrug Elytis commented at 2019-09-27T16:01:48Z, updated at 2019-09-27T16:13:13Z

I did notice that the "Scripts Run" stat was much better before the roll, when it had initially become a problem months ago. After the roll it's doing badly again.

I believe the core of the issue has to do with object to object communications. Possibly the listen event is broken.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-27T16:28:27Z

In one fairly repeatable case, I can recompile one script in my object, and then all the rest behave. That one script is responsible for positioning the avatar. It only listens to link messages and makes no llListen() calls. That same script does have an llResetScript() on rez, and that doesn't help it.

Every time I think I've nailed down the cause, something else negates it. It could be something much more core to the script engine, something that affects a broad range of things: comms, URL requests, script run state (or initialization), who knows.

sl-service-account commented 4 years ago

Aishagain commented at 2019-09-27T16:57:19Z

Location: Woods of Heaven homestead, running the current Main Server software set. Time: 6:44hr PDT today, Friday Sept 27th. Item: HUD for A&Y Cyber pants. The HUD rezzed on my screen and initialized as normal. It connected to the worn pants as normal. After one colour change, which was effected on the pants (worn), the HUD became unresponsive to ANY touch (it does not use local chat commands). I "took" the HUD back into my inventory and re-rezzed it. Again it rezzed, initialized and connected to the pants as normal, but after one colour change (effected by two touch events) it became unresponsive. The client is the current version of Firestorm.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-27T20:31:52Z, updated at 2019-09-27T20:50:05Z

They just updated the Grid Status post. I am VERY skeptical this is the cause of all the issues!

My pose stands do not rely on object_rez at all.

Don't stop looking, there is more to it than that!

sl-service-account commented 4 years ago

Ray Silent commented at 2019-09-27T21:09:37Z

If anyone is listening, it has nothing to do with object_rez or on_rez. We are reporting about a wider range of scripts. Mine don't rez anything and are still in a limbo state.

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-27T21:16:51Z

The biggest and most obvious thing to be looking at:

Why does stuff start breaking if nobody is on the region for a while?

sl-service-account commented 4 years ago

Phate Shepherd commented at 2019-09-29T04:47:07Z

Since the rollback, I have not encountered issues. Will the regions on Aditi with future fixes to test be posted?

sl-service-account commented 4 years ago

Mazidox Linden commented at 2019-09-30T20:38:47Z

Hi there Phate,

As you indicated, there haven't been issues since the roll, so I'm going ahead and closing this issue. Please keep your eye on the Second Life Server Technology forum located at https://community.secondlife.com/forums/forum/310-second-life-server/ for details regarding upcoming releases and opportunities to test, and if you're so inclined, feel free to join our weekly Server Beta User Group meeting (the details of which can be found at http://wiki.secondlife.com/wiki/Server_Beta_User_Group).