secondlife / jira-archive


[BUG-229617] AWS Region issues - Spare time = 0 / - Script time - degraded performances #7446

Closed sl-service-account closed 8 months ago

sl-service-account commented 3 years ago

What just happened?

Since my regions were migrated to AWS, performance has collapsed dramatically.

Mine are gaming regions and need a lot of scripts, but the scripts are mostly idle. There is no way to reduce their number, and these problems are also affecting many of my competitors' regions. Up until 2 years ago we were able to use up to 9000 scripts per region. Lately, even before the AWS uplift, we had to reduce the number of scripts to 5000. On the old LL servers, with this number of scripts, the "Spare Time" and "Script time" values were acceptable even if not fantastic (certainly worse than two years ago).

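Since my regions are on AWS I have the following issues:

- "Spare time" is constantly close to zero.
- Idle scripts use more script timing on AWS regions than non-AWS.
- I get script error messages randomly from objects rezzed/used for ages
- Http issues
- Issues editing / saving a simple script
- Avatars crash on sim crossing (flying and walking)
- Inventory creation on in-world object failed
- Issues saving a simple notecard
- Search and profiles load very slowly
- Teleport issues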

I found other similar Jiras

https://jira.secondlife.com/browse/BUG-229566 Teleport - Script - sim crossing issues

https://jira.secondlife.com/browse/BUG-229611 Idle scripts use more script timing on AWS regions than non-AWS.

https://jira.secondlife.com/browse/BUG-229543 Inventory creation on in-world object failed (AWS)

https://jira.secondlife.com/browse/BUG-229468 Legacy profiles very slow to load AWS

https://jira.secondlife.com/browse/BUG-229210 Region corner crossings fail repeatably on Aditi AWS regions.

What were you doing when it happened?

Managing my regions. Listening to complaints from players, editing prims, saving a notecard.

What were you expecting to happen instead?

Better (normal) sim performance, especially considering that Gaming regions are more expensive.

Other information

Gaming regions are more expensive than normal regions.

Original Jira Fields

| Field | Value |
| ------------- | ------------- |
| Issue | BUG-229617 |
| Summary | AWS Region issues - Spare time = 0 / - Script time - degraded performances |
| Type | Bug |
| Priority | Unset |
| Status | Closed |
| Resolution | Duplicate |
| Reporter | GaiaGabe SLSGO (gaiagabe.slsgo) |
| Created at | 2020-11-05T10:20:49Z |
| Updated at | 2020-11-11T18:31:33Z |

```
{
    'Build Id': 'unset',
    'Business Unit': ['Platform'],
    'Date of First Response': '2020-11-05T04:59:05.238-0600',
    "Is there anything you'd like to add?": 'Gaming regions are more expensive then normal regions',
    'ReOpened Count': 0.0,
    'Severity': 'Unset',
    'System': 'SL Simulator',
    'Target Viewer Version': 'viewer-development',
    'What just happened?': 'Since my regions have been migrated to AWS, performances have dramatically collapsed.\r\n\r\nMine are gaming regions and need a lot of scripts but they are in idle status. There is no way to reduce them and these problems are also affecting many of my competitors regions.\r\nUp until 2 years ago we were able to use up to 9000 scripts per region. Lately, even before the AWS uplift we had to reduce the number of scripts to 5000. 
In old LL servers, with this amount of scripts, the "Spare Time" and "Script time" values \u200b\u200bwere acceptable even if not fantastic (certainly worse two years ago)\r\n\r\nSince my regions are on AWS I have the following issues:\r\n\r\n- "Spare time" is constantly close to zero.\r\n- Idle scripts use more script timing on AWS regions than non-AWS.\r\n_I get script errors messages randomly from objects rezzed/used from ages\r\n- Http issues\r\n- Issues to edit / save a simple script\r\n- Avatars crash on sim crossing (flying and walking)\r\n- Inventory creation on in-world object failed\r\n- Issues to save a simple notecard\r\n- Search- Profiles load very slowly\r\n-Teleport issues\r\n\r\nI found other similar Jiras\r\n\r\nhttps://jira.secondlife.com/browse/BUG-229566 Teleport - Script - sim crossing issues\r\n\r\nhttps://jira.secondlife.com/browse/BUG-229611 Idle scripts use more script timing on AWS regions than non-AWS.\r\n\r\nhttps://jira.secondlife.com/browse/BUG-229543 Inventory creation on in-world object failed (AWS)\r\n\r\nhttps://jira.secondlife.com/browse/BUG-229468 Legacy profiles very slow to load AWS\r\nhttps://jira.secondlife.com/browse/BUG-229210 Region corner crossings fail repeatably on Aditi AWS regions.\r\n',
    'What were you doing when it happened?': 'Managing my regions. 
Listening complaints from players, Editing prims, saving a notecard.',
    'What were you expecting to happen instead?': 'Have better(normal) sim performances , especially considering that Gaming regions are more expensive',
    'Where': 'http://maps.secondlife.com/secondlife/GAIA%20GAMES%20II/133/131/20\r\nhttp://maps.secondlife.com/secondlife/GAIA%20GAMES/133/131/21\r\nhttp://maps.secondlife.com/secondlife/GAIA%20GAMES%20III/124/129/19\r\nhttp://maps.secondlife.com/secondlife/GAIA%20GAMES%20IIII/123/131/19\r\nhttp://maps.secondlife.com/secondlife/Selena/128/128/19\r\nhttp://maps.secondlife.com/secondlife/SAM/125/133/21\r\nhttp://maps.secondlife.com/secondlife/PHANTASIA/144/159/21\r\nEnarah\r\nEnarahGames\r\nEnarah Games Region\r\nEnarah Skill Gaming\r\nEnarah Games Twin\r\nRainbow Games\r\nZen Games\r\nFour Beautiful Seasons',
}
```

sl-service-account commented 3 years ago

LUISELLA Frog commented at 2020-11-05T10:59:05Z

Yes, agree

sl-service-account commented 3 years ago

SamsonieAuster SLSGO commented at 2020-11-05T11:52:41Z

This does not affect only particular sims; performance is simply lower on AWS than it was on the LL servers. It is like on a Homestead sim, where scripts use more script time than on normal sims. I have not changed anything on my sim for a long time. It is not completely built, and I had plenty of headroom, 4.000-5.000 ms of spare time, on the LL servers. After moving to AWS I now have 0 spare time. See my screenshot: I use 4255 scripts and I had spare time of 4.000-5.000 ms before, which I was keeping so I could continue to build more things on the sim. Now there is near 0 spare time with 1 avatar (me). As soon as 2 or more avatars arrive it will start to throttle and scripts will run slower. All the spare time is gone without my changing anything on the sim. You are now giving us lower-performance sims/servers than before.

sl-service-account commented 3 years ago

Maestro Linden commented at 2020-11-05T20:37:02Z

Hi GaiaGabe, could you please attach an example 'idle script' that has higher script time on AWS than a non-AWS region? With this information, we can run an A/B test between different region versions, either by looking at something like OBJECT_SCRIPT_TIME or by having many copies of the script in an otherwise-empty region and looking at the overall script stats.

Also, I would caution against looking at 'Spare Time' as a primary metric when assessing script performance - that stat is only indirectly related. When a simulator executes a frame, it first processes physics, viewer communications, etc., and essentially only runs the scripts in the time leftover in the 22.2ms frame. The simulator attempts to run every script every frame in a round-robin fashion, but will defer a portion of the scripts until the next frame if it runs out of time. The average proportion of scripts that is run every frame is reported in the 'Scripts Run' stat - this is the stat you should focus on when assessing script performance. When 'Scripts Run' is less than 100%, script performance is less than ideal.

In your screenshot, 'Scripts Run' is only 93.4% - this indicates that the simulator ran scripts in the full time slot allowed, and had to stop short in order to keep total frame time at 22ms. With 'Scripts Run' less than 100%, it is expected for 'Spare Time' to approach 0, since the sim is completely busy every frame - otherwise, that would mean that the sim still had scripts scheduled to run, but opted to sit idly rather than execute them.
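As a rough illustration of the frame budget described above, the Python sketch below models a single frame: non-script work is charged first, scripts then run round-robin in whatever time is left, and any scripts that do not fit are deferred to the next frame. The 22.2 ms frame length comes from the comment above; the function name, the per-script costs, and the 4 ms of non-script work are hypothetical values chosen for illustration, not simulator internals.

```python
FRAME_US = 22_200  # 22.2 ms frame from the comment above, in microseconds


def simulate_frame(other_work_us, script_costs_us, start=0):
    """Model one frame: non-script work first, then scripts round-robin.

    Returns (scripts_run_fraction, spare_us, next_start). All names and
    numbers are illustrative assumptions, not simulator internals.
    """
    budget = max(FRAME_US - other_work_us, 0)  # time left over for scripts
    total = len(script_costs_us)
    if total == 0:
        return 1.0, budget, 0  # nothing to run, the whole remainder is spare
    used = 0
    ran = 0
    while ran < total:
        cost = script_costs_us[(start + ran) % total]
        if used + cost > budget:
            break  # out of time: remaining scripts are deferred to the next frame
        used += cost
        ran += 1
    spare = budget - used  # leftover time once script work stops
    next_start = (start + ran) % total  # the next frame resumes here
    return ran / total, spare, next_start


# Hypothetical region: 4 ms of non-script work, 5000 scripts at 4 us each.
busy = simulate_frame(4_000, [4] * 5000)
# Same region with 4000 scripts: everything fits in the frame.
light = simulate_frame(4_000, [4] * 4000)
print(busy, light)
```

In this toy model the overloaded region reports a 'Scripts Run' fraction below 1.0 together with zero spare time, while the lighter region runs every script and still has time left over, which matches the relationship between the two stats described in the comment.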

sl-service-account commented 3 years ago

GaiaGabe SLSGO commented at 2020-11-05T21:34:35Z

Hi Maestro. The pic is from Samsonieauster SLSGO, not from me.

Unfortunately I have no pics from the period prior to the uplift. But I can tell you that the spare time in that period was over 2.00, constantly. The value could vary depending on the number of players in my regions. From 2017 to 2019 I was able to manage about 9000 scripts per region, but in the last few months (even before the uplift), although nothing had been changed on my part, I had to reduce the number of scripts. I bought another region, I redistributed the scripts between the 4 regions (up from 3) and everything ran perfectly (or almost). As soon as my regions migrated to AWS, the problems started. If you look at my support history you will see that I had to ask for a rollback to the old servers (this temporarily solved my problems, which however were so serious that I had to close all regions for 36 hours).

In the past, with 'Scripts Run' less than 100%, at around 95%, all was fine and spare time was around 2.00.

I am not an engineer, much less a scripter. A Gaming region is full of games with inactive scripts. These scripts in theory only become active when a player pays for the game and starts the game. But I think there is a bug that was already present on the old servers and has been pushed to the nth degree on AWS. Please read this thread: https://community.secondlife.com/forums/topic/437870-it-really-is-the-number-of-idle-scripts-that-drags-down-a-sim/ .

I would also draw your attention to this bug: https://jira.secondlife.com/browse/BUG-229622

sl-service-account commented 3 years ago

Maestro Linden commented at 2020-11-05T22:49:42Z

"In the past, with 'Scripts Run' less than 100%, and around 95% all was fine and spare time was around 2.00"

That past behavior seems like it would have been a bug - the sim should have run more scripts if it really had that much time for idling.

But again, the only stats that really show script performance are 'Scripts Run' (with 100% being ideal) and 'Script Events' (which shows how fast scripts are being run overall). If you think that script performance has dropped, we can investigate by copying your affected region(s) to beta grid and see how these stats differ when moving the region back and forth between AWS and non-AWS environments.

sl-service-account commented 3 years ago

GaiaGabe SLSGO commented at 2020-11-05T23:18:43Z

yes please :) 

sl-service-account commented 3 years ago

GaiaGabe SLSGO commented at 2020-11-06T00:46:39Z

Hi Maestro. I just noticed that Gaia Games is on the old server again; look at my spare time now: https://gyazo.com/cf4a1fcf7a90e11f47154cfb80c24afe

sl-service-account commented 3 years ago

Maestro Linden commented at 2020-11-06T01:55:36Z

Alright, it looks like we don't have to go to the beta grid for an A/B test after all, since GAIA GAMES just moved back to the non-AWS 548903 build.

I would again encourage you to look at the 'Scripts Run' and 'Script Events' stats to analyze script performance. Here is a screenshot of it running on 548903: ![Gaia Games on build 548903 (2020-11-05 at 17.43.48).png](Gaia Games on build 548903 (2020-11-05 at 17.43.48).png) 'Scripts Run' is only 87.7%, and 'Script Events' is at 376 eps. There's a bit of jitter in these stats (±10% between each sample, maybe?), so it's hard to compare to the single sample shown for AWS, but it looks as though the script performance is quite similar between AWS in your original screenshot and the non-AWS 548903 that's running now.
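To make the jitter argument concrete, here is a minimal Python sketch that treats each single reading as a band of ±10% (the rough jitter estimate given above) and checks whether the two bands overlap. The function name and the overlap rule are assumptions made for illustration; this is not how the simulator or the viewer compares stats.

```python
def samples_overlap(sample_a, sample_b, jitter=0.10):
    """Return True if two single readings are indistinguishable
    when each is widened by +/- `jitter` (relative)."""
    lo_a, hi_a = sample_a * (1 - jitter), sample_a * (1 + jitter)
    lo_b, hi_b = sample_b * (1 - jitter), sample_b * (1 + jitter)
    return lo_a <= hi_b and lo_b <= hi_a


# 'Scripts Run' readings quoted in this thread: 93.4% and 87.7%.
similar = samples_overlap(93.4, 87.7)
# A hypothetical pair that is clearly different even with 10% jitter.
different = samples_overlap(100.0, 70.0)
print(similar, different)
```

Under this assumption the 93.4% and 87.7% readings fall within each other's jitter bands, so a single sample from each environment cannot separate them; averaging many samples per environment would be needed for a firmer comparison.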

sl-service-account commented 3 years ago

GaiaGabe SLSGO commented at 2020-11-06T07:00:27Z

In the run-up to the migration, I monitored the Statistics Bar for months. "Script time" has always been at values well above 90%, even when there are more than 20 avatars in the region.

In the same period I obviously also kept an eye on "Spare Time", and the values were higher than zero, unlike when the sim is on AWS.

As far as I understood, "Spare time" is the unused CPU time available for scripts and other tasks. Once all the time is used up scripts wait for their time in the CPU. If my region has no time to run scripts, things get sluggish.

I therefore disagree when you say that the 2 different servers offer the same performance. As proof of this, I have problems with scripts that I have been using for years, and I struggle to do simple things like build, save a notecard, teleport ... all the things I described when opening this Jira.

I think it is obvious that a large number of scripts are needed to manage a Gaming sim. It has always been like that. These are mostly Idle scripts that are triggered only when a player plays a game. On the other hand we know these scripts aren't really idle .. they can drag down sim performance even if they are not actually doing anything.

Right now llListen() is a major culprit. If we add to this a "Spare Time" of zero, it will be really difficult to manage a gaming sim.

sl-service-account commented 3 years ago

SamsonieAuster SLSGO commented at 2020-11-06T08:34:11Z

For information only, I have added a screenshot for comparison; my regions went back to the old agni.lindenlab servers too. Now you can see a very good spare time of 6,500 and lower script usage, which is really perfect when you compare it with my old screenshot on AWS.

sl-service-account commented 3 years ago

SamsonieAuster SLSGO commented at 2020-11-06T08:42:10Z

Scripts Run is at 100% again. That makes sense, because a lot of spare time is back now and not all of it is used, since the scripts now use less Script Time. It seems the Scripts Run stat shows the same as my in-world device, which calculates script run performance and emails me from time to time, so I have good statistics by email too.

sl-service-account commented 3 years ago

SamsonieAuster SLSGO commented at 2020-11-06T08:53:15Z

You now have a good comparison, because I did not change anything; the only difference is that now, on the old agni.lindenlab server, there is one more avatar on the sim with 5 more scripts. Script Time on AWS was over 20ms; Script Time on the agni server is 14.158ms. The result is a lot more spare time, and Scripts Run is back to 100% from 93% on AWS. Keep in mind that my spare time is reserved for avatars and, in future, maybe for rezzing more games. Other places like Gaia mostly use more script time because they use more games/scripts too. So we will need the full performance that we had on the agni.lindenlab servers before.

sl-service-account commented 3 years ago

Rider Linden commented at 2020-11-06T18:53:39Z

https://community.secondlife.com/forums/topic/458241-what-is-sleep-time/?do=findComment&comment=2156846

The numbers you are interested in watching when evaluating script performance are % of scripts run and EPS ("Events per second").  Script time is occasionally interesting trivia (once that reading starts approaching 20ms you will start seeing the % of scripts executed per frame drop.)  The simulator will never execute a script more than once a frame, so 100% is the best you can get. 

Using spare time to evaluate script performance is almost meaningless unless you also take into account agent time, network time, physics, and simulation times. Spare time is just the mean value over the sampling spread that the simulator had not used performing these other tasks. 

Script lag can be observed when scripts take an unexpectedly long time to respond. Script performance and overloading will have nothing to do with things like teleports, note cards loading, or other inventory operations.  These are accounted for under the simulation, network, or agent headings and depending on the operation may be divided up among these other categories. 
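The point that spare time only makes sense alongside the other categories can be sketched as simple arithmetic: spare time is whatever part of the frame is not consumed by any category. The Python example below assumes the 22.2 ms frame quoted earlier in the thread and uses the two Script Time figures reported by a commenter (14.158 ms and roughly 20 ms); the agent, network, physics, and simulation values are hypothetical placeholders held constant between the two cases.

```python
FRAME_MS = 22.2  # frame length quoted earlier in the thread


def spare_time_ms(category_times_ms):
    """Spare time as the part of the frame not used by any category."""
    return max(FRAME_MS - sum(category_times_ms.values()), 0.0)


# Hypothetical non-script load, identical in both cases.
other = {"agent": 0.5, "network": 0.2, "physics": 0.3, "simulation": 0.5}

# Script Time figures reported in this thread: 14.158 ms vs. about 20 ms.
spare_low = spare_time_ms({**other, "scripts": 14.158})
spare_high = spare_time_ms({**other, "scripts": 20.0})
print(round(spare_low, 3), round(spare_high, 3))
```

With the other categories fixed, a rise in script time comes straight out of spare time, but the same drop in spare time could equally come from higher agent, network, or physics time, which is why spare time alone does not identify the cause.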

sl-service-account commented 3 years ago

GaiaGabe SLSGO commented at 2020-11-06T22:24:56Z, updated at 2020-11-06T22:25:23Z

Dear LL, talking to you is like climbing Everest, especially if you are an old milf housewife (cougar? ... allow me a little irony) like me and not a computer engineer. I anxiously await your responses, which promptly arrive during my night hours when I am unable to respond at my best. I only know one thing, and I have been monitoring MY REGIONS SINCE YOU RETURNED THEM to the old servers: right now, for the last 24 hours, the spare time on the old server has not been stuck at zero. As soon as you move me back to AWS I will have the same problem again (spare time = 0) and a whole host of other problems. This is happening on all gaming sims. I believe that you also have some doubts, because you have put almost all of them back on the old servers in the last 24 hours. I would have preferred you to tell me that spare time is counted in a different way on AWS; it would have been more credible. I would have preferred you to tell me that these are temporary issues that will be fixed once the AWS migration is complete. I am sincerely discouraged ... at this point what can I do? To offer my players a nice gaming experience I would have to buy (at least) 2 more sims ... for them it would be like playing games in a desert ... most players wouldn't even see the games because their draw distance is rarely set to more than 128 meters ... they wouldn't even see the teleport boards ... let's not talk about costs. The problem exists and you are denying the evidence. I can also see this from how you are responding to the bug https://jira.secondlife.com/browse/BUG-229622 (for which I had to close all my regions for 36 hours, and the only solution was to ask for my regions to be returned to the old servers ... ask Corky and Rocko Linden for more information).
I was forced to close because of the bug mentioned: my skill games no longer communicated with contest boards, replays, game creators' servers and my own server. As Raziel clearly explained, restarting the sim fixes the problem temporarily, but after a few hours the problem returns. I emphasize that we have been using these scripted devices for more than 10 years without any problems. Many of the other SLSGOs are having the same problems but are unable to comment on the Jira and ask me to fight their battle.

sl-service-account commented 3 years ago

ZenGames SLSGO commented at 2020-11-06T23:04:13Z

It seems it is indeed the new AWS servers that are somehow the issue. Some of my regions, including Enarah Games, have been moved back to the old Linden Lab servers and I have not had a problem in 24 hours. Interestingly enough, spare time there is still showing as 0, but I'm scared to restart the region: for now things are working, and I'm worried a restart would put me back on AWS servers.

sl-service-account commented 3 years ago

JIRAUSER333045 commented at 2020-11-07T19:25:43Z, updated at 2020-11-07T19:26:59Z

Hello Rider Linden,

My skill gaming region "Miami Games" has also been moved back and forth to AWS as a result of another severe issue that apparently is fixed in the current AWS server version set on my region - 2020-11-05.551765

As a result of that I spent a lot of time looking at my own region stats data the past 30 days, before and after the uplift.

Since my region is at "half capacity", I don't suffer from "Scripts Run" < 100% and there's 'plenty' of spare time compared to almost any other skill gaming region.

I can surely say that after being uplifted, the "Script Time" average is up. Maybe +15/+20%.

This results in lesser spare time ofc and for any region that is not at 'half capacity' like mine, it means their "Script Time" jumped like 15/20%.

I have several recent snapshots PRE uplift stats that show my region with 2433 scripts and Script Time averaging 10.7ms to 13.7ms, despite being messed up by another issue.

After uplift, with 2746 scripts now, Script Time is averaging from 13.7ms to 16.7ms. I immediately noticed it but don't wish to be moved back again please! The other issue was more severe in my situation ;)
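Because the script count also changed between the two sets of snapshots (2433 before, 2746 after), one way to compare them is to normalize Script Time per script. The Python sketch below does that using the midpoints of the ranges reported above. It assumes, as a simplification, that Script Time scales linearly with script count and that the added scripts cost about the same as the existing ones; neither assumption is confirmed anywhere in this thread.

```python
def midpoint(low, high):
    """Midpoint of a reported range."""
    return (low + high) / 2


def per_script_us(script_time_ms, script_count):
    """Average script time per script, in microseconds per frame."""
    return script_time_ms * 1000 / script_count


# Figures reported above: before and after the uplift.
before_ms = midpoint(10.7, 13.7)   # 2433 scripts
after_ms = midpoint(13.7, 16.7)    # 2746 scripts

raw_change = after_ms / before_ms - 1
per_script_change = per_script_us(after_ms, 2746) / per_script_us(before_ms, 2433) - 1

print(f"raw Script Time change: {raw_change:+.1%}")
print(f"per-script change:      {per_script_change:+.1%}")
```

Under these assumptions the raw Script Time rises by roughly 25%, while the per-script figure rises by roughly 10%, so part of the raw increase would be attributable to the additional scripts and the rest to the environment change.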


sl-service-account commented 3 years ago

SamsonieAuster SLSGO commented at 2020-11-08T16:15:52Z

I was moved back to AWS again, but I could not check anything yet because there are still 6-10 avatars on the sim. I can only compare with before when I am the only one on the sim, so I still need to wait. (Avatars eat all the spare time.)

sl-service-account commented 3 years ago

SamsonieAuster SLSGO commented at 2020-11-08T22:41:15Z

It seems everything is again as before on AWS. Scripts again use 20ms, not 14ms, so my 5-6ms of spare time is gone too. Everything will again be slower if I rez more things :( (Note that I do not have any problems myself, but you can see empty fields and prepared spaces for another game on my sim; the 6ms of spare time that is now gone was reserved for that.) Other places that use more games may actually notice a slowdown.

sl-service-account commented 3 years ago

GaiaGabe SLSGO commented at 2020-11-10T17:43:46Z

I am still waiting for a reply. I am happy to be back on the old server, as you can imagine from the pic of the statistics bar I am enclosing today. As you can see, on the old server Spare Time is not bad, and neither is Scripts Run %, especially considering that the sim has not been restarted for 112 hours.

![GaiaGames Statistic bar November 10th.jpg](GaiaGames Statistic bar November 10th.jpg)

sl-service-account commented 3 years ago

Oz Linden commented at 2020-11-10T19:46:33Z

Spare time becoming smaller is not, by itself, an indication of a problem; that just indicates that the sum of the times used by each of the simulator phases did add up to the total allowed for a frame.

The Scripts Run percentage is a better indicator - and in all of these it's quite high (which is good).

One possible source of script performance issues is that in uplifted regions the time to access Experience data is longer from the cloud. Today's roll has a change that allows us to improve that, and we will apply that configuration change soon, so you may see the performance of anything that uses an Experience improve (and get worse in the remaining colo hosts, but they will move soon and get the improvement).

sl-service-account commented 3 years ago

GamesDeluxe SLSGO commented at 2020-11-11T15:41:53Z, updated at 2020-11-11T16:50:48Z

My skill gaming region "Deluxe Games II" was recently uplifted and immediately there was a drop in script performance.

I have constantly maintained and monitored this region for over 8 years now, and it's clear that Script Times are about 15% higher than they used to be.

It used to average 17/18 ms and now it's averaging above 20ms constantly, consuming almost all the spare time and dropping its Scripts Run performance.

Also, no Experiences at all are being used.

sl-service-account commented 3 years ago

GamesDeluxe SLSGO commented at 2020-11-11T17:25:04Z

Simple "test" trying to reproduce the higher script time in AWS servers.

Go to AWS region:

Go to 'Agni' region:

Result: aws region "Deluxe Games II", agni region "The Secret"

https://gyazo.com/0437263992035df3a49994aae4ab8f3e

Conclusion:

Average Script Time from the default "Hello, Avatar" script is much higher on AWS servers.

I hope this helps "prove" something and we can move on from trying to understand if there's a problem to acknowledging that there's an actual serious performance problem and it's time to work on a fix.