rollerderby / scoreboard

CRG Derby Scoreboard

[BUG REPORT] - CRG Lagging after multiple games #671

Closed bullseye555 closed 8 months ago

bullseye555 commented 11 months ago

@JeneralPain and I experienced this at a recent tournament we ran (Jen was Tech and I was THNSO), and there are more reports of issues coming through on the FB group. Unfortunately, the issues at our tournament occurred at times when neither of us was able to get full screengrabs etc.

For the sake of this report, I'll consolidate all reports here with as much detail as has been given from the FB Group [these are just the two that have been posted today, 14/10/2023 AEDT]

Devon (Report 1):

We are currently running a large tournament with multiple scoreboard computers and while they have decent specs we are hearing from the operators that after a few bouts they start to become glitchy or unresponsive. It would be preferable that we didn't have to restart CRG Scoreboard or clear out data every couple of bouts to keep the program working smoothly. So my question is are there any guidelines for clearing cache or possibly increasing the Java heap size to allow for the program to use more memory? My guess is this may be some sort of memory leak, but since these large tournaments don't happen that often we usually don't see the issue when running normal home events that are 2 bouts max. Thanks in advance for any ideas or suggestions on what to try!

Comment from Spike Kwann:

More information about the versions of CRG, Java, and OS, as well as other connections to your CRG (ePLT, announcers, THOs, etc.), would be helpful for troubleshooting, as would the types of glitches occurring. Other tournaments suspected issues with the number of games set up in CRG as well, but that hasn't been verified as an issue, so the number of games per setup would also be interesting to know. Wi-Fi / network infrastructure info would be helpful if there are multiple devices.

Update from Devon:

Ok so here is the additional information that was requested:

  • CRG Version: 2023.4
  • Java Version: Java 8 Update 381
  • OS: Windows 10 Home 22H2
  • Other connections: Livestream computer connects to scoreboard computer for the overlay and that is it.
  • Network: For the tournament the scoreboard computers are connected wirelessly to a private network and the live stream machine is hard wired into said network. I have seen this behavior though when everything has been hardwired into the same network.

Some additional info:

  • 8GB RAM
  • Core i7-6500U at 2.50 GHz
  • Number of bouts programmed into CRG: 15
  • Number of bouts actually ran: 6
  • Number of teams programmed into CRG: 25
  • Number of operators: 3

Gemma (Report 2):

At a tournament at the weekend, we reached Game 5 Period 1 Jam 11 when the jam started normally but then went back to Jam 10. This wiped Jam 11 from the ePLT and at the end of the jam the scoreboard went to saying P1 was upcoming. We tried to manually correct Jam 11, but then Jam 12 did exactly the same thing. At this point we completely restarted the Scoreboard and abandoned ePLT. I have been at another tournament where this happened, and our current theory is that there is a cap on how big the games files can be. It appears this was the point where there was no more space, so the scoreboard just failed. All previous games were removed and Game 6 appeared to run fine (although ePLT wasn't used for it). This was using the latest 2023 release and on a Windows laptop. However, as this is definitely the second tournament I've seen the issue at, plus another one where I suspect it was a contributing factor, and given the recent post by Devon Chopp, I don't think hardware is the problem here.

Comment from Laura Carr:

We actually had a similar situation at our last home game. We didn't recreate it at the time or pin down what we think might have contributed to it, but it sounds like what you describe here. I should add, we also were using ePLT. Is it only happening when ePLT is being used?

Bullseye & Jeneral Pain

East Coast Clash

Running a central server (Dell PowerEdge R320, Xeon E5-1410 (2.8GHz Quad-Core), 24GB DDR3, 2x 500GB) on headless Linux [@JeneralPain - can you please confirm distro & version etc] with 2x Scoreboard instances [running from different folders, one on port 8001 for Track 1, and one on port 8002 for Track 2], with ePLT on both tracks for all games, and a laptop as wireless remote-head for the scoreboard operators. Streaming setup was connected via ethernet to the same server with streaming run from a different device. This event ran 2023.3 as 2023.4 hadn't yet been released. We saw the following behaviour as the weekend progressed [there were 21 games in total, 10 on T2, and 11 on T1 - all issues were seen on both tracks, but not consistently].

  1. The "Pick a pre-loaded game" dropdown did not populate with any games, despite there being several loaded games still to be played. Clearing local device cache and trying to load the Operator Panel and start the game from different devices was not consistently successful. In all instances of this happening except one, the issue self-resolved after a minute or two. The other required the Java instance on the server to be closed & restarted.
  2. At the end of several games, the game did not change to Unofficial Final straight after the end of the Jam if the Period Clock expired during the last Jam [I attempted to re-create this, but was unable to - this tournament is not the first time I've seen it happen].
  3. Related to point 2, in almost all cases, the SBO panel appeared to have no impact on any of the clocks/displays. Attempting to manually stop the Period clock did nothing [as the clock was at 0, I'm not surprised by this], but attempting to use the "(un)Official Final" button to force it to end the game also yielded no result.

Trop Cup

As an additional note, there was a tournament Jen ran tech for 2 weeks ago on a headless desktop PC [different server, but otherwise same config as the above event] (@JeneralPain - can you please confirm specs of the PC used for Trop?). When we went to use it at another tournament the following weekend, point 1 above presented the same issue on both 2023.3 and 2023.4 [with game data from the previous weekend that was run on 2023.3 imported automatically when 2023.4 was first started]. We ended up using a Linux-run laptop for that tournament and it seemed to run fine for the first day (8 games, ePLT, but they were all shorter games, so I suspect smaller files due to fewer lineups etc). To avoid any issues on day two, I started a fresh version of 2023.4 in a new folder and on first run used the No Import option so it didn't automatically import other data. We had no issues on day 2 either [10 games, also all shortened].

JeneralPain commented 11 months ago

I'm wondering if we can implement some sort of cache/garbage clean out system into the scoreboard?

Also, it was an i5 with 16GB of RAM and a 512GB SSD.


frank-weinberg commented 11 months ago

There is no limit for game data size programmed into CRG. If we reach Java limits or physical memory limits, we should see exceptions. These will display on the backend console and be recorded in crg.log. Can you please get those logs, especially for Report 2? (The logs are not cleared on restart, so as long as the instances in question have not been deleted from the computers, the relevant data should still be in the log)
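For anyone collecting those logs, a rough way to pull out the interesting lines is sketched below. This is only an illustration: the log line format is not described in this thread, so the sketch simply assumes that exceptions appear as standard Java class names ending in `Exception` or `Error` (for example `java.lang.OutOfMemoryError`). The function names are hypothetical helpers, not part of CRG.

```python
import re
from pathlib import Path

# Assumption: exceptions show up as Java-style class names ending in
# "Exception" or "Error"; the real crg.log layout may differ.
ERROR_PATTERN = re.compile(r"\b[\w.]*(?:Exception|Error)\b")


def find_error_lines(log_text):
    """Return (line_number, stripped_line) pairs that mention a Java exception or error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_log_file(path):
    """Read a log file (hypothetical helper) and return its matching lines."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_error_lines(text)
```

Calling `scan_log_file("crg.log")` from the folder that holds the log would list candidate lines to paste into the report; the full log is still the better attachment, since the lines around a stack trace often matter.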

On the individual reports:

bullseye555 commented 11 months ago

Thanks for the prompt response, @frank-weinberg. I've requested the logs from Report 1 & 2 and will provide them once I've got them - @JeneralPain can you please pull the log files from the ECC & Trop servers?

For ECC 1 & Trop - would the easiest resolution be to download & delete completed games, reducing the amount of data to process on loading? Or is the issue with the currently loaded game, and thus there's not really anything to be done but wait...?

ECC 2 & 3: Testing my memory now, but IIRC the scenario was:

I'll coordinate with @JeneralPain to get the server online and get the JSONs from one of the games (I know it happened a couple of times, and I remember one of the SBOs that it happened to, and they only did a couple of games, so shouldn't be too hard to get the right JSON). Logs will come too, if they're still available

frank-weinberg commented 11 months ago

For ECC1 it's the game that the operator screen is pointing to (currently loaded one unless there is a "go to current game" banner) - removing the "game=" part from the URL might be a workaround, if you just want to start a new game (untested)
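As a sketch of that (untested) workaround, the snippet below strips a `game` query parameter from an operator URL while leaving any other parameters alone. The example URL, including its path, is a hypothetical placeholder; only the `game=` parameter name and the port come from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_game_param(url):
    """Return the URL with any 'game' query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except "game"; preserve blank values.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "game"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical operator URL; the real path may differ.
print(drop_game_param("http://localhost:8001/operator/?game=abc123"))
```

In practice the same thing can be done by hand in the browser's address bar; the code only makes explicit which part of the URL is meant.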

The red heading indicates that the clock was still in state "running" despite having run down to 0. That explains the intermission clock not starting automatically as the automatism checks that state. Now the big question is why the state did not change after the clock had run down. And why the reactions were delayed - that might be another symptom of the root cause. Or is there any chance that the SBO did a reload and thus triggered ECC1?

bullseye555 commented 11 months ago

"For ECC1 it's the game that the operator screen is pointing to (currently loaded one unless there is a "go to current game" banner) - removing the "game=" part from the URL might be a workaround, if you just want to start a new game (untested)"

Thanks - will give that a go next time it's encountered [will check it out when the server is back online]

"And why the reactions were delayed - that might be another symptom of the root cause. Or is there any chance that the SBO did a reload and thus triggered ECC1?"

I was doing that remotely from my computer (so not the SBO device), so it's entirely possible that background data was still loading on my local instance after I loaded the SBO screen. The SBO was inexperienced enough that they didn't know what to do to resolve the issue, and I was trying to resolve it from Tournament HQ upstairs rather than having to run down and do it on the SBO device

"Now the big question is why the state did not change after the clock had run down."

Yeah - no idea on this. As I mentioned, it's not the first time I've seen it. I think the first time was on 2023.1, and I have just never been able to replicate it

frank-weinberg commented 11 months ago

Ok, so the delay is ECC1. That leaves just the broken state change for this item. My current guess is it's somehow related to the internal rounding that happens as a result of clock sync.

bullseye555 commented 11 months ago

Report 1 logs: crg-scoreboard2.log, crg-scoreboard1.log

frank-weinberg commented 9 months ago

2023.5 should resolve ECC 1 & 3 + TropCup. Report 1 (Devon) might be addressed as a side effect of that fix. (But could also have a different source.)

For Report 2 (Gemma) there is a reasonable chance that it is addressed by the double key press fix. (Though the fact that it happened two jams in a row sows some doubt in my mind.)

ECC 2 is most likely still open.

bullseye555 commented 9 months ago

Thanks for the work on this, @frank-weinberg