michael-buschbeck / mychs-macro-magic

A simple, sane, and friendly little scripting language for your Roll20 macros.
MIT License
1 stars 1 forks source link

AUTORUN parsing produces major delay compared to normal parsing, hitting Roll20 "possible infinite loop" timeout #218

Open phylll opened 7 months ago

phylll commented 7 months ago

After some more investigation, it looks to me as if the problem I've had goes back to something that is different between parsing the same code as part of an AUTORUN macro vs. through the chat.

This new code is the offending example. Without it, everything parses and runs fine. With it as part of the AUTORUN macro, startup takes about twice as long and Roll20 ends up killing the sandbox for hitting its "possible infinite loop" timeout. Here are the two logs in comparison in terms of the execution time:

"MMM [1.36.1/2024-02-07T19:35:15.329Z] (+359ms) started autorun macro: !MMM-AUTORUN (owner: phyllo, privileged: true)" "MMM [1.36.1/2024-02-07T19:35:15.329Z] (+3243ms) completed autorun macro: !MMM-AUTORUN (owner: phyllo, privileged: true)"

vs

"MMM [1.36.1/2024-02-07T19:35:35.201Z] (+146ms) started autorun macro: !MMM-AUTORUN (owner: phyllo, privileged: true)" "MMM [1.36.1/2024-02-07T19:35:35.201Z] (+1558ms) completed autorun macro: !MMM-AUTORUN (owner: phyllo, privileged: true)"

As soon as I disable the AUTORUN macro and just execute the same code (which means, in terms of the two new functions in the diff above, that they should only get parsed and publish to game gets executed for them), everything runs fine!

N.B. On the suspicion that the parser or publish to game might do some sort of mock execution of these functions with undefined or default arguments, I added a quick validation check on the arguments and cut the functions short before entering any of the for loops or reading any of the expected lists and structs. No change: works when called directly in game, crashes the sandbox if parsed through AUTORUN.

michael-buschbeck commented 7 months ago

This is odd, especially the observation that it's taking longer when run as !MMM-AUTORUN than when the same thing is run interactively.

What's even more curious is that the Sandbox crashes after the autorun macro has completed running. I'm getting these logs in the test game:

"MMM [1.36.1/2024-02-13T19:32:31.550Z] starting up"
"MMM [1.36.1/2024-02-13T19:32:31.550Z] (+343ms) started autorun macro: !MMM-AUTORUN (owner: Michael \"GM\" B., privileged: true)"
"MMM [1.36.1/2024-02-13T19:32:31.550Z] (+3304ms) completed autorun macro: !MMM-AUTORUN (owner: Michael \"GM\" B., privileged: true)"

Notice how it says it has completed running the macro. Then, a few seconds later:

Possible infinite loop detected, shutting down.

I've intermittently observed that timings can be vastly different between the main game and the test game, but that apparently doesn't seem to affect the outcome. Your observation that libBasics 1.3.1 reliably works and 1.3.2 reliably produces the timeout (after a delay) appears to be spot on.

michael-buschbeck commented 7 months ago

I've trial'd-and-error'd my way through various potential culprits, and I think I found what's triggering the issue.

It's the }} in this code:

!rem // chatButtonTable(label, rows)
!rem //
!rem //   Returns chat code for a button with the label and payload provided. If the optional style is given, it is applied.
!rem //
!mmm function chatButtonTable(label, rows)
!mmm   set chatButtonTableCode = "&" & "{template:default}" & "{{name=" & label & "}} "
!mmm   for row in rows
!mmm     if row.type eq "subheading"
!mmm       set chatButtonTableCode = chatButtonTableCode & "{{ " & row.label & " }} "
!mmm     else if row.type eq "multi-buttons"
!mmm       set chatButtonTableCode = chatButtonTableCode & "{{ "
!mmm       for singleButtonProps in row.buttonProps
!mmm         set chatButtonTableCode = chatButtonTableCode & "   " & chatButton(singleButtonProps.label, chatButtonPayload(singleButtonProps.scriptCommand, singleButtonProps.settings))
!mmm       end for
!mmm       set chatButtonTableCode = chatButtonTableCode & "}} "
!mmm     end if
!mmm   end for
!mmm   return chatButtonTableCode
!mmm end function

I'm guessing that seeing a }} in there makes Roll20's stone-age chat parser believe it has found the end of the {{ that started the whole autorun macro a few macro-insertion levels up, and thus everything in initGameGlobals that follows the insertion of libBasics (which is a lot!) gets posted to the Roll20 chat engine in the egregiously imperformant line-by-line manner leading to the timeouts we started using {{...}} around the whole script for in the first place.

Replacing each }} with }\} in the script source makes the issue reliably go away. (It's set up in the main game like this right now.)

phylll commented 7 months ago

Thank you!

Strangely enough, I open the TEST game and nothing works, even with the fixed code in libBasics::chatButtonTable(). I go looking for further instances of }}, and lo and behold, there were a few in libMidgardRuneBlades, in a line that has been part of the code since Feb 2022 and has been AUTORUN ever since (8fc0165e43ba7ff25d1fe92bd1f485b5594a42cf). Perhaps because it was called rather toward the end of AUTORUN, it may not have been an issue before...? Fixing this line appears to have fixed TEST, and I'll copy the code over into the main game. Still, can't shake the feeling that this may not have been the end of this problem...

michael-buschbeck commented 7 months ago

I'm sure you're right: The further down in the final macro expansion the stray }} appear, the more of it gets sent to MMM as one large chunk of text and the less of it is parsed into individual chat lines by Roll20 and spoon-fed one by one to all API scripts.

The actual limit that triggers the "possible infinite loop" timeout is fuzzy, as is the exact mechanism that causes so much time to be consumed in the first place.

One thing we know for sure: It's literally the number of chat lines/chunks, and there's provably no relation to the total amount of data sent that way – because evidently the same huge MMM script has no trouble going through when it's sent as one single chat chunk, but breaks the sandbox when it's first split into individual chat lines by Roll20.

I've suspected the JavaScript GC in the past (because the timeout is happening after API scripts are done executing), but maybe that's too esoteric.

Maybe Roll20 is sending every chat chunk into a database queue, which then gets processed just after the end of each Sandbox callback fast enough if only a single chat chunk gets inserted in a single transaction regardless of size but scales terribly when thousands of chat lines get inserted in thousands of individual transactions. (Why API chat lines would be sent to a database is anyone's guess, but I wouldn't put it past Roll20's devs.)

Or maybe the key difference isn't even in Roll20 itself but in one of the many other API scripts we have installed, especially in our main game: Each chat chunk is passed by every registered API script – perhaps one of the other API scripts unwittingly does something that produces terrible amounts of GC garbage (...so we'd be back at this core hypothesis) each time it sees a chat chunk pass by, regardless of size.