lep / jassdoc

Document the WarCraft 3 API
52 stars 20 forks source link

Note each of Blizzard.j Leaks #108

Closed Luashine closed 1 year ago

Luashine commented 1 year ago

I went ham. :cut_of_meat: (100% natural meat)


This shell script will scan Blizzard.j for local variable declarations. Nothing fancy, could be used on your own war3map.j with little success. I specifically excluded player and race because these handles are never removed anyway, so it doesn't matter.

clear; grep --line-number -F 'local ' Blizzard.j | grep -vE '(integer|boolean|real|string|player|race|@bug)' | grep -vF '// Use only local code'

Approach:

  1. Find local variable that have handle as underlying type. Example local location heroLoc = ...
  2. See if this variable is set to null at the end of function. Good: set heroLoc = null
  3. If it is not cleared, add @bug note -> handle leak
  4. If the object is not returned nor set to global AND not destroyed -> object leak (like RemoveLocation)

If there were too many handle leaks, I put them under one note. Ex: MeleeStartingUnitsHuman. 1-2 occurrences per function got each their own note.

Often I have added @note to warn about new objects being returned or (unexpectedly) removed within the function.

As part of this analysis I also wrote down where objects were not removed at all (mostly in starting units functions for each race).

I did not touch arrays but there were few of those.

common.j changes should've got each their own separate commit. Of particular interest was this:

    // Remove or kill the original unit.  It is sometimes unsafe to remove
    // hidden units, so kill the original unit if it was previously hidden.
    if wasHidden then
        call KillUnit(oldUnit)
        call RemoveUnit(oldUnit)
    else
        call RemoveUnit(oldUnit)
    endif

Motivation

It seems everybody in the modding community has got the memo that you must null local variables in Jass. However this everybody only includes those hand-coding in Jass and excludes Blizzard's employees.

The MineralZ map updates multiboards through GUI a lot and it turned out, the BJ functions acquire an mbitem but never set its local var to null. After avoiding these calls (courtesy of maddeem) the map seemed to behave way better after 1h+ of playing. I've spend days working my ways around this problem but missed the possibility of BJ functions being the culprit.

Explanations

This deep dive by Unryze has got to be the most competent explanation, but honestly hard to grasp:

https://xgm.guru/p/wc3/Jass-MythBusters in Russian Local handles always leak! It was previously believed that any complex local variable (everything that is a handle, that is, units, abilities, and so on) leak and waste 4 bytes of memory, regardless of nulling, or not. However - this turned out to be only half the truth, for there is a leak, but only if we create an object and assign it to a local variable and at the end of the code we do not reset this variable. That is, we can transfer the reference to this local variable to the global, which will allow us to null the local variable and actually remove the leak related to the "complex" local variables. Example: This will cause a leak.

    function DoSomething takes nothing returns nothing
        local unit u = CreateUnit( Player( 0 ), 'hpea', 0., 0., 270. )
        ....
    endfunction

However, if you call Removeunit (U), then this will not be a leak, since unit was processed and removed. This will not cause a leak

    function DoSomething takes nothing returns nothing
        local unit u = CreateUnit( Player( 0 ), 'hpea', 0., 0., 270. )
        ....
        set u = null
    endfunction

This will NOT LEAK

    globals
        unit uTemp = null
    endglobals

    function main takes nothing returns nothing
        set uTemp = CreateUnit( Player( 0 ), 'hpea', 0., 0., 270. )
    endfunction

    function DoSomething takes nothing returns nothing
        local unit u = uTemp
        ....
    endfunction

To understand why, we should look at Jass bytecode, take a look at the variants without and with a leak. ...

I would like to make an amendment/addition (for which I thank PTR153), if you create an object and assign it to a local, or create any object inside the function and assign it to a local variable, then it MUST BE nulled; if the created object was not removed and you continue to use it, then all that is needed is to assign it to some global variable and return it.

Short example:

function TestFunctionEx takes nothing returns nothing
    local location loc = Location( .0, .0 )
    set Loc = loc
    set loc = null
    call RemoveLocation( Loc )
endfunction

So the trick is that we can move an object from a local variable to a global and then, without a leak, conduct an operation with it. However, there are exceptions - a group, for some reason, a group always and stably causes a leak of 1 byte, even with a similar principle.


Here DSG says, 2018:

Be aware many GUI actions and functions have implicit leaks due to the local declared local handle (agent?) variable reference counter leak on return bug that has not yet been fixed.

Gerries, 2012 correctly points out that MB functions in BJ leak, because local handle and not nulled:

And, multiboarditem extends handle so they DO LEAK if they are not destroyed, not even nulled. Just test it, I made a test map for you with leaking the mbi, and tracking the handle's count( of course the two different library does not call each other and I am not cheating this way :p )

lang_dye, 2016 how to use locals that are returned (assign to global).

jonadrian619, 2007 locals tutorial (bad advice to null strings though)

Proper and the go-to tutorial by IcemanBo, 2015. In his terminology that's a reference counter leak. "To achieve this we always let our variables point on something else, after they fullfilled their purpose and before they fall out of scope. null comes handy here, as you may apply it to any agent type."

Codemonkey11's response and advice, 2013

Review

I would like for someone with Jass knowledge to read and understand Unryze's explanation (in addition to others) to review the notes I added.

You can jump between the added notes with text search In Jass you must set local variables - this text should repeat in every bug note, because I used a macro to paste it.

Further work

Before making a fuss about it on Blizzard forums (hell, who else would be able to fix Blizzard.j after so many years?) I'd like to have basic reproduction maps: With nulling vs Without nulling such basic types as:

This should give pretty good coverage. Why not Locations? I am not sure exactly what part of the code slows down due to such leaks. In case of units, special effects, multiboard it is trivial to trigger the laggy code path -- basically rendering a game frame, having units traverse a map -> render and simulation.

Based on experience, performance issues start after about 30-40min where unintentional leaks were the root cause.

Luashine commented 1 year ago
common.ai:1314:    local unit    peon
common.ai:2039:    local unit target = GetMinorCreep()
common.ai:2047:    local unit target = GetMajorCreep()
common.ai:2055:    local unit target = GetCreepCamp(min_creeps,max_creeps,allow_air_creeps)
common.ai:2063:    local unit hall
common.ai:2082:    local unit creep = GetExpansionFoe()
common.ai:2142:    local unit      hall
common.ai:2143:    local unit      mega
common.ai:2144:    local unit      creep
common.ai:2145:    local unit      common
common.ai:2447:    local unit bldg

Added the above about common.ai

Just for fun checked the full /scripts/ folder, pld and .ai. These are all that were found. Nothing important.

reforged_u05_navy.ai:520:    local unit target = null
reforged_u08_blue.ai:449:    local unit target = null
reforged_u08_red.ai:464:    local unit target = null
reforged_u08_teal.ai:359:    local unit target = null
u05_mint.ai:335:    local unit target = null
lep commented 1 year ago

There is also https://web.archive.org/web/20180130134151mp_/http://www.wc3c.net/showthread.php?t=81872 by PipeDream. It's quite old though but i wouldn't expect much (if anything) has changed.

While i have never dived into jass vm internals i can give my understanding of the matter. Any type in jass that inherits from agent† is reference counted†† and therefore should be garbage collected automatically. But there is a bug where values held in local variables don't get their reference count reduced upon leaving the functions scope. That's why we set those to null.

†: agent was a late addition so it's not actually that clear cut.

††: There are of course exceptions: player inherits from agent but they're basically constants. texttags don't inherit from agent and are indeed handled differently, so are images. boolexprs are so so (using Filter is basically a constant, but using And is not).

Luashine commented 1 year ago

Any type in jass that inherits from agent† is reference counted††

So the mistake that I made is to call them all handles in my comments. Should this be corrected? Commit for the other 2 comments incoming now.

Luashine commented 1 year ago

@lep what's the new line policy for this repo? I just booted into Linux from Windows and git's telling me that all files were changed due to ^M (carriage return?) at the end of lines.

I will find the appropriate git settings if we decide upon a standard. Regular Jass files and other like WTS seem to come with \r\n by default but I don't remember plain \n being a problem.

EDIT Same conflict with your other files like mksrc, .hs, .nix etc.

lep commented 1 year ago

Any type in jass that inherits from agent† is reference counted††

So the mistake that I made is to call them all handles in my comments. Should this be corrected? Commit for the other 2 comments incoming now.

Yeah i guess, but with all the exceptions and stuff i would be fine with either. It's allready a lovely PR.

@lep what's the new line policy for this repo? I just booted into Linux from Windows and git's telling me that all files were changed due to ^M (carriage return?) at the end of lines.

I will find the appropriate git settings if we decide upon a standard. Regular Jass files and other like WTS seem to come with \r\n by default but I don't remember plain \n being a problem.

EDIT Same conflict with your other files like mksrc, .hs, .nix etc.

Newlines are a bane. My parser should handle either and so should wc3. The bigger problem would uneccessary huge diffs for changed newlines. I think the git setting is the best solution.

Luashine commented 1 year ago

The bigger problem would uneccessary huge diffs for changed newlines.

Oh my god, they already became.

I checked and the original Blizzard files are 100% CRLF. Btw, old WC3 had "Blizzard.j" and "Cheats.j" capitalized, Reforged has small "b" and "c".

We can't guarantee that all changes to tracked files will adhere to CRLF autodetection like IDEs and editors do (like scripts are most likely to violate). INPUT_SIDE aka when checking into git's repo.

We already have this state where the internally saved files have mixed new lines (333 of LF). git ls-files --eol:

i/mixed w/mixed attr/                   Blizzard.j
i/lf    w/lf    attr/                   common.ai
i/lf    w/lf    attr/                   common.j

Explanation: git index has mixed, working dir has mixed too. The magic command to remove all local files and replace with git's as-is (warning, deletes your files):

  1. git config --local core.autocrlf false (apply locally to test this)
  2. git rm --cached -r .
  3. git reset --hard

And if this doesn't work like... try again. It's weird.

If we force a certain line ending through .gitattributes then we will need at least one commit right now and if people had outstanding changes, that's gonna be painful. I think afterwards it'll be fine again and I think cause no further problems.

Alternatively keep current core.autocrlf and accumulate more mixed stuff, weird differences like when I shared the same folder between Linux and Windows on one machine and at commit time it wanted to overwrite half the lines :(

https://stackoverflow.com/questions/3206843/how-line-ending-conversions-work-with-git-core-autocrlf-between-different-operat https://stackoverflow.com/questions/21822650/disable-git-eol-conversions https://stackoverflow.com/questions/35801024/how-to-see-what-type-of-line-endings-are-already-in-git-repository https://stackoverflow.com/questions/3206843/how-line-ending-conversions-work-with-git-core-autocrlf-between-different-operat https://stackoverflow.com/questions/7156694/git-how-to-renormalize-line-endings-in-all-files-in-all-revisions

I'd need to read those again too. I think the behaviour I'd like at this moment (to avoid huge commits) is to force CRLF for the Warcraft files inside the repo index. Because eventually the entire Jass file will be replaced with CRLF-ed game files. Leaves the question open what to have locally, probably force CRLF too to avoid issues. So like when everyone's recommending to use LF everywhere, we're gonna do the opposite due to syncing with game's CRLF files? See you later.

EDIT: Oh my, though the common.j and common.ai are LF on the inside...

lep commented 1 year ago

I did this to see if for example mksrc could handle CRLF on my unix dev machine (i was worried that it might duplicate newlines in the web interface) and it seems to work. I haven't tested it yet on my build machine though. This would, as i understand, correspond to your current prefered solution?

Luashine commented 1 year ago

In my commit I will replace from/to:

In Jass you must set local variables that hold handles (or any child type) to null at the end of functions to avoid leaks.

In Jass you must set local variables that hold agents (or any child type) to null at the end of functions to avoid reference counter leaks.

lep commented 1 year ago

Except from that one bug this seems mostly done? I have added you to this repositories contributors which i guess you could commit and merge stuff yourself. That doesn't mean you have to actually use that power though; there are some upsides of me looking at every PR, even apart from the four-eyes principle, like i can directly deploy the new DB and i have the build infrastructure already set-up.

Luashine commented 1 year ago

I have added you to this repositories contributors ... That doesn't mean you have to actually use that power though;

Thank you, I have read the sudo message many times :heart: I appreciate it