wrye-bash / CBash

Home of CBash.DLL sources
GNU General Public License v2.0
9 stars 4 forks source link

64bit Port #23

Closed Infernio closed 4 years ago

Infernio commented 4 years ago

CBash is one of two reasons WB is stuck on 32bit (the other is https://github.com/wrye-bash/wrye-bash/issues/431), denying the entire program (but especially PBash) a ~30% speedup. We are pretty close - most methods seem to work - but a lovely traceback stands in our way:

Traceback (most recent call last):
  File "bash\balt.py", line 2500, in __Execute
    self.Execute()
  File "bash\balt.py", line 1604, in _conversation_wrapper
    return func(*args, **kwargs)
  File "bash\basher\mod_links.py", line 932, in Execute
    if not self._Execute(): return # prevent settings save
  File "bash\basher\mod_links.py", line 976, in _Execute
    doCBash=self.doCBash)
  File "bash\bosh\__init__.py", line 2025, in rescanMergeable
    return self._rescanMergeable(names, prog, doCBash, return_results)
  File "bash\bosh\__init__.py", line 2054, in _rescanMergeable
    canMerge = is_mergeable(fileInfo, self, reasons)
  File "bash\bosh\_mergeability.py", line 224, in isCBashMergeable
    if not _modIsMergeableLoad(modInfo, minfos, reasons) and not verbose:
  File "bash\bosh\_mergeability.py", line 212, in _modIsMergeableLoad
    reasons.append(_(u'Is a master of non-mergeable mod(s): %s.') % u', '.join(sorted(dependent)))
  File "bash\cint.py", line 15301, in __exit__
    self.Close()
  File "bash\cint.py", line 15317, in Close
    _CDeleteCollection(self._CollectionID)
WindowsError: exception: access violation reading 0xFFFFFFFFFFFFFFFF

Firing up Visual Studio and breakpointing cb_DeleteCollection shows that the access violation actually occurs in the Collection destructor: https://github.com/wrye-bash/CBash/blob/e7b99dc6339345acc782730bc658d2291472d5c6/src/Collection.cpp#L163-L171 Specifically, delete ModFiles[p]; crashes.

Breakpointing inside the destructor shows this:

destructor

The first two ModFile instances (Oblivion.esm and Unofficial Oblivion Patch.esp, if you're curious) delete correctly, but the third one (Oblivion Citadel Door Fix.esp.ghost) crashes. As you can see from the screenshot, there's definitely something off about the third ModFile - it seems to be missing everything from TES4File.

Infernio commented 4 years ago

Turns out that this is probably all Python-side. From cint.py:

    def load(self):
        def _callback(current, last, modName):
            return True

        cb = CFUNCTYPE(c_bool, c_uint32, c_uint32, c_char_p)(_callback)
        _CLoadCollection(self._CollectionID, cb)

        _NumModsIDs = _CGetLoadOrderNumMods(self._CollectionID)
        if _NumModsIDs > 0:
            cModIDs = (c_ulong * _NumModsIDs)()
            _CGetLoadOrderModIDs(self._CollectionID, byref(cModIDs))
            self.LoadOrderMods = [self._ModType(ModID) for ModID in cModIDs]

        _NumModsIDs = _CGetAllNumMods(self._CollectionID)
        if _NumModsIDs > 0:
            cModIDs = (c_ulong * _NumModsIDs)()
            _CGetAllModIDs(self._CollectionID, byref(cModIDs))
            self.AllMods = [self._ModType(ModID) for ModID in cModIDs]

Note those (c_ulong * _NumModsIDs)() calls, they make new arrays with _NumModsIDs() elements. What are the array elements? c_ulongs, of course. Those then get filled with pointers to the mod file objects inside CBash. This all works fine on x86, where the size of a pointer == the size of a c_ulong. But on x64 Windows:

assert(sizeof(unsigned long) == 4) // passes

Which means we were stuffing 64bit pointers into arrays of 32bit integers. That can't end well, and was leading to the crash. Replacing the calls above with (c_uint64 * _NumModsIDs()) (or even better, defining an alias c_pointer = c_uint64 and using that) makes the patch dialog appear again. There are definitely still other parts of cint that have this same problem - the patch crashes shortly after clicking Build Patch.

Utumno commented 4 years ago

Thanks for looking into this! Would be great to get rid of the 32 compatibility for 307 - a pity that there are 959 matches for 32 in cint :P

Utumno commented 4 years ago

To begin with - anyway cint has FNV classes - would make sense to rip them into another module and concentrate on the Oblivion side? Would go in line with our load-game-specific stuff line in wrye-bash/wrye-bash#358 - see issue #2 here

Infernio commented 4 years ago

Yeah, I'll get stated on a branch for https://github.com/wrye-bash/wrye-bash/issues/264, seems like the only way to keep my sanity while fixing this 😛 Should I keep the package named cint or take that opportunity to rename it to something better?

Infernio commented 4 years ago

Also, is it important to keep the whole #See if cint is being used by Wrye Bash code that allows cint to work even outside WB?

Utumno commented 4 years ago

Should I keep the package named cint or take that opportunity to rename it to something better?

Keep it I'd say - too many references around, let's see how the refactoring goes, plus you know, now after all these years that I have people here actually doing something, and long standing issues (like wx3 etc) are on their way we may concentrate on 308 on the patcher. Meaning we may rewrite it with cython or whatever and bin cbash altogether.

Also, is it important to keep the whole #See if cint is being used by Wrye Bash code that allows cint to work even outside WB?

bin it for god's sake :P

GandaG commented 4 years ago

Trying out the 64bit branch - the popup asking whether we're sure we want to use CBash never appears and patch dialog never comes up. Culprit is this indirect C function (why are using this instead of wxPython?) that never creates/shows the popup.

GandaG commented 4 years ago

All patchers are now running and the above issue with TaskDialogIndirect is now fixed (courtesy of lojack) in ganda-64bit and in ganda-wip. I couldn't find other problems with 64bit