xyzz / openmw-android

OpenMW for Android
https://omw.xyz.is/
GNU General Public License v3.0
325 stars 87 forks source link

Quest dialogue cannot give items with GLESv2 #48

Closed vpetzel closed 3 years ago

vpetzel commented 3 years ago

Hello, I noticed that enabling GLESv2 leads to you not getting the items in the last tutorial stage (so you do not get the message for Caius Cosades, as well as the Money and whatever you get. Thus you cannot even leave the tutorial area, because you cannot get the required items. While I haven’t tested so far, but I think GLESv2 leads (for some obscure reason) to quest dialogue not giving items. I’m using a clean, unmodded german copy of the game, but of course it would be interesting if you guys can reproduce the issue:

Steps to reproduce: → For versions prior to 0.46.0-27 activate GLESv2 → Start a new game and go through the tutorial → When you talk to Sellus Gravius about your tasks, you should get four notes about updating the diary, giving you instructions, giving you messages for Caius Cosades and giving you 86 draken. With GLESv2 I only get the notice about the update of the diary. → You end up with the instructions, but without the messages or the gold

Thus I assume that the notification of you getting the item somehow does not work and crashes the script when GLESv2 is activated.

Sisah2 commented 3 years ago

It also seems to happening only with some localizations (german, russian, spanish) and only on some devices. Nice bug :D There is one post on openmw.org, he got the same error with some strange messages in openmw.log.

xyzz commented 3 years ago

Can you post a link to the thread and any error messages they got in the log?

Sisah2 commented 3 years ago

https://forum.openmw.org/viewtopic.php?f=47&t=6952

Got a log from russian user with this message Dialogue error: An exception has been thrown: basic_string

I cant reproduce myself.

akortunov commented 3 years ago

Judging by this message, C++ raises an exception, when OpenMW tries to execute compiled script, attached to the dialogue entry (the DialogueManager::executeScript function).

Script execution process is a quite complex, so it is hard to locate an error. What I'd suggest to do:

  1. Find a user, which can reproduce the issue.

  2. Build an APK with patched OpenMW with additional log messages, e.g. something like this:

            try
            {
                Log(Debug::Info) << "Begin dialogue sctipt execution";
                Log(Debug::Info) << "Script body:\n";
                Log(Debug::Info) << script;
                MWScript::InterpreterContext interpreterContext(&actor.getRefData().getLocals(), actor);
                Log(Debug::Info) << "Script context created";
                Interpreter::Interpreter interpreter;
                Log(Debug::Info) << "Interpreter created";
                MWScript::installOpcodes (interpreter);
                Log(Debug::Info) << "Opcodes installed";
                interpreter.run (&code[0], code.size(), interpreterContext);
                Log(Debug::Info) << "Script execution finished";
            }
            catch (const std::exception& error)
            {
               Log(Debug::Error) << std::string ("Dialogue error: An exception has been thrown: ") + error.what();
            }

    And if the issue is really only with AddItem, probably it is worth to do a similar thing with the MWScript::Container::OpAddItem.

  3. Provide the APK to mentioned user and gather his logs.

Also I have a suspicion that only localized setups are affected by this bug because some scripts in such setups contain non-ASCII characters.

akortunov commented 3 years ago

After investigating #51, I have a suspicion that GLESv2 messes up string formatting on affected devices for some reason. So probably it is worth to print additional messages in the stringops.h instead.
Related formatting function:

    // Requires some C++11 features:
    // 1. std::string needs to be contiguous
    // 2. std::snprintf with zero size (second argument) returns an output string size
    // 3. variadic templates support
    template <typename ... Args>
    static std::string format(const char* fmt, Args const & ... args)
    {
        auto size = std::snprintf(nullptr, 0, fmt, argument(args) ...);
        // Note: sprintf also writes a trailing null character. We should remove it.
        std::string ret(size+1, '\0');
        std::sprintf(&ret[0], fmt, argument(args) ...);
        ret.erase(size);

        return ret;
    }
vpetzel commented 3 years ago

I cannot say for sure, but based on the functionality string format should not be tied to GL. At least it would be senseless if it was so. Thus I guess the culprit is somewhere in MWBase::Environment::get().getWindowManager()->messageBox(msgBox, MWGui::ShowInDialogueMode_Only); and I guess somehow the GLESv2 implementation does not handle non ASCII chars correctly.

akortunov commented 3 years ago

Thus I guess the culprit is somewhere in

Unlikely. In this case subtitles would not work at all on affected devices. I'd suggest to check string formatting code first to check if it forms input strings properly with non-english localizations on such devices.

By the way, message box spawning code is not related to OpenGL as well - it does not render anything.

vpetzel commented 3 years ago

I just think that GL might be used for font rendering, so this might be the culprit. I’d try to do some debug logging, but I’ve got no idea how to build this thing for android.

xyzz commented 3 years ago

hi @vpetzel could you please provide the following additional information which reproduces the issue:

melhiot commented 3 years ago

a small addition. this problems appears in 0.46.0-25 and above.

  • a zipped copy of your /storage/emulated/0/Morrowind clean(vanilla) Morrowind GOTY RU
  • a list of settings configured within OMW which are different from defaults even with default settings
akortunov commented 3 years ago

Here are differences in OpenMW and gl4es in the 25-th nightly. I did not notice anything that could break text formatting.

xyzz commented 3 years ago

So basically it looks like snprintf is broken on android. The issue is that at some point the first call to snprintf as used in stringops.h format starts returning -1. OpenMW doesn't handle this case and ends up corrupting some memory but it's w/e, in the end when it does std::string::erase it throws an exception and kills the remainder of the script.

The corresponding format call is

size = std::snprintf(nullptr, 0, "%s - предмет передан.", "Справка об освобождении");

Initially it works fine and returns 77 as expected (utf-8 encoding), however at some point it just starts returning -1. Still not sure why that is.

xyzz commented 3 years ago

In any case, the boost format patch by @akortunov fixed the issue on my test device. So if you're experiencing the same problem, try a build from https://github.com/xyzz/openmw-android/actions/runs/593296420 and let me know if that works.

akortunov commented 3 years ago

I have a suspicion that Android does not support C++11 properly - it is a requirement that strings should be contiguous, so used pattern should work (and it works on PC).

xyzz commented 3 years ago

I don't see what this has to do with strings being contiguous or not. One issue is that OpenMW doesn't handle negative return from snprintf (it is a valid return value). The fact that it returns a negative value with these arguments, and how it starts doing that randomly on android, is potentially a bug - but also a C issue, not C++.

akortunov commented 3 years ago

If snprintf does not work properly on Android, it would be better to determine the reason of such behaviour - this function is used in OpenMW and its dependencies.

In theory, snprintf should set errno variable when it returns -1 on Unix-like systems, so it should be possible to get text error message:

#include <errno.h>

int size = std::snprintf(nullptr, 0, fmt, argument(args) ...);
if (size < 0)
{
    std::ostringstream os;
    int localerr = errno;
    os << "Failed to format string '" << fmt << "'" << std::strerror(localerr);
    throw std::runtime_error(os.str());
}
xyzz commented 3 years ago

Yes it is of course much better to figure why exactly it's failing, but I've already sunk 5 hours into this bug and honestly don't care enough to figure whether it's android libc bug or what or invest any more time.

So running this code

template <typename ... Args>
static std::string format(const char* fmt, Args const & ... args)
{
    errno = 0;
    auto size = std::snprintf(nullptr, 0, fmt, argument(args) ...);
    if (size < 0) {
        std::ostringstream os;
        int localerr = errno;
        os << "Failed to format string '" << fmt << " " << localerr << "'" << std::strerror(localerr);
        throw std::runtime_error(os.str());
    }
    // Note: sprintf also writes a trailing null character. We should remove it.
    std::string ret(size+1, '\0');
    std::sprintf(&ret[0], fmt, argument(args) ...);
    ret.erase(size);

    return ret;
}

results in:

[00:13:40.661 I] Loading cell Seyda Neen, Census and Excise Office
[00:13:45.293 E] Dialogue error: An exception has been thrown: Failed to format string '%s - предмет передан. 0'Success

Oh and also from my earlier testing, the issue only happens when the format string has cyrillic characters, so I doubt it would be a problem elsewhere in the codebase.

vpetzel commented 3 years ago

C does not require snprintf setting errno, it’s just POSIX that demands this. From what I see the bionic libc does not do this. I’m just a bit confused about why this only happens on GLESv2. After all, naught of this should depend on the GL?

xyzz commented 3 years ago

So I've bisected gl4es and this is the first bad commit https://github.com/ptitSeb/gl4es/commit/adaaba34ffcec3ff453ee9c502fab33090757aed and of course it's locales, god damn I hate C.

Changing the setlocale call to C.UTF-8 fixes it of course, I'm still more comfortable with the boost::format solution due to the following:

akortunov commented 3 years ago

IIRC, Boost uses libc on low level, so it can be affected by flawed C implementation as well.

xyzz commented 3 years ago

fixed in bd0912c81e17f5b57154ab2019af5039fba33f13