psi-im / plugins

Officially supported Psi plugins
56 stars 24 forks source link

[Windows x64] Crash when OTR "Generate new key" or "Start private conversation" #45

Closed Neustradamus closed 5 years ago

Neustradamus commented 5 years ago

Since a long time, there is a problem with OTR on Windows x64.

It is impossible to:

No problem on Windows x86.

Neustradamus commented 5 years ago

It is linked to libotr/libgcrypt, it is possible to have help? @tehnick has tried to debug it without success.

@jeroen, @jkivilin, @dd9jn, @dkg, @dtzWill, @mstorsjo, @smuellerDD @arlolra, @dgoulet @claucece @DrWhax @juniorz

Thanks in advance

claucece commented 5 years ago

Mmm.. interesting. I don't use Windows.. but if I find a way to reproduce this.. I'll let you know :)

tehnick commented 5 years ago

I don't use Windows.. but if I find a way to reproduce this..

Personally I prefer to debug MS Windows related bugs in WINE first. And this bug is 100% reproducible in 64-bit version of WINE.

Some additional info:

DrWhax commented 5 years ago

I'm not familiar with Psi, is otr plugin packaged with the psi bundle(https://psi-im.org/download/)?

Are you using mingw-w64?

tehnick commented 5 years ago

is otr plugin packaged with the psi bundle

Yes.

The most recent builds of Psi+ are here. From time to time I make Psi IM builds from master branch and upload them here. Difference between Psi+ and Psi IM and our development model are documented here.

Are you using mingw-w64?

Yes. I use MXE + Sibuserv projects for cross-compilation for MS Windows. Build scripts are here.

But I do not think that there is need in debugging of full Psi+ program + OTR plugin: we may prepare more simple example for reproducing of the bug based on small piece of code from plugin.

tehnick commented 5 years ago

BTW, our builds using MSVC had similar bug as far as I remember. But they were prepared and tested by other people, so I am not sure.

DrWhax commented 5 years ago

@tehnick Thanks for this! I'll check it out tomorrow.

BTW, there is a weird ASLR bug (with a workaround its fixable!) with mingw-w64 that renders ASLR.. useless: https://insights.sei.cmu.edu/cert/2018/08/when-aslr-is-not-really-aslr---the-case-of-incorrect-assumptions-and-bad-defaults.html

But that might need a seperate ticket..

tehnick commented 5 years ago

@DrWhax Thanks for info. I have added workaround proposed in this article. This will be enough I hope.

Neustradamus commented 5 years ago

@DrWhax: Have you checked? Have you detected the problem? What is your point of view?

claucece commented 5 years ago

Hey @tehnick !

I saw that you have a workaround.. is that enough? It seems like this is mostly a libgrcrypt problem. If you try to install that library, does it succeed?

Thanks!

tehnick commented 5 years ago

I saw that you have a workaround.. is that enough?

What workaround are you talking about?

It seems like this is mostly a libgrcrypt problem.

We not sure if this a libotr or libgrcrypt problem, because libotr may use libgrcrypt in a wrong way.

If you try to install that library, does it succeed?

Sorry, I didn't understand this. Could you explain what do you mead in more details?

tehnick commented 5 years ago

I saw that you have a workaround.. is that enough?

If you are talking about https://github.com/psi-im/plugins/issues/45#issuecomment-455918370 that was completely another issue, not related to OTR. Sorry for off-topic comments here.

claucece commented 5 years ago

Hey,

What workaround are you talking about?

This one: "I have added workaround proposed in this article. This will be enough I hope."

We not sure if this a libotr or libgrcrypt problem, because libotr may use libgrcrypt in a wrong way.

Ok, let me clarify, there is still a crash when generating new keys or starting a conversation, right? In windows64, right?

If you are talking about #45 (comment) that was completely another issue, not related to OTR. Sorry for off-topic comments here.

No problem.

If you have some debug logs of when the crash happens, paste them here. It can be a problem within threads of the UI and the generation of the keys.

Thanks!

claucece commented 5 years ago

Ok, so I have looked at the code. For me, this can be a problem with threads. I see that on the main thread 'otrl_privkey_generate_start' and 'otrl_privkey_generate_finish' are called. In the back, 'otrl_privkey_generate_calculate' is called. Or at least that it how it should behave, according to the documentation in libotr. if this is related to libgcrypt, the place where it can fail is: https://github.com/psi-im/plugins/blob/master/generic/otrplugin/src/otrinternal.cpp#L851, as 'otrl_privkey_generate_calculate' is the only one that calles libgcrypt funcs. What can be tried is:

@tehnick if you can send me the log of when it crashes, it will be super useful.

Thanks!

tehnick commented 5 years ago

Ok, let me clarify, there is still a crash when generating new keys or starting a conversation, right?

When interlocutor tries to start OTR session with you and you do not have a key for current account special dialog window will be opened with suggestion to generate a key. So the exactly the same code is called in both cases.

In windows64, right?

Yes, the problem is reproducible only in 64-bit MS Windows executables. And possibly only in MinGW builds.

If you have some debug logs of when the crash happens, paste them here. It can be a problem within threads of the UI and the generation of the keys.

Unfortunately currently I have debug build of program with debug symbols only for Psi+ and its plugins, but not for libraries which it uses. So the backtrace is not enough imformative. You may look on it here.

claucece commented 5 years ago

Unfortunately currently I have debug build of program with debug symbols only for Psi+ and its plugins, but not for libraries which it uses. So the backtrace is not enough imformative. You may look on it here.

It is still very useful. I'll check if something is strange there.

Thanks!

claucece commented 5 years ago

Ok. I have reviewed the log. The important parts are:

No symbol table info available.
#6  0x000000001eee130f in OtrInternal::create_privkey (
    this=this@entry=0x1ba6ee60,
    accountname=0x1c000708 "{4d70ff68-ac36-4c21-9872-448d352d1e19}",
    protocol=protocol@entry=0x1eef68a0 "prpl-jabber")
    at /home/boradmin/Tmp/Psi+/psi-plus-snapshots/src/plugins/generic/otrplugin/
src/otrinternal.cpp:854
        qMB = <incomplete type>
        newkeyp = 0x1c0294e0
        loop = <incomplete type>
        watcher = {<QFutureWatcherBase> = {<No data fields>}, m_future = {
            d = {<QFutureInterfaceBase> = {<No data fields>}, <No data fields>}}
}
        future = {
          d = {<QFutureInterfaceBase> = {<No data fields>}, <No data fields>}}
        fingerprint = "\020}T\033\000\000\000\000Т\020wf\000\000\000\000\bnX\033
\000\000\000\000'ьvf\000\000\000\000\000\001\000\000\000\000\000\000>cШ\025"
#7  0x000000001eee24ff in OtrInternal::generateKey (this=0x1ba6ee60,
    account=...)
    at /home/boradmin/Tmp/Psi+/psi-plus-snapshots/src/plugins/generic/otrplugin/
src/otrinternal.cpp:773

It fails at loop.exec();. Unsure why; but this makes me think it is not a problem from libgcrypt or libotr.

tehnick commented 5 years ago

Sorry for delay with reply: I was busy with other tasks in the project.

It fails at loop.exec();. Unsure why; but this makes me think it is not a problem from libgcrypt or libotr.

Sorry, but your assumption is wrong. Probably you do not know how QtConcurrent works...

This is quite simple layer which allows to call a functions in a separate thread. Race conditions and other thread specific errors are excluded, because all functions in that thread are called in series according to the pool of "events" (functions and their arguments) prepared before start of that thread.

Debug note about loop.exec(); means exactly that a separate thread was created and QtConcurrent began to call functions from his event loop.

Segfault had happened in that new thread. The most important debug notes are:

[Switching to Thread 2944.0x554]
_gcry_mpih_lshift () at mpih-lshift-asm.S:46
46      mpih-lshift-asm.S: No such file or directory.

(gdb) thr a all bt full

Thread 27 (Thread 2944.0x554):
#0  _gcry_mpih_lshift () at mpih-lshift-asm.S:46
No locals.
#1  0x000000001bf6d7e0 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
tehnick commented 5 years ago

I have prepared a minimal reproducible example for you as promised: https://github.com/tehnick/example-of-libotr-crash

tehnick commented 5 years ago

BTW it is written in plain C...

tehnick commented 5 years ago

Here are statically linked 32- and 64-bit executables for MS Windows: https://tehnick.net/tmp/test-libotr-1.0_x64.exe https://tehnick.net/tmp/test-libotr-1.0.exe Unfortunately, they are still without debug symbols...

tehnick commented 5 years ago

gdb log for test-libotr-1.0.exe: https://susepaste.org/view/raw/47951425 gdb log for test-libotr-1.0_x64.exe: https://susepaste.org/view/raw/85746535

tehnick commented 5 years ago

All executables are built using MXE project.

claucece commented 5 years ago

Hey!

Sorry, but your assumption is wrong. Probably you do not know how QtConcurrent works...

Sure. I'm not a developer of this project or C++.

As the debug says:

[Switching to Thread 2944.0x554]
_gcry_mpih_lshift () at mpih-lshift-asm.S:46
46      mpih-lshift-asm.S: No such file or directory.

You are missing the mpih-lshift-asm.S file. Can you check if:

  1. It exists in the libgrcrypt library version you are using: https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=mpi/amd64/mpih-lshift.S;h=9e8979b1060393842c0505e338f5ef7305dc51ce;hb=refs/heads/master
  2. Is been linked correctly: something like config.status: linking mpi/amd64/mpih-lshift.S to mpi/mpih-lshift-asm.S.

As libotr does not come with libgcrypt, you should be installing it to build it. Maybe you just need to upgrade the version for libgcrypt. Or if not, submit an issue to the MINGW-packages: https://github.com/msys2/MINGW-packages/issues or libgcrypt: https://dev.gnupg.org/

That is why I said:

It seems like this is mostly a libgrcrypt problem. If you try to install that library, does it succeed?

Have you try on only builiding libcrypt (or libotr) and checking if the tests pass? Have you try building them from source?

We not sure if this a libotr or libgrcrypt problem, because libotr may use libgrcrypt in a wrong way.

If libotr can't find a file, then this looks like a libgcrypt linking problem.

claucece commented 5 years ago

And, if you manage to install libgrcrypt and build it, can you paste the build output?

tehnick commented 5 years ago

As I noted above:

All executables are built using MXE project.

This project allows to easily prepare SDK with subset of libraries you need for building of applications.

Current versions of libraries:

libgcrypt | 1.8.4
libgpg_error | 1.35
libotr | 4.1.1

All libraries are built from sources taken from upstreams. Here are build rules for them:

You are missing the mpih-lshift-asm.S file.

Hmm, are you sure it is required for release (not debug) version of library?

It exists in the libgrcrypt library version you are using

I do not know how to check this in a binary file. (I completely forgot everything about ASM and low-level programming...)

Is been linked correctly: something like config.status: linking mpi/amd64/mpih-lshift.S to mpi/mpih-lshift-asm.S.

Ok, I'll look at build logs.

And, if you manage to install libgrcrypt and build it, can you paste the build output?

Yes, sure:

tehnick commented 5 years ago

Also I see the additional patch for libgcrypt in MXE project: libgcrypt-1-fixes.patch.

tehnick commented 5 years ago

Also I see the additional patch for libgcrypt in MXE project: libgcrypt-1-fixes.patch.

Wow! I have found the root of problem: if I drop ATTR_ABI related part of patch then segfault is not happen.

@claucece Thanks a lot for your help!

Ri0n commented 5 years ago

Hola Sofía. I'm just curious. Is it you
https://mobile.twitter.com/i/web/status/985890984939671552 ?

I'm going to implement secure dtls based based file transfer for Psi. Do you want to join?

claucece commented 5 years ago

Wow! I have found the root of problem: if I drop ATTR_ABI related part of patch then segfault is not happen.

Nice @tehnick ! Everything working now?

claucece commented 5 years ago

Hola Sofía. I'm just curious. Is it you https://mobile.twitter.com/i/web/status/985890984939671552 ?

Yeah, that is me.

I'm going to implement secure dtls based based file transfer for Psi. Do you want to join ?

Interesting. I don't have much time; but if can collaborate on something, let me know. I'll be definetly interested in joining.

Ri0n commented 5 years ago

Interesting. I don't have much time; but if can collaborate on something, let me know. I'll be definetly interested in joining.

Yeah. Time is always a problem when we are talking about free projects :) Btw, applying otv4 would nice too. If you could investigate a possibility of migration to the new protocol, that would be great. Probably we could propose required changed to XSF then. Btw do you speak Spanish tambien?

Anyway join our xmpp conference at psi-dev@conference.jabber.ru

tehnick commented 5 years ago

Nice @tehnick ! Everything working now?

Yes. I'll close this issue once my PR in MXE will be merged.

Neustradamus commented 5 years ago

Thanks a lot to @claucece for this help about this "big" bug in libgcrypt from @mxe and not only in @psi-im / @psi-plus.

And @tehnick for all tests requested since a very long time (more one year), and for changes in the @mxe libgcrypt-1-fixes.patch.

@ri0n I have already requested for OTRv4 in Psi (in more than OTRv3 of course) ah ah For have not only the Pidgin client: https://github.com/otrv4/.

claucece commented 5 years ago

Yeah. Time is always a problem when we are talking about free projects :) Btw, applying otv4 would nice too. If you could investigate a possibility of migration to the new protocol, that would be great. Probably we could propose required changed to XSF then.

Yes! That would we awesome! Once we are done with the design of OTRv4, we can start implementing :) I'll check what is needed.

Btw do you speak Spanish tambien?

Pero, claro! :)

Anyway join our xmpp conference at psi-dev@conference.jabber.ru

I will :)

claucece commented 5 years ago

Thanks a lot to @claucece for this help about this "big" bug in libgcrypt from @mxe and not only in @psi-im / @psi-plus.

Np :)

@Ri0n I have already requested for OTRv4 in Psi (in more than OTRv3 of course) ah ah For have not only the Pidgin client: https://github.com/otrv4/.

Yes! :)