guardianproject / ChatSecureAndroid

This project has ended, but ChatSecure iOS continues. For Android, please use Conversations or Zom instead
https://chatsecure.org/blog/chatsecure-conversations-zom/
Apache License 2.0
1.07k stars 521 forks source link

OTR failing intermittently #520

Open knoy opened 9 years ago

knoy commented 9 years ago

I'm experiencing crippling issues on 4.4.4 with 14.0.1 OTR against a server that had been previously working fine. Occasionally OTR will refuse to work with specific contacts even after rebooting the phone and repeatedly ending/starting new chats. The icon just spins forever and never responds

n8fr8 commented 9 years ago

This is happening for others, as well, and we will have a fix in the next update this week.

n8fr8 commented 9 years ago

One question - do you or any of these contacts use any other OTR chat programs like Pidgin or Adium?

knoy commented 9 years ago

No, they only use ChatSecure. One thing I've noticed is that after recovering from this bug (for seemingly no reason) it appears to show an unverified OTR fingerprint with the contact in question which seems to indicate that the OTR fingerprints have been corrupted in some way.

knoy commented 9 years ago

I would be willing to offer financial compensation if you could fix this bug within the next day.

n8fr8 commented 9 years ago

We've push an update to Google Play, or you can grab the latest release here: https://guardianproject.info/releases/ChatSecure-v14.0.3.apk

AizazZaidee commented 9 years ago

I have forked your project from GIT. Facing same problem. I am working on Master branch. On which branch you have fixed the bug?

n8fr8 commented 9 years ago

Pushing another update today. I think the problem relates to using the wrong JID. If we are foo@bar.com/blah but the server actually sets us to foo@bar.com/blah123 we were not getting that info and setting the setFrom() in the XMPP Message properly. This made OTR sessions not connect.

AizazZaidee commented 9 years ago

I think there is some problem in initialization of OTR. When you set it to fore and restart mobile It works but if you enable it in the middle then it's like taking forever. So I think there is something missing in the flow. Still you know far better than me.. but this is also a possibility.

n8fr8 commented 9 years ago

We've pushed changes that affect this bug, and have released 14.0.4 official as well: https://guardianproject.info/releases/ChatSecure-v14.0.4.apk

chaosjug commented 9 years ago

I still have this issue with 14.0.4

AizazZaidee commented 9 years ago

Just set it to forced, restart your mobile. Everything will work fine :)

knoy commented 9 years ago

This issue appears to be fixed with 14.0.4 and I haven't noticed any problems so far

EDIT: Further testing reveals that this has not been fixed! I'm intermittently unable to send OTRd chat to ANY of my contacts, this includes across between new versions of chatsecure, between new and old and between chatsecure and pidgin. Encryption is set to "automatically attempt". Force-stopping the app appears to fix the issue temporarily. PLEASE FIX THIS ISSUE AS SOON AS POSSIBLE

celocore commented 9 years ago

The issue appears not to be fixed :-/ ... I've testet it with ChatSecure 14.0.4 on Android 4.1.2, 4.2.2 and 4.4.4 on different devices. If I enable chat encryption the wheel is endless spinning and no encryption is enabled. All tricks named here are testet. I use OpenFire as XMPP-Server. With Xabber the OTR-Encryption works fine, so I think, it's not the server setup. Does anyone has other experiences with this constellation?

knoy commented 9 years ago

I have reverted everyone I know to <13 versions earlier to avoid this issue which is why I haven't noticed, but now that I've switched back this bug is definitely still here... this makes the app essentially unuseable as it's constantly failing to initiate OTR with contacts and nothing (rebooting, closing the app, etc) short of force stopping the app fixes this issue

eighthave commented 9 years ago

@n8fr8 says: "one source of OTR failing seems to be corruption of the keystore props file we use. a) not sure why the corruption happens and b) we have no good way to recover from it"

knoy commented 9 years ago

Perhaps it's realated to this bug: https://github.com/guardianproject/ChatSecureAndroid/issues/545

The OTR failing seems to happen nearly every time I use it within less than a minute. Easiest way to reproduce has been start an OTRd chat, end the chat, then start a chat again. This occurs between v14 and v14, v13 and v14 and v14 and pidgin.

BTW the setup: CS 14.0.4 (any version 14+ seems to be affected) Nexus 5 Android 4.4.4 WiFi or Cellular data connection

n8fr8 commented 9 years ago

The keystore file corruption is not related to the DB closing. The fact that you can establish an OTR session at means your keystore is intact. The OTR issues you are experiencing are more likely addressed by fixes in the SessionID instance management I have fixed in another pull request and will likely merge.

knoy commented 9 years ago

Awesome! Thank you so much! Let me know whenever you have a new build out and I will be happy to test it for you

knoy commented 9 years ago

I've been testing the latest builds (Oct 19th) among several people throughout the day and this issue is still present about 50% of the time OTR is attempted. Others who initially had some success reported not receiving many encrypted messages (despite delivery receipts on the senders end) during conversations and after resetting the database and logging back in OTR no longer worked at all (even after multiple attempts, force stops, etc). Rebooting the phone seemed to clear the issue. Others reported problems initializing OTR with anyone running v13 builds until force stopping. Please explain any steps you'd need to help further diagnosing this problem and I would be glad to assist

eighthave commented 9 years ago

I reproduced this using the latest code in git between ChatSecure and Gajim. OTR would not start at all, then I Force Quit ChatSecure, then it started working. Trying to reproduce again...

eighthave commented 9 years ago

I can reproduce it now:

  1. uninstall ChatSecure
  2. install ChatSecure
  3. setup existing XMPP account (e.g. gptest@jabber.org)
  4. attempt to start OTR conversation from Gajim (signed into gptest@jabber.ccc.de)

It looks like ChatSecure is trying to decrypt the initial OTR request, I don't know if that is the appropriate behavior or not:

       OtrChatListener  V  onIncomingMessage: ?OTRv2?_hc@jabber.ccc.de has r
        OtrChatManager  V  decryptMessage: gptest@jabber.org/ChatSecure gptest@jabber.ccc.de/Gajim ?OTRv2?_hc@jabber.ccc.de has r
n8fr8 commented 9 years ago

Glad you can repro, hc. I think a few issues are being conflated here though.

v13 and existing v14 code definitely has issue with OTR and XMPP resources that could be causing this. If you are testing, knoy, please just test between the latest builds on both ends, and not an old build on one end.

2nd, the issue of delivery receipts showing, even if OTR fails, is that we are showing the delivery receipt which means we did receive the message, but decryption fails, so the user does not see it. Perhaps we should only send the receipt for OTR sessions if decryption succeeds? That would be tricky because delivery receipts right now are a the XMPP layer. I think what you want is a "read receipt" in fact.

Otherwise, hc, we always call "decryptMessage" in all cases, because that is how we detect whitespace characters to trigger OTR session to start in "as requested" mode. If it is not an OTR message, the engine will just pass the plain text along.

n8fr8 commented 9 years ago

I do think, hc, that what you reproduced relates to the issue I have seen of this problem mostly happening on first-time use/setup. Once it stops happening, then it is fine from there on out. So it may be related to OTR key generation?

chrisballinger commented 9 years ago

Key generation can take a VERY long time on mobile devices, especially older ones. There was a bug in libotr that we had to work around related to async key generation causing the handshake to fail. Perhaps otr4j has a similar issue.

On Thu, Oct 23, 2014 at 5:10 PM, n8fr8 notifications@github.com wrote:

I do think, hc, that what you reproduced relates to the issue I have seen of this problem mostly happening on first-time use/setup. Once it stops happening, then it is fine from there on out. So it may be related to OTR key generation?

— Reply to this email directly or view it on GitHub https://github.com/guardianproject/ChatSecureAndroid/issues/520#issuecomment-60328281 .

knoy commented 9 years ago

I've been doing most of the testing on v14, the v14->v13 testing was just to ensure edge cases. In any event this does NOT appear to be a v13 bug as the OTR always works flawlessly on v13, it seems to be clearly a result of something added in < 14.0.1

This also does NOT appear to be a key generation bug as I have waited upwards of 8 minutes for the problem to resolve without success, and then immediately after force stopping and relaunching the application OTR initiates instantaneously.

The message receipts happened after I initiated an OTRed conversation, received several encrypted messages I could read and then sent several which were delivered but unreadable to my recipient, not sure if this is unrelated

knoy commented 9 years ago

This issue is still really crippling any use of chatsecure. I'm having troubles understanding why you are having problems reproducing this bug as it's affected every person I know who's updated to v14. Anytime the app is updated to a new version (i.e: a new debug build) all the active chats become instantly affected and reboots and force stops do not fix.

eighthave commented 9 years ago

@n8fr8 ok, following up on the XMPP resources, have OTR Instance Tags been implemented in v14? or is there something else happening with XMPP resources? I just had a session where ChatSecure was not receiving messages at all, and it seemed to be related to different XMPP Resources.

n8fr8 commented 9 years ago

While you say "works flawlessly on v13" others did not say that, so you see it is a matter of perspective. Absolute statements about what does or does not work are not helpful.

OTR instance tags are not implemented, as we are not using OTR v3.

XMPP resources could be a cause of this, specifically how the OTR Session object instances are tied to a SessionID instance, and how new sessions are instantiated when the resource changes.

n8fr8 commented 9 years ago

As a side note, since 14.0.8.1 I have not had any problems init'ing OTR sessions with any of the non-technical users I communicate with using ChatSecure on a daily basis.

eighthave commented 9 years ago

@n8fr8 I only see v14.0.4 in git, where is the code for the newer releases?

n8fr8 commented 9 years ago

oops, sorry brainfart from Orbot versioning. Too many v14s..

eighthave commented 9 years ago

as for v14.0.4, I just got an OTR fail just now, with all of my commits on top of v14.0.4

devrandom commented 9 years ago

In general, I recommend code reviews as a policy. This will reduce regressions and generally improve code quality.

In this particular example, I suspect that the feature where an OTR state is maintained per peer presence was broken at some point in v14.

Personally, I am using an older version for stability.

n8fr8 commented 9 years ago

Well, devrandom, you are really missing out then. :)

I agree more code reviews and more eyeballs on all of this are great. eighthave is here to help, and hope we can get you and Lior back on board more. It is never good just to have one person working on so much code, but that was the reality we were in do to time, resources, and more. I hope we can get v14 to the point where you can appreciate all the positive improvements in stability and usability. The fact that my brilliant but non-technical wife is a full time, always encrypted ChatSecure now is the only evidence I need that progress has been made.

n8fr8 commented 9 years ago

Regarding that last bit, there may be a server issue here, because most of the people I am communicating with that are non-technical, who are not having any problems are on GMail accounts. This adds some evidence to the fact the issue is related to XMPP resources/addressing and not OTR itself.

devrandom commented 9 years ago

I am always available for code reviews, and would be glad to be more involved in general.

Regarding the issue - if it was a regression in multi-presence OTR, it's dependent on how many OTR enabled devices each side has on at the same time. That said, there could be multiple things going on at once. It would be good to add debug logging...

n8fr8 commented 9 years ago

@knoy can you remind me what kind of server you are using? ejabberd, prosody? Which version?

knoy commented 9 years ago

Prosody 0.9.6 (latest stable version)

To reproduce the issue consistently: Open Upgrade to a new debug build from an old debug build, all your active chats should now have issues with OTR. Afterwards this will continue intermittently

n8fr8 commented 9 years ago

Just to make sure this is clear: when you upgrade you are killing all OTR active sessions in memory, since they are stored in only in flash/transient memory "RAM" and not storage memory (sdcard/"disk"). If the other end you are chatting with has an open session, that will now be stale, and they should stop and restart. That said, I upgrade my debug build all the time directly from my dev IDE, and cannot reproduce this issue. I have tried, trust me, I have!

knoy commented 9 years ago

Is there any way to have stale chats automatically closed when upgrading to new versions? Just to ensure that it's not the OTR issue causing this and also to fix it if isn't

eighthave commented 9 years ago

@knoy it is each client's job to close its end of the OTR session. Alice's XMPP client can only signal Bob's XMPP client that it is closing the OTR session. If Bob's does not act on that information, then there is not really anything that Alice's can do about that.

About your instructions to reproduce the issue, you seem to imply that the client should have active sessions when upgrading. If so, then I'd write that up like this:

  1. install v13
  2. start an OTR session with another ChatSecure v13
  3. upgrade to v14
  4. try to start an OTR session with that previous ChatSecure v13 user

Does that sound right? If not, please try breaking it out into steps like that

n8fr8 commented 9 years ago

When you upgrade an APK on Android, the app is killed without any warning. There is no onDestory() called afaik. This means we can't properly end the session, XMPP or OTR. It is more like a crash.

eighthave commented 9 years ago

Something weird is going on in ChatSecure with the XMPP Resource. I had an account logged in twice (chatSecure and gajim), then I used a different account on ChatSecure to invite to my contact list. ChatSecure showed the Accept prompt twice, and Gajim did not get it.

knoy commented 9 years ago

Perhaps it could detect when an upgrade's occured (compare context.getPackageManager().getPackageInfo() to a sharedPreference) and close previously open conversations.

The reproduction would go more like this:

  1. Install v13 or v14 or a debug build
  2. Start an OTR session with anyone
  3. Upgrade to a v14 build
  4. Try to start an OTR session with that previous user
devrandom commented 9 years ago

Could you try to replace step 3 with a Force Stop of the app? It's available from the app info when you long-click an app in the history list.

BTW OTR state is in memory only as n8fr8 said. So closing conversations after upgrade can't fix the issue.

On October 27, 2014 2:07:55 PM PDT, knoy notifications@github.com wrote:

Perhaps it could detect when an upgrade's occured (compare context.getPackageManager().getPackageInfo() to a sharedPreference) and close previously open conversations.

The reproduction would go more like this:

  1. Install v13 or v14 or a debug build
  2. Start an OTR session with anyone
  3. Upgrade to a v14 build
  4. Try to start an OTR session with that previous user

Reply to this email directly or view it on GitHub: https://github.com/guardianproject/ChatSecureAndroid/issues/520#issuecomment-60669459

knoy commented 9 years ago

Force stopping after initiating a conversation does not trigger the bug. Instead a force stop is usually what's required to fix it as otherwise the spinning OTR progress circle will never end

eighthave commented 9 years ago

I think I am getting closer. I think that ChatSecure is getting confused on whether an OTR session is started or stopped. I was once able to reproduce this when I sent "Stop Encryption" in ChatSecure, but I haven't gotten that one going again. I just reproduced it this time by doing this between Gajim and ChatSecure:

  1. In Gajim (gptest@limun.org) end all OTR sessions and close chat window, but stay logged in.
  2. Shutdown & Lock in ChatSecure
  3. Force Quit ChatSecure
  4. Clear data for ChatSecure
  5. launch ChatSecure
  6. set up existing account in ChatSecure (gptest@jabber.org)
  7. launch chat session in ChatSecure with Gajim
  8. send two text messages (these show up as secure in ChatSecure, but in Gajim there is no sign of the OTR session)
  9. send a text message from Gajim to ChatSecure (this triggered the OTR negotiation to be shown in Gajim. then I sent a bunch of messages from Gajim and it seemed to stay encrypted, but Gajim refreshed the OTR after each message).
  10. I then started a new session from Pidgin to ChatSecure, no messages got through on either side. Though the debug log showed that ChatSecure was sending stuff through OtrChatManager. Pidgin did have the correct XMPP Resource for ChatSecure, so this problem is probably not related to the XMPP Resource, but rather the OTR state in ChatSecure.

Here's a log file with some debug messages, but not quite enough to be fully useful: https://gist.github.com/eighthave/7a529585159a24051d64

eighthave commented 9 years ago

One more thing on this topic, would be be useful to send the "Finished" message if ChatSecure is quitting or being upgraded? Since the OTR state is maintained in memory, sending that message would reflect that ChatSecure is ditching all OTR state.

n8fr8 commented 9 years ago

Yes I think that is a good idea. Would also stop the problem where we have a remote contact with an orphaned OTR session, and they send a FINISHED message that we cannot decrypt, and then the local client sends a message back saying they couldn't decrypt, etc... the more we can avoid those states, the better.

knoy commented 9 years ago

Tested the new debug version of CS with a few people, this issue still isn't resolved.....

Let me know what needs to be done to test or debug this. I'm at the end of my wits here guys. Looks like sometimes this breaks when other people already have already established OTR chats and your client thinks they dont, then when attempting to establish a 'new' conversation this breaks. Also as previously describes breaks on updating with all currently open chats, force stop seems to be the only way to stop it.

I'm experiencing multiple other bugs that are breaking things in other areas but I'm not even going to file new issues until this is resolved as this is really the priority for me.

Not trying to sound like an asshole but maybe the best course of action is to make sure the new pull requests actually fix the issue before commiting them as fixes. Whatever was done to break this, it was after version 13.0.4