status-im / status-mobile

a free (libre) open source, mobile OS for Ethereum
https://status.app
Mozilla Public License 2.0
3.89k stars 985 forks source link

Part of communities are not synced after syncing with Desktop #21303

Open pavloburykh opened 1 week ago

pavloburykh commented 1 week ago

Note: bug might not be reproducible every time.

Steps:

  1. Desktop Device joins a number of communities. In my case it was Status community and two additional ones
  2. Perform Syncing of Mobile device with Desktop
  3. See if all of the communities present on Mobile device after sync is finished

Actual result: in my case only 2 of 3 communities have been synced.

Desktop_geth.log.zip Mobile_synced.zip

https://github.com/user-attachments/assets/123261fc-e07b-4b56-ba73-83a7b9e77e22

Screenshot from Desktop:

sync desktop

Additional Information

churik commented 1 week ago

@Samyoul would appreciate if you can help to understand the reason behind this issue - network / status-go / waku / something else?

We got this bug from user, and I believe it will happen more if people start to use sync

churik commented 1 week ago

Confirming: only 2/4 communities were synced

Sync direction: scanned QR from desktop.

photo_2024-09-23 12 23 03 jpeg 2024-09-23 12-23-40

Logs

desktop - geth_1.log Mobile - logs (9).zip

ilmotta commented 1 week ago

FYI: @qfrank will investigate the root cause of this issue. Thanks @qfrank 💯

qfrank commented 5 days ago
DEBUG[09-20|16:00:42.900|github.com/status-im/status-go/protocol/messenger_communities.go:3817]                                              m.handleCommunityDescription error       error="signer can't be nil"
ERROR[09-20|16:00:42.900|github.com/status-im/status-go/protocol/messenger_sync_raw_messages.go:121]                                         failed to handleSyncCommunity when HandleSyncRawMessages error="signer can't be nil"

Key error information. Part of communities not being synced is all because of this.

qfrank commented 5 days ago
    // This is our own message, so we can trust the set community owner
    // This is good to do so that we don't have to queue all the actions done after the handled community description.
    // `signer` is `communityID` for a community with no owner token and `owner public key` otherwise
    signer, err := utils.RecoverKey(&amm)
    if err != nil {
        logger.Debug("failed to recover community description signer", zap.Error(err))
        return err
    }

err is nil and signer is also nil, maybe @igor-sirotin know something on this?

qfrank commented 5 days ago
func (o *Community) toProtocolMessageBytes() ([]byte, error) {
    // If we are not a control node, use the received serialized version
    if !o.IsControlNode() {
        // This should not happen, as we can only serialize on our side if we
        // created the community
        if len(o.config.CommunityDescriptionProtocolMessage) == 0 {
            return nil, ErrNotControlNode
        }

        return o.config.CommunityDescriptionProtocolMessage, nil
    }

    // serialize
    payload, err := o.marshaledDescription()
    if err != nil {
        return nil, err
    }

    // sign
    return protocol.WrapMessageV1(payload, protobuf.ApplicationMetadataMessage_COMMUNITY_DESCRIPTION, o.config.PrivateKey)
}

The communities that not synced are not control node? @churik cc @pavloburykh

igor-sirotin commented 5 days ago

Hmm, dunno really. I looked at the logs and the code, but can't see the answer quickly. There was an issue that I fixed some time ago, it seem to be related, but I don't understand how exactly: https://github.com/status-im/status-go/pull/4405 🤷

qfrank commented 5 days ago

Hmm, dunno really. I looked at the logs and the code, but can't see the answer quickly. There was an issue that I fixed some time ago, it seem to be related, but I don't understand how exactly: status-im/status-go#4405 🤷

maybe @osmaczko knows something? 🤔

churik commented 5 days ago

@igor-sirotin in all communities the user was just a member.

osmaczko commented 5 days ago
  // This is our own message, so we can trust the set community owner
  // This is good to do so that we don't have to queue all the actions done after the handled community description.
  // `signer` is `communityID` for a community with no owner token and `owner public key` otherwise
  signer, err := utils.RecoverKey(&amm)
  if err != nil {
      logger.Debug("failed to recover community description signer", zap.Error(err))
      return err
  }

err is nil and signer is also nil, maybe @igor-sirotin know something on this?

It seems community description message is stored without signature for ordinary members for some reason.

I think I found the culprit: https://github.com/status-im/status-go/blob/d794e433476db879f34893a61254bd501abdabb2/protocol/communities/manager.go#L3511-L3519. When we get a request to join response from control node we artificially create not signed ApplicationMetadataMessage, which is very wrong and in consequence leads to the issue of syncing not signed CommunityDescription message.

I think the best way to go forward is to:

--- a/protocol/protobuf/communities.proto
+++ b/protocol/protobuf/communities.proto
@@ -199,14 +199,16 @@ message CommunityUserKicked {
 }

 message CommunityRequestToJoinResponse {
+  reserved 2;
   uint64 clock = 1;
-  CommunityDescription community = 2;
   bool accepted = 3;
   bytes grant = 4;
   bytes community_id = 5;
   string magnet_uri = 6;
   bytes protected_topic_private_key = 7;
   Shard shard = 8;
+  // CommunityDescription with owner signature
+  bytes community = 9;
 }

and ignore whole CommunityDescription handling inside HandleCommunityRequestToJoinResponse if it is empty. The compatibility is not a concern here because if we ignore the community field from older clients, nothing serious will happen, given that the ordinary CommunityDescription will eventually arrive at our end. Nevertheless, I would double-check the behavior with these changes applied, especially the scenario where an old client is accepting a request to join from a new client and the opposite.

osmaczko commented 5 days ago

The compatibility is not a concern here because if we ignore the community field from older clients, nothing serious will happen, given that the ordinary CommunityDescription will eventually arrive at our end.

After second thought, there is a compatibility concern. We should still populate deprecated community field, otherwise old clients will not be able to process request to join responses from new clients:

--- a/protocol/protobuf/communities.proto
+++ b/protocol/protobuf/communities.proto
@@ -200,13 +200,15 @@ message CommunityUserKicked {

 message CommunityRequestToJoinResponse {
   uint64 clock = 1;
-  CommunityDescription community = 2;
+  CommunityDescription community = 2 [deprecated = true];
   bool accepted = 3;
   bytes grant = 4;
   bytes community_id = 5;
   string magnet_uri = 6;
   bytes protected_topic_private_key = 7;
   Shard shard = 8;
+  // CommunityDescription with owner signature
+  bytes community_description_message = 9;
 }
osmaczko commented 5 days ago

Hey @pavloburykh, can you please take a look if https://github.com/status-im/status-go/pull/5880 fixes the issue? In order for it to work for old accounts, communities must be re-joined.