spantaleev / matrix-docker-ansible-deploy

🐳 Matrix (An open network for secure, decentralized communication) server setup using Ansible and Docker
GNU Affero General Public License v3.0

Whatsapp bridge broken after migration #791

Open joao-p-marques opened 3 years ago

joao-p-marques commented 3 years ago

I just ran the playbook with the latest migration from sqlite to postgres.

The Whatsapp bridge stopped working after the update. In any operation I do (send message, sync contacts, ...) I get this error in the chat room and in the logs:

Jan 13 10:28:00 matrix matrix-mautrix-whatsapp[1020]: [Jan 13, 2021 10:28:00] [User/@MYUSER:MYSERVER/WARN] Failed to set presence: encryptBinaryMessage(node) failed: encrypt failed: crypto/aes: invalid key size 115
Jan 13 10:28:13 matrix matrix-mautrix-whatsapp[1020]: [Jan 13, 2021 10:28:13] [User/@MYUSER:MYSERVER/ERROR] WhatsApp error: error processing data: error decoding binary: invalid hmac

What can I do to fix this? Should I revert back to sqlite?

spantaleev commented 3 years ago

Looks like migrating the data to Postgres didn't work well or something.

Sometimes pgloader doesn't do the right thing. For some things (like Dimension), we had to apply fix-up SQL statements. See here: https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/21d3802ed76bfcd8598c62192ffc943a1e0381f7/roles/matrix-dimension/tasks/setup_install.yml#L12-L49

I wonder if we need to do the same for the Whatsapp bridge. I'm not using the bridge myself, nor have I heard of reports similar to yours, nor any obvious success stories with this migration. You may wish to come to our support room and see how the migration has gone for others.

Perhaps there's only a problem with migrating the encryption tables, and not a generic problem with migrating everything else.

You can downgrade to SQLite for now, I guess.

joao-p-marques commented 3 years ago

Ok, I ended up reverting to SQLite and :+1: for the good guide on that :smiley:

Leaving it like that until this is fixed. Thanks!

spantaleev commented 3 years ago

It would be helpful if you could inspect both the Postgres and SQLite databases and see what the difference is (mostly for the tables related to encryption, I guess).

To access the SQLite database: sqlite3 /matrix/mautrix-whatsapp/data/mautrix-whatsapp.db (the binary may be named just sqlite depending on your distro, and you may need to install it manually). Then run .tables to list the tables, and .schema TABLE_NAME_HERE to see a table's schema. You can run some SELECT * FROM TABLE_NAME_HERE; statements to see what's there.
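If the sqlite3 binary is not installed, the same inspection can be sketched with Python's built-in sqlite3 module. This is only an illustration: describe_tables and count_rows are hypothetical helper names, and the demo runs against a throwaway in-memory database rather than the real bridge file.

```python
import sqlite3

def describe_tables(conn):
    """Map each table name to the CREATE statement SQLite stored for it."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)

def count_rows(conn, table):
    """Count rows in a table; the name is double-quoted so "user" is safe."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]

# Demo on an in-memory database (a made-up stand-in for the bridge DB).
demo = sqlite3.connect(":memory:")
demo.execute('CREATE TABLE "user" (mxid VARCHAR(255) PRIMARY KEY, enc_key bytea)')
demo.execute('INSERT INTO "user" VALUES (?, ?)', ("@a:example.org", b"\x00" * 32))

schemas = describe_tables(demo)
for name, create_sql in schemas.items():
    print(name)
    print(create_sql)
print(count_rows(demo, "user"))
```

To look at the real data, pass sqlite3.connect(PATH) with the database path from the comment above instead of the in-memory connection.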

To access the Postgres database:

joao-p-marques commented 3 years ago

Thanks @spantaleev

Straight away I can see that the user table has all the expected information in the sqlite DB. The schema is:

TABLE "user"  (
  mxid VARCHAR(255) PRIMARY KEY,
  jid  VARCHAR(255) UNIQUE,
  management_room VARCHAR(255),
  client_id    VARCHAR(255),
  client_token VARCHAR(255),
  server_token VARCHAR(255),
  enc_key      bytea,
  mac_key      bytea,
  last_connection BIGINT NOT NULL DEFAULT 0
)

and it has all the correct data.

However, on the PostgreSQL DB, I think the user table is from/conflicts with the Postgres users DB:

matrix_mautrix_whatsapp=# select * from user;
  user   
---------
 synapse
(1 row)

As the error comes from the hmac key, maybe that comes from the mac column.
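A note on the query above: in PostgreSQL, an unquoted user is a reserved word equivalent to current_user, so select * from user; returns the name of the connected role (synapse here) rather than the bridge's table. Reading the table requires the double-quoted identifier "user". A minimal sketch of building such a query follows; quote_ident and select_all are hypothetical helper names, not part of the playbook or the bridge.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def select_all(table, schema=None):
    """Build a SELECT whose table name cannot be read as a keyword such as user."""
    target = quote_ident(table)
    if schema is not None:
        target = quote_ident(schema) + "." + target
    return f"SELECT * FROM {target};"

print(select_all("user"))            # SELECT * FROM "user";
print(select_all("user", "public"))  # SELECT * FROM "public"."user";
```

Running the first statement in the Postgres prompt should show the migrated rows, which can then be compared with the SQLite data.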

schwarzeszeux commented 3 years ago

I'm facing a similar issue and I'll try to revert.

Your message may not have been bridged: could not send proto: encryptBinaryMessage(node) failed: encrypt failed: crypto/aes: invalid key size 119

For anyone else who failed to check the changelog, here's the link to the guide: https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/CHANGELOG.md#the-big-move-to-all-on-postgres-potentially-dangerous

I checked the database-schema and it looks mostly ok except that a lot of columns are not NOT NULL. Other things to mention:

However, what seems odd is how similar all encryption keys look inside of postgres.

I'm not sure if this is a conflict between public.user and user. When I edit a line in that table manually and then do a reconnect of the bot, the keys change. So at least the update is working...
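One way to narrow this down: AES only accepts keys of 16, 24 or 32 bytes, so "invalid key size 115" or "119" suggests the stored enc_key no longer has its original length after the migration. The sketch below compares key lengths per user. key_report is a hypothetical helper, and the sample rows are made-up values that only mimic the sizes from the error messages.

```python
VALID_AES_KEY_SIZES = {16, 24, 32}  # key sizes for AES-128, AES-192, AES-256

def key_report(rows):
    """rows: iterable of (mxid, enc_key, mac_key) read from the "user" table.

    Returns (mxid, enc_key_length, mac_key_length, enc_key_ok) tuples.
    """
    report = []
    for mxid, enc_key, mac_key in rows:
        enc_len = len(enc_key) if enc_key is not None else 0
        mac_len = len(mac_key) if mac_key is not None else 0
        report.append((mxid, enc_len, mac_len, enc_len in VALID_AES_KEY_SIZES))
    return report

# Made-up sample values, not real keys: one intact row, one with an inflated key.
sample = [
    ("@good:example.org", bytes(32), bytes(32)),
    ("@bad:example.org", b"x" * 115, b"x" * 115),
]
for line in key_report(sample):
    print(line)
```

The rows could come from SELECT mxid, enc_key, mac_key FROM "user"; on either database. If the SQLite lengths are valid and the Postgres ones are not, the values were probably altered during the conversion.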

spantaleev commented 3 years ago

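After switching back to SQLite, can you:

- delete the old and broken matrix_mautrix_whatsapp Postgres database like this: /usr/local/bin/matrix-postgres-cli DROP DATABASE matrix_mautrix_whatsapp;
- get rid of matrix_mautrix_whatsapp_database_engine: 'sqlite' from your vars.yml file
- and try rerunning the playbook again. 8549926 might have fixed the issue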

schwarzeszeux commented 3 years ago

I'm not sure if it's something on my end, but I can't even get the bot to reconnect to whatsapp

But this is the error I'm getting now:

Unknown error while reconnecting: restore session challenge timed out

joao-p-marques commented 3 years ago

I'm not sure if it's something on my end, but I can't even get the bot to reconnect to whatsapp

Seems to me like it might be related. I haven't had the chance to check the new version, but will do as soon as possible and report here.


joao-p-marques commented 3 years ago

@spantaleev I have just tested

After switching back to SQLite, can you:

- delete the old and broken matrix_mautrix_whatsapp Postgres database like this: /usr/local/bin/matrix-postgres-cli DROP DATABASE matrix_mautrix_whatsapp;
- get rid of matrix_mautrix_whatsapp_database_engine: 'sqlite' from your vars.yml file
- and try rerunning the playbook again. 8549926 might have fixed the issue

and I still get the same error.

spantaleev commented 3 years ago

I wonder if this is related to the upgrade (some data getting lost/corrupted during the upgrade) or if it's some unrelated problem.

Perhaps @tulir could help point us in the right direction?