mautrix / signal

A Matrix-Signal puppeting bridge
GNU Affero General Public License v3.0

Support importing history from Signal chatrooms #1

Open anoadragon453 opened 3 years ago

anoadragon453 commented 3 years ago

Signal does not store message data on its server after messages have been passed to a device. This means that when a new device is registered, chat history is not synced.

The recommended method for transferring history is to produce an encrypted backup file on one device, transfer it, and then import it on another device.

It would be nice if the bridge could consume these files (or a set of files extracted from them) and import the history and users into Matrix.

https://github.com/pajowu/signal-backup-decode is one such tool, which can be used to extract message, user and media data from a Signal backup. I've used it on a 6GB backup and found it works as advertised. Message and user data are placed into an SQLite3 database, while media is dropped into a directory.
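As a rough illustration of what consuming that database could look like, here is a minimal Python sketch. The table name `messages`, its columns (`thread_id`, `sender`, `body`, `timestamp_ms`) and the helper names `load_messages` and `demo` are assumptions made up for this example; the real schema written by signal-backup-decode may differ and should be checked against an actual export.

```python
import sqlite3
from collections import defaultdict


def load_messages(conn):
    """Group messages by chat, oldest first.

    Assumes a HYPOTHETICAL table:
    messages(thread_id, sender, body, timestamp_ms)
    """
    chats = defaultdict(list)
    rows = conn.execute(
        "SELECT thread_id, timestamp_ms, sender, body "
        "FROM messages ORDER BY timestamp_ms"
    )
    for thread_id, timestamp_ms, sender, body in rows:
        chats[thread_id].append((timestamp_ms, sender, body))
    return dict(chats)


def demo():
    # In-memory database standing in for the extracted SQLite file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages "
        "(thread_id INTEGER, sender TEXT, body TEXT, timestamp_ms INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "second", 2000),
            (1, "bob", "first", 1000),
            (2, "carol", "hello", 1500),
        ],
    )
    return load_messages(conn)
```

Sorting by timestamp before grouping means each chat's list is already in the order a history import would want to replay it.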

Looks like a PR that may change large parts of the output is currently close to merging, so I'd recommend we wait until that lands before integrating it. It also means people will need access to a Rust compiler to use this tool, but I don't think this should be a blocker.

emorrp1 commented 3 years ago

That PR is now merged, so it would be possible to do before connecting to signald, but I imagine it would be easier to implement this after MSC2716 (Incrementally importing history into existing rooms).

provokateurin commented 1 year ago

Hi, since the MSC has been implemented in Synapse and is already used in other bridges, are there any plans to work on this feature?

Thatoo commented 1 year ago

That would be nice. It would let users move from having their Matrix server as a secondary device to having it as their primary and only device :-)

I personally don't do that now because I don't want to lose the history of my rooms. But as soon as it is possible to import history, I will make my Matrix account my primary device and drop one more app from my phone (the Signal app).

Thatoo commented 1 year ago

> Hi, since the MSC has been implemented in Synapse and is already used in other bridges, are there any plans to work on this feature?

@provokateurin which other bridges are using https://github.com/matrix-org/matrix-doc/pull/2716?

provokateurin commented 1 year ago

For example whatsapp [1] or telegram [2] although it has not been released yet in telegram.

[1] https://github.com/mautrix/whatsapp/blob/a2bb46c22d6903fa14013aeb1ef2f4dce99e956c/CHANGELOG.md#v060-2022-07-16

[2] https://github.com/mautrix/telegram/blob/fb1568d019f9a98c49d1a51ff62de135f3a32c10/CHANGELOG.md#v0122-unreleased

Thatoo commented 1 year ago

I guess it could take inspiration from the Telegram bridge, since the Signal bridge and the Telegram one are both written in Python. So I guess the inspiration could come from https://github.com/mautrix/telegram/pull/817, am I right?

Thatoo commented 1 year ago

Would it require adding a command to the Signal bot, like !sg backfill, after which the bot would ask you to send it the backup file from the Signal app? That would require including https://github.com/pajowu/signal-backup-decode within the Signal bridge (easier for end users, but a big dependency). Or would we ask users to run this software on their own and send its output to the Signal bot?

The problem is that signal-backup-decode doesn't seem to be actively developed anymore: https://github.com/pajowu/signal-backup-decode/graphs/code-frequency , https://github.com/pajowu/signal-backup-decode/graphs/contributors

provokateurin commented 1 year ago

Yeah I guess you could adapt the telegram changes to this bridge.

I think for backfilling in this case it would make sense to just backfill all the messages from a backup at once. You need to provide the whole backup anyway, and there is no API request overhead or anything like that.

Thatoo commented 1 year ago

I don't understand. The .backup file provided by the Signal app would need to be decoded before the bridge could use it to backfill, so we need to use either https://github.com/pajowu/signal-backup-decode (written in Rust) or https://github.com/xeals/signal-back (written in Go), even though neither project has received any PR/merge request for more than 2 years. My question is whether the user should provide the .backup file from the Signal app, or a db file for the messages plus a .zip or .tar file containing all the media.

I guess it is easier to start with the bot asking for a db file and a .tar file, letting each user decode their own .backup file provided by the Signal app.

In https://github.com/mautrix/telegram/pull/817/files, it looks to me like it imports a db file (messages) but no media files. I don't see how the user is supposed to upload the db file to the Telegram bridge. Does the Telegram bot have a specific command for that?
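To make the "db file plus .tar file" idea a bit more concrete, here is a small Python sketch of how a bridge might sanity-check the two uploaded files before importing anything. The function name `inspect_upload` is hypothetical, and nothing here reflects an existing mautrix-signal command; it only lists what the files contain, using the standard `sqlite3` and `tarfile` modules.

```python
import sqlite3
import tarfile
from pathlib import Path


def inspect_upload(db_path, tar_path):
    """Return (table names in the db file, regular file names in the tar).

    HYPOTHETICAL helper: a real importer would go on to validate the
    schema against whatever the decoding tool actually produces.
    """
    # Open read-only so a wrong path fails instead of creating an empty db.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
    finally:
        conn.close()

    # Only list regular files; directories and links are skipped.
    with tarfile.open(tar_path) as archive:
        media = sorted(m.name for m in archive.getmembers() if m.isfile())

    return tables, media
```

Listing the contents first, without extracting the archive, keeps the check cheap and avoids writing untrusted files to disk before the upload has been accepted.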

provokateurin commented 1 year ago

I don't think telegram uses backups, it just loads the data from the server.

sumnerevans commented 1 year ago

If somebody wants to do this, I think the best way would be to use the backfill model of the WhatsApp bridge. The way that WA works is that it gives you a full history sync when you log in. We then store all of those messages in the clear in the database before backfilling them to their corresponding portals.
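A minimal Python sketch of that store-then-backfill idea, using an SQLite table as the staging area. The table layout and the names `stage_messages`, `backfill_portal` and the `send` callback are all hypothetical, chosen for illustration; this is not the WhatsApp bridge's actual code.

```python
import sqlite3


def stage_messages(conn, messages):
    """Store (portal_id, timestamp_ms, sender, body) tuples for later backfill."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged "
        "(portal_id TEXT, timestamp_ms INTEGER, sender TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO staged VALUES (?, ?, ?, ?)", messages)
    conn.commit()


def backfill_portal(conn, portal_id, send):
    """Replay one portal's staged messages oldest first, then delete them.

    `send` is a HYPOTHETICAL callback standing in for whatever actually
    writes the event into the Matrix room.
    """
    rows = conn.execute(
        "SELECT rowid, timestamp_ms, sender, body FROM staged "
        "WHERE portal_id = ? ORDER BY timestamp_ms",
        (portal_id,),
    ).fetchall()
    for rowid, timestamp_ms, sender, body in rows:
        send(timestamp_ms, sender, body)
        conn.execute("DELETE FROM staged WHERE rowid = ?", (rowid,))
    conn.commit()
    return len(rows)
```

For example, staging messages for portals `"a"` and `"b"` and then calling `backfill_portal(conn, "a", send)` replays only portal `"a"` in timestamp order and leaves the rows for `"b"` staged.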

Thatoo commented 1 year ago

The problem is that Mautrix-signal is written in Python whereas Mautrix-WA is written in Go. Isn't that a problem?

And then the problem is that both of these apps store history on the server side and don't allow backing it up to a file, whereas Signal allows backing up history to a file, which would need to be uploaded to the bridge, read and then added to the database.

Thatoo commented 1 year ago

Well, there is a way to transfer a Signal account directly from the old primary device to a new primary device, so it could be that. Can signald do that?

Thatoo commented 1 year ago

Well, I guess we actually need to wait for signald to support it, no? https://gitlab.com/signald/signald/-/issues/70 https://gitlab.com/signald/signald/-/issues/142

Thatoo commented 1 year ago

So all 19 people who have given a :+1: to @anoadragon453's request so far: you should also add a :+1: at https://gitlab.com/signald/signald/-/issues/70 if you want this feature to ever become reality.

hegdenischay commented 1 year ago

> If somebody wants to do this, I think the best way would be to use the backfill model of the WhatsApp bridge. The way that WA works is that it gives you a full history sync when you log in. We then store all of those messages in the clear in the database before backfilling them to their corresponding portals.

I'm pretty sure Signal explicitly doesn't transfer message history when you log into another device. Importing messages from a database sounds like a better idea.