tmo1 / sms-ie

SMS Import / Export is a simple Android app that imports and exports SMS and MMS messages, call logs, and contacts from and to JSON / NDJSON files.
GNU General Public License v3.0

Support group chats #159

Open przemoc opened 5 months ago

przemoc commented 5 months ago

I use Google Messages with the following configuration: Settings > Advanced > Group Messaging > Send an SMS reply to all recipients and get individual replies (mass text).

Such a group chat (and the corresponding individual messages) looks like this in messages.ndjson exported by v2.3.1:

{"_id":"11038","thread_id":"155","address":"+48aaaaaaaaa +48bbbbbbbbb +48ccccccccc ddddddddd","date":"1695928662181","date_sent":"0","read":"1","status":"0","type":"2","body":"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostru","locked":"0","error_code":"0","seen":"1","timed":"0","deleted":"0","sync_state":"0","marker":"0","bind_id":"0","mx_status":"0","out_time":"0","sim_id":"1","block_type":"0","advanced_seen":"3","b2c_ttl":"0","fake_cell_type":"0","url_risky_type":"0","favorite_date":"0","sub_id":"1","__display_name":"Person d"}
{"_id":"11041","thread_id":"3","address":"+48ccccccccc","date":"1695928662180","date_sent":"1695928665179","read":"1","status":"0","type":"2","body":"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostru","locked":"0","error_code":"0","seen":"1","timed":"0","deleted":"0","sync_state":"0","marker":"0","bind_id":"0","mx_status":"0","out_time":"0","sim_id":"1","block_type":"0","advanced_seen":"3","b2c_ttl":"0","fake_cell_type":"0","url_risky_type":"0","favorite_date":"0","sub_id":"1","__display_name":"Person c"}
{"_id":"11040","thread_id":"11","address":"+48bbbbbbbbb","date":"1695928662180","date_sent":"1695928667041","read":"1","status":"0","type":"2","body":"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostru","locked":"0","error_code":"0","seen":"1","timed":"0","deleted":"0","sync_state":"0","marker":"0","bind_id":"0","mx_status":"0","out_time":"0","sim_id":"1","block_type":"0","advanced_seen":"3","b2c_ttl":"0","fake_cell_type":"0","url_risky_type":"0","favorite_date":"0","sub_id":"1","__display_name":"Person b"}
{"_id":"11039","thread_id":"22","address":"+48aaaaaaaaa","date":"1695928662180","date_sent":"1695928665843","read":"1","status":"0","type":"2","body":"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostru","locked":"0","error_code":"0","seen":"1","timed":"0","deleted":"0","sync_state":"0","marker":"0","bind_id":"0","mx_status":"0","out_time":"0","sim_id":"1","block_type":"0","advanced_seen":"3","b2c_ttl":"0","fake_cell_type":"0","url_risky_type":"0","favorite_date":"0","sub_id":"1","__display_name":"Person a"}
{"_id":"11042","thread_id":"41","address":"ddddddddd","date":"1695928662180","date_sent":"1695928665985","read":"1","status":"0","type":"2","body":"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostru","locked":"0","error_code":"0","seen":"1","timed":"0","deleted":"0","sync_state":"0","marker":"0","bind_id":"0","mx_status":"0","out_time":"0","sim_id":"1","block_type":"0","advanced_seen":"3","b2c_ttl":"0","fake_cell_type":"0","url_risky_type":"0","favorite_date":"0","sub_id":"1","__display_name":"Person d"}

When imported, the group chat doesn't look like a group chat, but like an individual chat with a person whose name is the concatenation of those phone numbers:

+48aaaaaaaaa+48bbbbbbbbb+48cccccccccddddddddd

(EDIT: To clarify, respective individual chats contain the message, as it is apparently stored for each of them individually too, so no issues with individual chats.)

I suspect that group chat may require some special handling on import, export or maybe in both cases.


Background:

Thanks for creating this application! I'm a fan of SMS Backup+, but it doesn't handle MMSes properly, so when switching phones and restoring them, I was always losing them in their respective conversation threads (at least they were still stored in mail most of the time, so it was somewhat acceptable). I hoped to avoid losing anything on the target phone this time, so I gave your app a try.

I've used the latest SMS Import / Export, which is at v2.3.1 right now. First I exported messages from an Android 11 phone (w/ Google Messages version messages.android_20240123_01_RC02.phone_dynamic). Then I imported the messages to an Android 14 phone (w/ Google Messages version messages.android_20240116_01_RC04.phone_dynamic). I'm mentioning the Android (and Messages) versions just in case; I'm not sure if they are relevant here at all. At first I thought it didn't work at all, because after finishing the import, Messages didn't show anything, but after stopping the app, clearing its storage and running Messages again, the messages were finally available! I started scrolling to the bottom, and it seemed that chats from a few years back were not showing; Messages displayed some placeholder at the bottom and some loading animations were happening. It felt like it got stuck, but eventually, after several seconds or maybe even a minute, more messages loaded. Then I scrolled more and the situation repeated a few times, until I reached the very first message that was stored on the old phone. It's hard to tell if everything imported properly (due to the sheer volume: many thousands of messages spanning over 10 years), but at a quick glance things looked good, and MMSes were in their conversation threads. But I noticed one problem.

tmo1 commented 4 months ago

Thanks for the report; I'm looking into it.

tmo1 commented 4 months ago

When imported, the group chat doesn't look like a group chat, but like an individual chat with a person whose name is the concatenation of those phone numbers:

I have confirmed that the app does not handle SMS messages with multiple recipients correctly.

I suspect that group chat may require some special handling on import, export or maybe in both cases.

It looks like it. Unfortunately, I have not yet been able to figure out how to do so.

after finishing the import, Messages didn't show anything, but after stopping the app, clearing its storage and running Messages again, the messages were finally available! I started scrolling to the bottom, and it seemed that chats from a few years back were not showing; Messages displayed some placeholder at the bottom and some loading animations were happening. It felt like it got stuck, but eventually, after several seconds or maybe even a minute, more messages loaded. Then I scrolled more and the situation repeated a few times, until I reached the very first message that was stored on the old phone.

This is documented in the README.

przemoc commented 4 months ago

I have confirmed that the app does not handle SMS messages with multiple recipients correctly.

It looks like it. Unfortunately, I have not yet been able to figure out how to do so.

Thank you for looking into that.

Would it be possible to add support for ignoring those group chats on import? (I mean only group ones, individual ones should not be skipped, because same messages can be found in them too.)

This is documented in the README.

The behavior of Google Messages taking a very long time to load older messages was not described. I was on the verge of thinking that something had broken after the import, so I decided to describe my experience, so that others trying to google it would be able to find it.

tmo1 commented 4 months ago

Would it be possible to add support for ignoring those group chats on import? (I mean only group ones, individual ones should not be skipped, because same messages can be found in them too.)

Can you explain more clearly what you mean by this?

przemoc commented 4 months ago

As I have shown in my first message in the issue, messages sent through a group chat (e.g. to +48aaaaaaaaa & +48bbbbbbbbb) also appear in individual chats (separately w/ +48aaaaaaaaa, and separately w/ +48bbbbbbbbb) when exported. Preserving group chats would be preferable for a seamless transition between phones, but even if they're not imported, the messages are not lost and are still accessible (which is the most important feature of any backup & restore solution), and if one would like to send a message to the same group chat that was previously created on another device, in the worst case it can be recreated manually for new messages.

So what I'm suggesting is having an option for import to ignore/skip group chats, i.e. entries with multiple numbers in address. (And only for import, because export seems ok-ish as far as I can tell, so it's good to have group chats exported even if it cannot be imported properly atm.) Having broken group chats and having to remove them manually is somewhat inconvenient.

When a way to properly import group chats is found, such an option may no longer be needed (or it could be left there, as some folks may prefer it, or there could even be a counterpart option to import only group chats, for those who had to skip them before).

Does it sound reasonable?

tmo1 commented 4 months ago

It turns out that in my original testing of multiple recipient SMS messages, I was making a mistake and not creating them properly, and that's why I was not seeing what you were seeing. Now that I'm doing it correctly, I'm seeing results similar to yours, with the difference that for your sample message, I see what you describe, whereas with messages that I created, the message with the recipients concatenated is placed in the same thread as the message to the final recipient. I.e., a message to num1, num2, and num3 exports as four messages, as per your example, but upon import, I get three conversations containing four messages, with the message to "num1 num2 num3" combined in the same conversation as the one to num3. (I think the combination was always with the final recipient, although I'm not totally sure.)

In any event, after much experimenting and searching, I have still been unable to figure out how to properly import such messages, so I've followed your suggestion and modified the app to skip them. I'm leaving this issue open in the hope that we'll eventually figure out the correct way to handle such messages.

przemoc commented 3 months ago

Thank you for the effort. I will try to do a new export + import in the upcoming days and test if skipping works as expected.

przemoc commented 3 months ago

I tested v2.3.2 today. Group chats have been skipped during import, so there are no entries with a broken recipient (previously seen as concatenated numbers). That's a good workaround until a proper solution is found.

przemoc commented 3 weeks ago

I recently tested importing again (before 2.4.0 was released) and noticed one problem, but was on the go and not able to share it earlier, so sorry for that.

The current workaround for group chats, which simply skips them on import (https://github.com/tmo1/sms-ie/blob/92da6e66a879e3b9b1307629aac305b72da7bf49/app/src/main/java/com/github/tmo1/sms_ie/ImportExportMessages.kt#L391--L396) and was added in 2f0707481c, introduced a regression.

The existing logic, which detects group chats by the address field having more than one word, is unfortunately insufficient: it leads to skipping some normal messages too.

You may have messages sent by phone service providers, government agencies, etc. that use names instead of numbers. For instance, in Poland we occasionally receive messages from ALERT RCB (providing alerts from the Government Centre for Security). Such messages are now lost upon import if there is a space in the Sender ID.

I feel like detection for group chat messages needs to be more elaborate. Basically only messages where all words in address field are true phone numbers should be considered as such. If even one word is something else than phone number, then it should not be treated as group chat message.

I am not aware of how phone numbers are stored internally in Android across the world, but I would expect them to optionally start with a plus sign (+) and have a non-zero number of digits (0-9), which could possibly be written as the following Java RegExp: \A\+?\d+\Z (assuming matching is done per word). But maybe more non-digit characters are allowed besides the plus sign; I haven't researched that.

Alternatively, one could simply try to detect characters other than the plus sign, digits and space in the address to bail out of the group chat logic, but it feels like this simpler approach could miss some corner cases, such as a non-number sender like +505+.
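
To make the difference between the three heuristics concrete, here is a minimal Python sketch (not the app's Kotlin code). The function names are made up for illustration, and it assumes that every stored number is a single word, which may not hold on all devices.

```python
import re

# Per-word pattern from the comment above: optional leading "+", then digits.
# re.ASCII restricts \d to 0-9.
PHONE_WORD = re.compile(r"\A\+?\d+\Z", re.ASCII)


def looks_like_group_naive(address: str) -> bool:
    """Heuristic as described for the current workaround: more than one word."""
    return len(address.split()) > 1


def looks_like_group_strict(address: str) -> bool:
    """Group chat only if there are several words and each one is a number."""
    words = address.split()
    return len(words) > 1 and all(PHONE_WORD.match(w) for w in words)


def only_number_chars(address: str) -> bool:
    """Simpler alternative: nothing but '+', digits and spaces in the address."""
    return re.fullmatch(r"[+0-9 ]+", address) is not None
```

With these definitions, "ALERT RCB" is flagged by the naive check but not by the strict one, while "+505+" passes the character-only check even though it is not a number according to the per-word pattern.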

chenxiaolong commented 3 weeks ago

Basically only messages where all words in address field are true phone numbers should be considered as such.

It may be worth trying https://developer.android.com/reference/android/telephony/PhoneNumberUtils#formatNumber(java.lang.String,%20java.lang.String) to determine if a number is valid. (It returns null for invalid phone numbers.) I don't know how strict/lenient it is, but it might be easier than coming up with a regex.


As for the underlying issue, I wonder if it makes more sense to do something similar to Android's builtin TelephonyBackupAgent.

On export, for each thread, it grabs the Telephony.Threads.RECIPIENT_IDS, which is a space separated string of longs. It then does one call to query for phone numbers for all of these IDs. The list of recipients is stored as an array in their JSON export.

On import, it reads the JSON list of recipients and always imports them as a collection (vs. a string) using the Set<String> overload of Telephony.Threads.getOrCreateThreadId().

This same procedure is used for both SMS and MMS messages.

It still imports and exports Telephony.Sms.ADDRESS and Telephony.Mms.Addr.ADDRESS, but never parses them. Only Telephony.Threads.RECIPIENT_IDS is used to determine the recipients for threads.
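
As a rough illustration of the export side of that procedure, here is a small Python sketch (not Android code). The dictionary stands in for the ID-to-address lookup that would be a query on a real device, and the function and field names are hypothetical.

```python
def resolve_recipients(recipient_ids: str, canonical_addresses: dict) -> list:
    """Turn a space-separated string of numeric IDs into a list of addresses.

    `canonical_addresses` is an in-memory stand-in for the lookup of
    addresses by ID; IDs without a known address are dropped.
    """
    ids = [int(token) for token in recipient_ids.split()]
    return [canonical_addresses[i] for i in ids if i in canonical_addresses]


def export_thread(thread_id: int, recipient_ids: str, canonical_addresses: dict) -> dict:
    """Build a thread record that stores the recipients as an array."""
    return {
        "thread_id": thread_id,
        "recipients": resolve_recipients(recipient_ids, canonical_addresses),
    }
```

Because the recipients are kept as separate array elements, an address that itself contains a space stays intact and never needs to be parsed back out of a string.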

tmo1 commented 3 weeks ago

As for the underlying issue, I wonder if it makes more sense to do something similar to Android's builtin TelephonyBackupAgent.

No doubt it does - thanks for trawling the code to find this ...

On import, it reads the JSON list of recipients and always imports them as a collection (vs. a string) using the Set<String> overload of Telephony.Threads.getOrCreateThreadId().

Hm. I was pretty sure that I had tried using the Set<String> form of Telephony.Threads.getOrCreateThreadId() to import group SMS message and that it hadn't worked, but I just tried it by passing in Telephony.Sms.ADDRESS.split(" ").toSet() from a group SMS message and it actually seems to work.

We still have the problem raised by @przemoc of distinguishing between address fields that contain multiple recipients and those that contain recipients that contain spaces. I suppose you're right that the ultimately correct solution is to store during export each message's recipients, obtained via the Telephony.Threads.RECIPIENT_IDS to addresses query you've found, and then to use those values upon import, but this would break backward compatibility and so would have to be done very deliberately.

The real frustration here is that I look at messages as being primarily standalone objects, without any native concept of threads or conversations: IIUC, when Alice sends a message to Bob, that message as sent over the wire contains no information about any other, previous messages sent between them. Android, however, imposes its own notions of threads and conversations upon messages, and makes them an integral part of its message storage model. My app tries to untangle the messages from their Android-added structure, but inevitably runs into problems such as the ones in this issue.

Do I understand this correctly? Do you have a recommendation?

chenxiaolong commented 3 weeks ago

No problem!

I think ultimately, storing the array of addresses queried from RECIPIENT_IDS is probably the way to go. Compatibility with the current format (new app + old backup) could probably be done on a best effort basis, like with @przemoc's suggestion of splitting on whitespace and attempting to parse each as a phone number. I'm not sure about the old app + new backup scenario though.

The real frustration here is that I look at messages as being primarily standalone objects, without any native concept of threads or conversations: IIUC, when Alice sends a message to Bob, that message as sent over the wire contains no information about any other, previous messages sent between them. Android, however, imposes its own notions of threads and conversations upon messages, and makes them an integral part of its message storage model. My app tries to untangle the messages from their Android-added structure, but inevitably runs into problems such as the ones in this issue.

Yeah, this is indeed frustrating. I personally like the way sms-ie represents the messages. I'd imagine that since Android's built-in backups use a similar data model to yours, they wouldn't introduce anything that further complicates translation to and from Android's data model (hopefully...).

przemoc commented 3 weeks ago

The real frustration here is that I look at messages as being primarily standalone objects, without any native concept of threads or conversations: IIUC, when Alice sends a message to Bob, that message as sent over the wire contains no information about any other, previous messages sent between them. Android, however, imposes its own notions of threads and conversations upon messages, and makes them an integral part of its message storage model. My app tries to untangle the messages from their Android-added structure, but inevitably runs into problems such as the ones in this issue.

SMS was invented for 1-to-1 messages, but in reality they were often used as 1-to-n messages, so Android providing a way to create convenient groups where a message will be sent to many recipients is actually quite a useful feature IMHO, one that I have used for quite some time. If you have some groups of friends that, for instance, you try to meet up with from time to time, then you do not have to copy-paste the message and send it individually to all recipients; you can reuse an already existing group chat to broadcast the necessary messages. Of course incoming responses are only in the individual senders' threads, because there is no way to tell if they were also sent to others, so such behavior makes sense. Even though you may not like the practical aspect of this UX improvement, I assure you it is an improvement. And not everyone is using WhatsApp or Messenger (I don't, as I try not to be involved with most Meta services).

Even if there are means to support adding group chats and you are not willing to add them in sms-ie, at the very least the issue of properly distinguishing group chat messages from the rest remains (just for skipping purposes, if nothing else).

I think ultimately, storing the array of addresses queried from RECIPIENT_IDS is probably the way to go.

I agree. Relying on a lossy address representation (a string), even though it may contain more than one address separated by spaces, will always lead to issues when trying to restore the real recipients list, as you can only approximate it. It's still good to know that there is a way to obtain the list of recipients in a lossless way; thanks for the investigation, @chenxiaolong.

Compatibility with the current format (new app + old backup) could probably be done on a best effort basis, like with @przemoc's suggestion of splitting on whitespace and attempting to parse each as a phone number. I'm not sure about the old app + new backup scenario though.

Properly parsing numbers may be harder than I thought.

I naively thought that one phone number would always be one word, i.e. that storage uses a concise and complete (information-wise) form that is decoupled from how users typically represent phone numbers in their countries (grouping some parts, etc.). You know, just like you should never store time in the local time zone and should instead store it in UTC (e.g. as Epoch time) along with the TZ difference, and then represent it in the UI layer however fits the app or the user's preferences (showing time in UTC or local TZ, showing the TZ diff if in local TZ, etc.). Otherwise various issues (DST and whatnot) will make your life harder later, and good engineering is about avoiding hurdles that are avoidable.

Yet Neob91's message indicates that even a single phone number can be stored in a divided, multi-word form, which is saddening if it's really true. It would mean that even with the help of PhoneNumberUtils.formatNumber one would need to try not only one word but many words, and possibly even start from many words before trying fewer (so that +48 123 456 789 wouldn't end up as two separate numbers).

Therefore relying on recipients list is seemingly the only proper solution. Having best effort approximations before that (some improvements are undoubtedly needed over what we have today, as just checking for many words to distinguish group chat messages leads to too many false positives) will need to be also expanded with warning messages to the users, something like:

[!Warning] Current sms-ie backup format unfortunately does not contain sufficient information to be able to always properly differentiate group chat messages from singular messages, therefore there may be mistakes upon import.

And hopefully for 3.0 it could be fixed (of course only for 3.0 backups, imports from 2.x always will remain flawed).


Please mind that there are also MMSes with multiple recipients. I do not use them, but they exist, and if I'm not mistaken MMS has multiple-recipient support baked into it, so the phone does not have to re-send the same message to many recipients as in the case of SMS (at least that's my understanding), and MMS recipients can actually see the list of recipients, so they can reply to all too.

tmo1 commented 3 weeks ago

I think ultimately, storing the array of addresses queried from RECIPIENT_IDS is probably the way to go. Compatibility with the current format (new app + old backup) could probably be done on a best effort basis, like with @przemoc's suggestion of splitting on whitespace and attempting to parse each as a phone number. I'm not sure about the old app + new backup scenario though.

That's largely what I was worried about - new app / old format is easy enough to solve, either with in-app logic or via an external converter script, but old app / new format would likely introduce problems with implicit or explicit assumptions in the old app about the format of the import JSON.

I'll probably have to start working on a v3 format, though, in order to get this right.

I personally like the way sms-ie represents the messages.

Thanks! I'm glad my conception isn't totally off base.

SMS was invented for 1-to-1 messages, but in reality they were often used as 1-to-n messages, so Android providing a way to create convenient groups where a message will be sent to many recipients is actually quite a useful feature IMHO, one that I have used for quite some time. If you have some groups of friends that, for instance, you try to meet up with from time to time, then you do not have to copy-paste the message and send it individually to all recipients; you can reuse an already existing group chat to broadcast the necessary messages. Of course incoming responses are only in the individual senders' threads, because there is no way to tell if they were also sent to others, so such behavior makes sense. Even though you may not like the practical aspect of this UX improvement, I assure you it is an improvement. And not everyone is using WhatsApp or Messenger (I don't, as I try not to be involved with most Meta services).

I understand all this and acknowledge the use case for group SMS messaging - but Android could have chosen to handle this separately from its storage of the messages themselves insofar as it's introducing new structure not present in those messages. I'm very happy to have support for group SMS in the UX - but it would make SMS I/E development easier if it stayed in the UX and in a separate datastore, unentangled with the storage of the messages themselves.

Even if there are means to support adding group chats and you are not willing to add them in sms-ie, at the very least the issue of properly distinguishing group chat messages from the rest remains (just for skipping purposes, if nothing else).

Agreed.

Therefore relying on recipients list is seemingly the only proper solution. Having best effort approximations before that (some improvements are undoubtedly needed over what we have today, as just checking for many words to distinguish group chat messages leads to too many false positives) will need to be also expanded with warning messages to the users, something like:

Warning ...

And hopefully for 3.0 it could be fixed (of course only for 3.0 backups, imports from 2.x always will remain flawed).

Agreed again.

Please mind that there are also MMSes with multiple recipients. I do not use them, but they exist, and if I'm not mistaken MMS has multiple-recipient support baked into it, so the phone does not have to re-send the same message to many recipients as in the case of SMS (at least that's my understanding), and MMS recipients can actually see the list of recipients, so they can reply to all too.

To the best of my knowledge, MMS messages with multiple recipients have always been handled correctly by SMS I/E. You are correct that the MMS structure (or at least its Android system representation) natively supports multiple recipients - in fact, Android doesn't store the recipients in the basic MMS metadata table (Telephony.BaseMmsColumns) at all, so we've always been doing a relational query to collect the recipients, and have always stored them in an array within the JSON. It looks like we'll be doing something similar for SMS in the next format update (and possibly updating the method of obtaining them to the official Android way as suggested by @chenxiaolong, in order to increase the likelihood of doing it correctly and to unify the process between SMS and MMS messages).

tmo1 commented 2 weeks ago

I may have come up with a method of handling group SMS messages in a fairly correct and robust manner even within the limits of the current export format. As @przemoc has reported and as I have confirmed in my testing, Android apparently stores group SMS messages as both group messages with multiple recipients as well as individual messages to each of the recipients. It should therefore be possible to reliably determine the recipients of a group SMS message by correlating it with its associated single recipient versions.

It is infeasible to do this in-app, due to the fundamental architectural limitation of the app's sequential, line-at-a-time reading of the import JSON, but it should be possible to implement the procedure via an external helper script combined with a fairly small change to the app itself.

The helper script would do the following: For every outgoing SMS message (M) whose address contains at least one space, find all other outgoing messages that have the same timestamp and body text and an address that is a proper subset of M's address (M1, M2, M3, ...). If we find more than one, then we can assume with fairly high confidence that M is a group SMS and that the various Mx are its associated single recipient versions. In this case, replace M's address with a concatenation of the addresses of all the Mx separated by some separator that is unlikely to ever naturally occur in a message address, e.g., a double pipe ("||").
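
The procedure just described could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual helper script: the function names are made up, "proper subset" is interpreted as a word-aligned part of M's address, outgoing messages are recognized by "type" being "2" as in the sample export, and the optional `tolerance_ms` parameter is an addition of this sketch (in the sample export at the top of this issue, the group entry's date is 1 ms later than the dates of the individual entries, so an exact timestamp match would not pair them).

```python
from collections import defaultdict

SEPARATOR = "||"
OUTGOING = "2"  # "type" value of the sent messages in the sample export


def is_word_run(candidate: str, full: str) -> bool:
    """True if candidate is a proper, word-aligned part of full."""
    return candidate != full and f" {candidate} " in f" {full} "


def rewrite_group_addresses(messages: list, tolerance_ms: int = 0) -> list:
    """Replace the address of each detected group SMS with its recipients'
    addresses joined by SEPARATOR. Modifies and returns `messages`."""
    outgoing = [
        m for m in messages
        if m.get("type") == OUTGOING
        and isinstance(m.get("address"), str)
        and "date" in m
    ]
    # Index by body so that each message is only compared with candidates
    # that share its text.
    by_body = defaultdict(list)
    for m in outgoing:
        by_body[m.get("body")].append(m)

    for m in outgoing:
        full = m["address"]
        if " " not in full:
            continue
        parts = {
            o["address"]
            for o in by_body[m.get("body")]
            if o is not m
            and abs(int(o["date"]) - int(m["date"])) <= tolerance_ms
            and is_word_run(o["address"], full)
        }
        # More than one single-recipient counterpart: treat M as a group SMS.
        if len(parts) > 1:
            ordered = sorted(parts, key=lambda a: f" {full} ".index(f" {a} "))
            m["address"] = SEPARATOR.join(ordered)
    return messages
```

The input is the list of message objects obtained by parsing each line of the export as JSON. Messages without single-recipient counterparts, such as one addressed to a sender ID like "ALERT RCB", are left untouched.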

In the app, revert the current check for addresses containing spaces and instead check for our separator. For any message whose address contains at least one separator, split the address upon the separator and then feed the resulting addresses to the Set<String> version of Telephony.Threads.getOrCreateThreadId().
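
The import-side change is small. As a language-neutral illustration (the app itself is written in Kotlin), a Python sketch of the split might look like this; the function name is hypothetical, and the returned set corresponds to what would be handed to the Set<String> overload.

```python
SEPARATOR = "||"


def recipients_for_import(address: str) -> set:
    """Split only on the separator; spaces stay part of each address."""
    return {part for part in address.split(SEPARATOR) if part}
```

An address without the separator, including one that contains spaces, yields a single-element set and is therefore imported as an ordinary one-recipient message.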

Thoughts?

chenxiaolong commented 2 weeks ago

If we want to keep the current backup format, then that approach seems reasonable. After matching by timestamp and message body, I'd imagine it'd be extremely rare for an address to be the subset of another.

I did a bit of digging in the SMS specs. 3GPP TS 23.040 section 9.1.2.5 lists the supported address types. For the alphanumeric type, it references 3GPP TS 23.038 section 6.2.1, which lists the valid characters. | doesn't look like a permitted character so it's probably safe to use as a separator.

tmo1 commented 2 weeks ago

If we want to keep the current backup format, then that approach seems reasonable.

I agree that the better approach is to upgrade the format, but I'm still inclined to implement this fix for exports in the current format.

| doesn't look like a permitted character so it's probably safe to use as a separator.

Good to know - thanks!

angelsl commented 1 week ago

The (basically silent unless you look at logcat!) dropping of messages from sender IDs that have a space in them was causing me trouble so I quickly hacked together a solution for myself based on what chenxiaolong suggested/showed: https://github.com/angelsl/sms-ie/commit/ed7cf6424cbfaf14dca3d68b389bfbfb6e3f0930

I stored the canonical recipient addresses in a separate __recipients array per message on export, and then on import we get the thread using that array if that array is present, otherwise fall back to the address column.
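
In outline, the import-side fallback described here amounts to the following Python sketch. It is not the code from the linked commit; the function name is made up, and treating an empty array the same as a missing one is an assumption of this sketch.

```python
def thread_recipients(message: dict) -> set:
    """Pick the recipients used to look up or create the thread."""
    recipients = message.get("__recipients")
    if recipients:  # array present (and, by assumption, non-empty)
        return set(recipients)
    # Older exports have no __recipients array: fall back to the address column.
    return {message["address"]}
```

With this shape, exports that predate the __recipients array still import, just without group detection.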

Some things to note:

If you like the approach, please feel free to just steal the change and clean up the code a bit. (It's my first time writing Kotlin so it might not be very idiomatic—sorry.)