itsjunetime / smserver

An app to allow Jailbroken iPhones to send texts & attachments from their browser
GNU General Public License v3.0
158 stars 27 forks source link

display the numbers with the resolved (or not) names #9

Closed blanxd closed 4 years ago

blanxd commented 4 years ago

Might be nice, in the web interface, to also show the actual number (or sometimes it's a string) with each name, like maybe in brackets or so, even if there are results from the ABPersonFullTextSearch_content query. The thing is, taking the 1st row from that %parsed_num% query isn't always the correct one, especially if there are several rows returned. So unless there are plans to make the process more complicated (with every possible fallback I cannot even imagine :), at least we'd see the actual correct number there... (public version 0.3.2 at time of writing this)

itsjunetime commented 4 years ago

Yeah, that's a smart suggestion. Parsing phone numbers to get the correct name is one of the harder things in this project, just due to how all the phone numbers are formatted in the sqlite database, and I'll definitely admit it's not perfect right now. I'll look into improving the code the retrieves the name, but I'll try to add something to display the phone numbers so there's no confusion.

blanxd commented 4 years ago

Uhm, wouldn't something straight, like SELECT p.First, p.Last FROM ABPerson p LEFT JOIN ABMultiValue m ON p.ROWID = m.record_id WHERE REPLACE(m.value,' ','') = '<the number with spaces removed>' do the job? Yeah I maybe suggesting something stupid, 1st idea digging in the db for a few minutes, I have no idea if the db structure could be different for iOS's nor have I done any further research, but running this with all sorts of differently formatted chat_identifier values from my messages, I'm not getting wrong answers (if the number/string exists in the AddressBook). For iMessages I suppose it's all different.

itsjunetime commented 4 years ago

That could actually cause an issue: in ABMultiValue.value, characters like parentheses and dashes are allowed, so you'd have to filter those out as well (since chat_identifier only contains the phone number plus a + at the beginning; no parentheses or hyphens). In ABPersonFullTextSearch_content, however, c16Phone contains many different variations on their phone number, so that the chat_identifier will always appear somewhere in there.

For example, if the chat_identifier was +11231231234, c16Phone would contain something like: +1(123)123-1234 +11231231234 0111231231234 0011231231234 11231231234 1231231234 1231234 while value might contain something like +1(123)123-1234 which wouldn't match.

I updated the query to be simply: SELECT c0First, c1Last from ABPersonFullTextSearch_content WHERE c16Phone like "% <chat_identifier> %" LIMIT 1, since the chat_identifier always appears (at least, as far as I can tell) as the second space-separated value in c16Phone. Would you be willing to see if this would work in your case (of course, keeping in mind that chat_identifier never contains any characters that are not digits or the plus sign)? This explanation may not make perfect sense, so let me know if not.

blanxd commented 4 years ago

Oh allright, shows my lack of research :) Now that I look at some numbers from around the world, yeah you're absolutely right, and I'm not sure how optimized the REPLACE function in SQLite is (ie. to nest a bunch of them in each other).

It looks like yes, it's the second space separated value in c16Phone, but, still not 100%. Because the 1st entry in c16Phone might contain spaces on its own, so if the chat_identifier is a short service number, it could well match someone else. You're right it's a mess :) (This is actually the reason I started this thread in the 1st place, for me it looks like a friend sent me a bunch of official announcements, because his number contains a 3-digit sequence which is a service number from my carrier)

Hmm, actually, what other characters could there be? I mean, - () would be 4 REPLACEs, REPLACE(REPLACE(REPLACE(REPLACE(ABMultiValue.value,' ',''),'-',''),'(',''),')','') might not even be that bad... owh perhaps the dot character as well?

itsjunetime commented 4 years ago

It's all good, I appreciate suggestions to solve something like this. But as far as I can tell in my own database, the first value in c16Phone never contains spaces; I think c16Phone was made specifically to be a space-separated field of values, so I don't think chat_identifier being a short number would cause an issue. Have you seen instances of the first value in c16Phone containing a space in your own database?

But yeah, It's an annoying issue and I've gone through a lot of iterations to parse this.

That replace function could also pose issues, however, if the email address that is in that slot contains a dash. You could check beforehand if it's a phone number, and not do any of the REPLACE()'s, but I still feel like it's somewhat clunky and there must be a cleaner solution... in upcoming updates, though, until we find a nice solution to this, I'll just show the phone number or email along with the name like you suggested

blanxd commented 4 years ago

yes I have plenty of those in c16Phone with spaces in the 1st value, like +123 456 7890 +1234567890 001234567890 4567890 etc... ie. the 2nd one is always the good one for sure, but the 1st one and the others could be anything literally anything, ie. if there are spaces in the 3rd one, the SELECT might also catch it. And, what's even worse, I now discover, it might not be the same number at all, I have plenty of records where there are actually different numbers in the c16Phone (ok these might belong to the same contact I guess, I hope).

londondev77 commented 4 years ago

I'm getting some interesting matches too.

Messages with an alphanumeric sender e.g. O2, Ocado, -EE-, 'EE' all get matched to my Opticians Aug 17 11:18:06 iPad SMServer(com.ianwelker.smserver.debugging)[10328] <Debug>: SMServer_app: parsing num and Optional("200") Aug 17 11:18:06 iPad SMServer(com.ianwelker.smserver.debugging)[10328] <Debug>: SMServer_app: selecting person: o2(uk), num: 200 Aug 17 11:18:06 iPad SMServer(com.ianwelker.smserver.debugging)[10328] <Debug>: SMServer_app: getting messages for o2(uk)

So it looks like it's changing the alphanumeric name to 200 but my Opticians doesn't have 200 in its number.

Also, I received a text from 150 and this matched one of my contacts who has 150 in her number.

blanxd commented 4 years ago

@londondev77 right now how it works is it puts a name to any unmatched number, and that name is probably the 1st one in your AddressBook databases's search table, has nothing to do with the given number as such. (I have several ones in the list with the same name, I knew I had not received any messages from that name ever, hadn't even used the number for years, checked the database and yeah, it's simply the 1st one there, an old name/number I have in my address book). I checked from the browser console, they all were non-numeric chat_id strings, things like "dpd", "viber" etc. which I simply don't have in my AddressBook, it's just messages from some services, so the system chose to show a random name (well it was the 1st in the db as I later discovered).

itsjunetime commented 4 years ago

Sorry, I'd just like to clarify a few things: In my sms.db file, all my chat_identifiers (SELECT chat_identifier from chat) are either phone numbers, group chat identifiers (e.g. chat123456789012345678), or email addresses. If you guys execute that same query to get all your chat_identifiers, do yours show some that are not group chat identifiers, and not emails, but have some combination of letters and numbers? And if so, what column do those chat_identifiers show up in in the ABPersonFullTextSearch_content table (e.g. c16Phone? c17Email?)?

Also, I was playing around with @blanxd 's nested REPLACE() solution a bit, and realized that sometimes, ABMultiValue.value doesn't contain the country code or area code, while the chat_identifier always contains the country code and area code (for those who have them), so I don't think we could use that column for identification, even with some text manipulation.

Here's my latest proposal, just written in pseudo code:

if chat_identifier.contains("@") { /// is an email
    name = SELECT c0First, c1Last from ABPersonFullTextSearch_content WHERE c17Email LIKE "%<chat_identifier>%";
} else if chat_identifier.contains("+") { /// has a plus sign somewhere in the string
    name = SELECT c0First, c1Last from ABPersonFullTextSearch_content WHERE c16Phone LIKE "% <chat_identifier> %";
} else {
    name = SELECT c0First, c1Last from ABPersonFullTextSearch_content WHERE c16Phone LIKE "% <chat_identifier>" AND c16Phone NOT LIKE "%+%";
}

My reasoning: As far as I can tell, chat_identifiers that contain plus signs always have a plus sign somewhere in their corresponding c16Phone field, and chat_identifiers that don't contain plus signs never have a plus sign in their corresponding c16Phone field. Also, as far as I can tell, chat_identifiers that have no phone number always have the full chat_identifier as the last space-separated value in their c16Phone field, so that's why I specified that exact spacing and wildcard placement in the else above.

Let me know if you think this would work, or if it works for your database.

londondev77 commented 4 years ago

If I look in the Chrome console, it shows the correct name when it's not a phone number (albeit with the incorrect capitalisation): getting 200 for chat screwfix getting 200 for chat from_ee getting 200 for chat hmpassport getting 200 for chat 150 (this was the one that got matched for one of my contacts whose number contained 150

Hopefully, that should help you work out the correct logic

blanxd commented 4 years ago

@iandwelker there are also non-numeric chat_identifiers in addition to numbers / chat0123... / email addresses in the chat table. Carriers can send messages in with some name instead of number, these are not msisdn per se, it's called Sender ID. Some carriers might even allow you to reply to such messages I suppose, these could be like "dhl", "viber", "waze", "paypal", "skype", "google" - just a few of the internatinal ones I have. These strings can also contain (in addition to latin letters) dot, space, minus (. -), this is what I can see in my db, I seriously hope they cannot or wouldn't ever put @-signs there, but I think it depends solely on what the carrier supports (consult a search engine about SMS Sender ID :). In the ABPersonFullTextSearch_content, they don't show up anywhere unless you have made a record putting that string into some name or organization field.

I think you might almost have a winner there by testing for the + and excluding results by that. As for the alphanumeric ones, it should return an empty set so you can just simply display it as is. Hmm, I see one alphanumeric record in c16Phone but I haven't received any messages from "1-800-MY-APPLE" while being in the US :) Not sure if they use this as a Sender ID for sending messages, does anyone know? Manually I'm unable to make a record in my AddressBook with letters in the phone number field.

For the query excluding the +, I think you should still put a space in the end there after the (without the ending %, yes this looks like very good logic), because my db seems to have all the c16Phone values ending in a space (unless NULL, of course). It still isn't a 100% though, because the c16Phone may contain values without a + which still have spaces in them. I might be an edge case (but a good test for this project) because I have numbers in my AddressBook from many different countries. Where I live I get messages from very short service numbers, but eg. I have a number in there from New Zealand which has no + sign in c16Phone, yet it's a long 0800 number which is space separated in itself so the last part is 3 digits (it doesn't match anything right now for me personally, but in theory it could).

itsjunetime commented 4 years ago

Thanks for pointing out the spaces at the end of all c16Phone fields -- I hadn't caught that. With all this in mind, I think I'll implement the plus sign-checking solution (with your space-at-the-end fix) into the next version (along with showing the chat_identifier value alongside the name) and ship it out soon. You can try it out, see if you notice any issues, and if there's anything glaring, we can revisit this. If there's anything else about this that you think needs fixing before I release it, though, let me know soon.

londondev77 commented 4 years ago

0.33 fixes all the issues I specified above. The only incredibly minor problem is that the senders with a name rather than phone number are all being displayed in lowecase. The Messages app shows the name with the correct capitalisation.

blanxd commented 4 years ago

@londondev77 Yeah, in order to display them like the Messages app does, it would need another db query or perhaps a joint query to get them either uppercase or capitalized or anything like that, the "correct" values are in another table there. Like you say, a minor thing really, I don't consider this a deal breaker :)

BTW, if you click on any of these, do you see the messages on the right pane? I don't, and I think I also know where the problem is (in the source code), but this should be another thread I think, unless I'm the only one encountering this...

itsjunetime commented 4 years ago

So just another clarification: Do the c0First and c1Last fields in your databases display the names completely lowercase? Mine show them with correct capitalization, but I can switch the code to get the corresponding names from ABPerson if those show the correct capitalization for everyone else.

And yes, the messages are supposed to appear in the right pane. I'd prefer that you open another issue for that, though, so that we can close this once we figure out the capitalization issue, and we don't have too many issues getting confused with each other in one thread.

blanxd commented 4 years ago

@iandwelker it's the chat_identifiers that aren't numeric, the SMS Sender ID thingies, which have nothing to do with the AddressBook. If you look at the handle table in sms.db, the handle.id field = the chat.chat_identifier and the handle.uncanonicalized_id is what the Messages app displays, it's mostly a case thing in my db, like twitter > Twitter, paypal > PayPal, dpd > DPD, etc etc..

itsjunetime commented 4 years ago

Ahh, gotcha there. I'll fix that in the next update, then.