Open abika opened 9 years ago
This is related to #496.
To support the discussion of phone number hashing, I asked a friend to SHA-1 all valid Dutch mobile phone numbers to get a rainbow table, running the following PHP script with time hhvm:
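<?php
// Hash every number in the Dutch mobile range +31600000000..+31699999999.
// Count with an integer and prepend "+" explicitly: incrementing the string
// "+31600000000" converts it to a number, so the "+" would be lost.
for ($n = 31600000000; $n < 31700000000; $n++) {
    echo sha1("+" . $n) . "\n";
}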
Result: real 20m59.193s
Without echo: real 0m31.833s
Yes, that is 20 minutes to get a rainbow table for every Dutch mobile phone number.
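To see the mechanics on a smaller scale, here is a minimal Python sketch of the same idea. It is an illustration only: the slice size and the function names are assumptions, and what it builds is strictly speaking a plain reverse lookup table rather than a rainbow table. It also assumes numbers are hashed as "+" followed by the digits.

```python
import hashlib


def sha1_hex(number: str) -> str:
    """Return the hex-encoded SHA-1 of a phone number string."""
    return hashlib.sha1(number.encode("ascii")).hexdigest()


def build_table(start: int, count: int) -> dict[str, str]:
    """Map hash -> number for `count` consecutive numbers starting at `start`."""
    table = {}
    for n in range(start, start + count):
        number = f"+{n}"
        table[sha1_hex(number)] = number
    return table


# Illustrative slice only; the full +316 range holds 100,000,000 numbers.
table = build_table(31600000000, 10_000)

# Reversing a hash is then a single dictionary lookup.
target = sha1_hex("+31600001234")
recovered = table.get(target)
print(recovered)
```

Covering the whole range is the same loop with a larger count; the timings quoted above suggest that the hashing itself is not the bottleneck.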
No matter how much I love to accept some crypto over no crypto, I don't think this is worth the trouble it causes for federation.
At the time (a few years ago), hashing the number seemed like the right thing to do to hide as much as possible from direct exposure. I knew that it would become a problem one day, especially with CPU power increasing day after day. But consider that this is a BIG change. Very big.
First of all, I'll have legal issues because of privacy: because I provide a service that processes phone numbers, I'll have to draw up a privacy notice, with all the legal stuff that is required. It's really time consuming and also expensive (I guess I'll need to contact a lawyer for that, and I'd have to do that as a person, not as a nonprofit organization - which doesn't exist yet). Using hashed numbers saved me from a number of legal issues - I know it's a way of tricking the system, but whatever... But let's consider only the technical side.
There are two ways of approaching this: using a random string and a phone number/JID mapping table, or using phone numbers directly in the JID. Both require a substantial change in a lot of things, specifically the changes required by changing the JID:
<hash>@<server>; we'll need to automatically change that after upgrade - this probably means key regeneration, which means a change in fingerprint, which means users accepting each other's keys again (although this might not be a big deal).
This is simply a matter of the email field in the key, which will change to reflect the phone number. Nothing changes for authentication itself, but an automatic procedure has to be implemented on both client and server to support automatic key regeneration with a different UID (conversion from hashed to clear phone number).
Roster is the most complicated one and it's strictly tied to message delivery as in permissions between users. An alias mechanism has to be implemented on the server to make rosters retrieve items from it in both hashed and non-hashed forms. Same problem with message delivery, where older clients will deliver to hashed JIDs for some time.
I could implement a batch program to change all JIDs in all rosters from all users by creating a rainbow table and reversing all the hashes, but some adapter for JIDs in hashed form has to be done in Tigase.
Issues with user lookup are the same as with rosters: older clients will send hashed numbers. We might inhibit older clients from doing that though (forcing them to upgrade).
I have thought about this a few times and I also think the current hash is not very useful, however I don't think anything should be changed immediately, since a change would be very disruptive. Long-term wise it would be good to think of something new, though!
The phone number lookup on client side should be kept as it is, but the server stores a separate (phone number)<->JID table for the comparison with registered users. This is also technically logical, as the lookup has nothing directly to do with XMPP.
I don't think this is good. We should not have to rely on the server. This mechanism would mean, that the server could actually be asked by the $AUTHORITY to watch for traffic from certain telephone-numbers or otherwise manipulate communication. I would instead propose:
This would make the input space quite big for brute-forcing. True, if you know both a person's name and a person's number, you can still compute the hash, but this is no worse than now and the server admins can still always claim they have no record of the actual numbers or names (this is important!)
This still leaves two problems:
What do you think?
The phone number lookup on client side should be kept as it is, but the server stores a separate (phone number)<->JID table for the comparison with registered users. This is also technically logical, as the lookup has nothing directly to do with XMPP.
Also one more thing I want to add is that this is essentially uploading your entire address book to your server. (So is the current solution) because sequential look ups of 'do you have this number and do you have this number' is exactly the same thing. I'm not necessarily saying this is a bad thing. But other services usually get a lot of bad press for doing this.
Separating phone numbers from Jids would allow to make this lookup optional.
I stated in my original post that I don't have a problem with services using phonenumber@domain.tld style jids and uploading your phone book to a central server. But these services should be as honest as possible about this and not hide behind 'pseudo security'
@daniele-athome
There are two ways of approaching this: using a random string and a phone number/JID mapping table or using phone numbers directly in the JID.
Please no phone number in any way in the JID. I don't wanna have to expose my phone number to anyone I communicate with Kontalk/XMPP. That is the problem right now.
@h-2
however I don't think anything should be changed immediately, since a change would be very disruptive.
Yep, a good start would be to use a new JID style for newly created accounts.
This mechanism would mean, that the server could actually be asked by the $AUTHORITY to watch for traffic from certain telephone-numbers or otherwise manipulate communication.
First, Kontalk has built-in encryption/signing: the message content is safe. And the matter of storing telephone numbers on the server is arguably a trade-off between usability and privacy. I think Kontalk/Daniele has made the right decision here. A messenger that lacks essential features, so that it's not used by anyone, is meaningless. And you can (in theory) choose the server yourself or set up your own.
hash over "tolower(first name) + plus number" on registration
I thought about adding a contact's name in some way as well, but it's not really a good solution for the very reason you mentioned yourself: when I look at my phone's contact list, this won't work for about 1/3 of my contacts. Mostly nicknames, no first name, or spelling issues: one missing acute accent -> different hash.
@iNPUTmice
But these services should be as honest as possible about this and not hide behind 'pseudo security'
I totally agree
@abika (+ @h-2 about using name+number)
Please no phone number in any way in the JID. I don't wanna have to expose my phone number to anyone I communicate with Kontalk/XMPP. That is the problem right now.
There is currently no difference between using cleartext phone numbers or their hashes: in a matter of months, a SHA-1 reverse hash will take minutes to compute and someone sooner or later will (with Kontalk popularity rising) develop a nice little tool for discovering phone numbers from their hashes. Then having a hash or having a phone number would be the same thing. Seriously. In past years I never thought technology would come so far as to make hashing useless for hiding short source strings. Even the legal issues I was talking about before would be exposed because of this.
The idea of using more data for the hash is too unstable IMHO and would complicate things even more.
@iNPUTmice
Separating phone numbers from Jids would allow to make this lookup optional.
That's the problem. Strictly speaking, Kontalk users are phone numbers.
But other services usually get a lot of bad press for doing this.
I know... the only solution to this would be to remove automatic lookup, and replace it with:
Ok, so we agree on not using the phone number in the JID anymore?
I'm sorry, are you talking about not using phone numbers at all - not even hashes - or not exposing phone numbers in JIDs?
in a matter of months, a SHA-1 reverse hash will take minutes to compute and someone sooner or later will (with Kontalk popularity rising) develop a nice little tool for discovering phone numbers from their hashes.
I can write you a script right now that can look up any number instantly if you are interested.
What format are the numbers in, +countrycode or 00countrycode?
@iNPUTmice no need thanks, I know you can (I could too FWIW, that's not the problem). What I was saying is that at the time (several years ago) I was more convinced that hashes were safe enough to hide the source information. In fact they're not now.
In my opinion we should first go on developing the features that most users want (group chat, sending videos and so on). Because if we don't put new features into Kontalk, Kontalk will never reach the critical mass of users at which the press will mention it. And if we never reach this point, we can stop developing Kontalk. But if we reach the masses (not only techies and paranoids ;-)), hopefully there will be more developers who can help, and who can also fix the problem with hashes.
Kontalk is not insecure. ALL messages are strongly encrypted. OK, the hashes are not really secure. But there is only one developer doing the whole work. And I think his work is really good. Isn't it?
Of course, if there is anybody out there who can help with development, he/she should send his/her pull requests ;-)
In my opinion we should delay this issue to milestone 3.2 or 4
Sorry for my bad English. I hope you will understand what I mean :-/
@webratte I agree completely.
I'm not that much into the details, but what about hashing the telephone number together with the public key of a user? I don't know how easy it is to obtain the public fingerprint... Or what about phone number + private key as the hash? Please enlighten me if I'm talking complete crap ;) I'm, like I already said, not that much of an expert.
@115ek the problem is automatic lookup: you can't do it if you store a combination of information that can't be rebuilt using just the phone number itself. If we remove automatic lookup we can do pretty much anything we want.
Ok, so I think we have agreed that every user should get a regular username; it is just the association of phone number -> username that is not yet decided, right?
I still think it would be beneficial not to save cleartext phone numbers on the server. What about the following:
lookup(concat(bobsnumber, alicesnumber)) == bobsusername
concat(friend_i, alicenumber)
and asking the server for these, retrieving the username bob left for her under this hash.
The benefit of this is that the number-to-be-hashed is twice as long as before. Do you think this would be enough to make attacks infeasible? IIANM this should increase the time needed for brute force by a huge factor. You could also add a salt that changes, e.g. every six months, so that the client would have to try all salts, further increasing the complexity and making a pre-computation of all values infeasible.
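The pairwise scheme described above can be sketched roughly as follows. This is a toy illustration, not an agreed protocol: the `Directory` class, the function names, the salt labels and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def pair_key(publisher: str, viewer: str, salt: str = "") -> str:
    """Hash of concat(publisher number, viewer number), optionally salted."""
    data = (salt + publisher + viewer).encode("ascii")
    return hashlib.sha256(data).hexdigest()


class Directory:
    """Toy in-memory stand-in for the server; it only ever sees hashes."""

    def __init__(self):
        self._usernames = {}

    def publish(self, key, username):
        self._usernames[key] = username

    def lookup(self, key):
        return self._usernames.get(key)


def publish_for_contacts(directory, my_number, my_username, contacts, salt=""):
    """Leave my username under one hash per contact in my address book."""
    for contact in contacts:
        directory.publish(pair_key(my_number, contact, salt), my_username)


def discover(directory, my_number, contacts, salts=("",)):
    """Try every friend (and every candidate salt) from my own point of view."""
    found = {}
    for friend in contacts:
        for salt in salts:
            username = directory.lookup(pair_key(friend, my_number, salt))
            if username is not None:
                found[friend] = username
                break
    return found


directory = Directory()
bob, alice, carol = "+31611111111", "+31622222222", "+31633333333"

# Bob has only Alice in his address book.
publish_for_contacts(directory, bob, "bob", [alice], salt="2015H1")

# Alice finds Bob; Carol knows Bob's number but Bob left nothing for her.
alice_view = discover(directory, alice, [bob, carol], salts=("2015H2", "2015H1"))
carol_view = discover(directory, carol, [bob], salts=("2015H2", "2015H1"))
print(alice_view)
print(carol_view)
```

Note that the larger input space only helps against an attacker who knows neither number; someone who already knows one of the two numbers still only has to enumerate the other.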
@h-2 interesting approach. That would of course only work if both parties have each other in their address book. But actually that might be something that is desirable anyway.
BTW: the reason I'm participating in the conversation even though I'm not using kontalk myself is because I want to eventually create a centralized (voluntary) phonenumber to XMPP ID lookup service that works for all XMPP ids and not just for kontalk users. (My own users are interested in this as well.)
One more thing: when looking up hashes to find an XMPP ID, the user doesn't have to upload the entire hash. Instead the user could look up only the first few bytes of a hash and have the server respond with a certain number of hash/XMPP-ID tuples (indicating whether there would be more hashes). If the user finds the hash in the response, they have just obtained the XMPP ID without telling the server which exact XMPP ID they were after. If not, the user can increase the number of bytes, byte by byte, until the desired hash is part of the response.
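A rough sketch of this prefix lookup, under stated assumptions: the `PrefixDirectory` class, the `limit` parameter and the use of SHA-1 over "+"-prefixed numbers are invented for illustration and do not describe an existing service.

```python
import hashlib


def sha1_hex(number: str) -> str:
    return hashlib.sha1(number.encode("ascii")).hexdigest()


class PrefixDirectory:
    """Toy server that answers prefix queries over hash -> JID entries."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def query(self, prefix: str, limit: int):
        """Return up to `limit` (hash, jid) tuples and a 'more available' flag."""
        matches = sorted(
            (h, jid) for h, jid in self._entries.items() if h.startswith(prefix)
        )
        return matches[:limit], len(matches) > limit


def find_jid(directory, number: str, limit: int = 2):
    """Reveal the hash one byte at a time until the answer is unambiguous."""
    full = sha1_hex(number)
    total_bytes = len(full) // 2
    for nbytes in range(1, total_bytes + 1):
        matches, more = directory.query(full[: 2 * nbytes], limit)
        for h, jid in matches:
            if h == full:
                return jid, nbytes
        if not more:
            # The server returned everything under this prefix; no entry exists.
            return None, nbytes
    return None, total_bytes


registered = {
    sha1_hex(f"+316000000{i:02d}"): f"user{i}@example.org" for i in range(50)
}
directory = PrefixDirectory(registered)

jid, revealed = find_jid(directory, "+31600000007")
missing, _ = find_jid(directory, "+31699999999")
print(jid, revealed, missing)
```

The smaller `limit` is, the more bytes the client ends up revealing, so the value trades response size against how much the server learns.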
@h-2 interesting approach. That would of course only work if both parties have each other in their address book. But actually that might be something that is desirable anyway.
Yes, but I thought about this as a feature, not a bug, as well.
One more thing: When looking up hashes to find an XMPP id the user doesn't have to upload the entire hash. Instead the user could look up only the first few bytes of a hash and have the server respond with a certain number of hash/XMPP-id tuples (indicating if there would be more hashes).
Sounds like a good idea.
I want to eventually create a centralized (voluntary) phonenumber to XMPP ID lookup service that works for all XMPP ids and not just for kontalk users.
While a client-agnostic service would be great there are a few things that make this more difficult:
BTW: any and all cooperation and increased compatibility between conversations and kontalk would be really great! Seems like two similar minded projects with a truck-factor of 1 (maybe 1.3 for conversations) could benefit a lot from working together ;)
@h-2 you are making some valid arguments, however they apply to centralized approaches as well. I will explain why, but let me first add something to a point I mentioned earlier.
look up only the first few bytes of a hash and have the server respond with a certain number of hash/XMPP-id tuples [...] If not the user can increase the number of bytes, byte by byte until the desired hash is part of the response.
This will essentially make your JID public, which is a good thing because a) if you choose to use the service, all the implications are clear from the get-go. You are deliberately choosing to make certain information public instead of trusting a service. This means that even if the database of the service got hacked, nobody would obtain information that was not meant to be public. b) other services can back up the entire dataset
or do you allow multiple accounts to be connected to the same number? even on different servers? this would make it easier to attack (people trying to add an additional account to my number).
This is a completely valid point and a big problem. However, if we do the tuple hashing, that means (if our assumptions are correct) that the server doesn't know the original number and thus cannot do SMS validation. (The server can in fact not even figure out if there are two accounts for the same phone number - the server could disallow multiple entries for a hash, but without knowing what the 'true' hash would be.)
However, I have thought about this problem and I have (parts of) a solution. The server will allow multiple JIDs to be stored for the same hash. Now if a user starts looking up phone numbers, the user might get to a point where they receive two JIDs for the same hash without knowing which is the right one. One way to work around this is to look up the same phone number from the perspective of other phone numbers in the address book: instead of hash(my_number,number_of_contact_in_question) the user looks up hash(other_number_in_address_book,number_of_contact_in_question), under the assumption that one of their friends (or even better, several) is friends with the same person.
This way, if a malicious attacker tries to 'poison' the lookup, the attacker has to know the phone numbers of other friends of the victim as well.
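One way to picture this cross-checking is a simple vote count over several viewpoints. The sketch below is only an interpretation of the idea: the class and function names, the example JIDs and the majority-style ranking are assumptions, as is the use of SHA-256.

```python
import hashlib
from collections import Counter, defaultdict


def pair_key(viewer: str, contact: str) -> str:
    """hash(viewer number, number of the contact in question)."""
    return hashlib.sha256((viewer + contact).encode("ascii")).hexdigest()


class MultiDirectory:
    """Toy server that stores any number of JIDs under the same hash."""

    def __init__(self):
        self._jids = defaultdict(set)

    def publish(self, key, jid):
        self._jids[key].add(jid)

    def lookup(self, key):
        return set(self._jids.get(key, ()))


def resolve(directory, my_number, contact, address_book):
    """Count how many viewpoints (mine and my other contacts') back each JID."""
    votes = Counter()
    viewpoints = [my_number] + [n for n in address_book if n != contact]
    for viewer in viewpoints:
        for jid in directory.lookup(pair_key(viewer, contact)):
            votes[jid] += 1
    return votes.most_common()


directory = MultiDirectory()
alice, bob, carol = "+31622222222", "+31611111111", "+31633333333"

# Bob publishes his JID for both of his friends, Alice and Carol.
for friend in (alice, carol):
    directory.publish(pair_key(friend, bob), "bob@example.org")

# An attacker who only knows Alice's and Bob's numbers poisons one entry.
directory.publish(pair_key(alice, bob), "mallory@evil.example")

ranking = resolve(directory, alice, bob, address_book=[bob, carol])
print(ranking)
```

Here the genuine JID is confirmed from two viewpoints and the poisoned one from a single viewpoint, so the ranking puts the genuine JID first. The approach gives no signal when the two parties have no contacts in common.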
- expiration of number-id-association: if you only do it for your own service, you can expire an account after x months without login; if you maintain a service for other servers' accounts, you don't know the last time of login
- how do you forcefully break an association, e.g. if my number changes and I don't have access to the old one?
this can be achieved by invalidating hashes after a certain amount of time. Since a regular user is probably doing syncs (lookups) in regular intervals as well it should be no problem to refresh his own hashes on that occasion as well.
BTW: any and all cooperation and increased compatibility between conversations and kontalk would be really great! Seems like two similar minded projects with a truck-factor of 1 (maybe 1.3 for conversations) could benefit a lot from working together ;)
Conversations is an XMPP client. I'm more than happy to implement any (sane) XEP the Kontalk team publishes. But I'm not going to implement proprietary protocols from other clients just to be compatible with them. There are standards. And standards exist for a reason.
Joining this discussion because, though not much of a developer myself, I'm interested in the problem discussed here and have put some thought into the issue.
Using two numbers for hashing is an interesting idea, but there are issues:
Moxie Marlinspike once wrote a blog post about these matters, see here: https://whispersystems.org/blog/contact-discovery/ .
Bottom line from my point of view: There is no known method to use phone numbers for contact discovery that could keep them private. So this should be optional, and it should be made clear to the user that he is effectively publishing his mobile phone number if he wants to be found with it. It should be hashed as daniele said in order not to make it directly visible to users, but that's it.
As the number of possibilities is much higher, the problem is not as severe for email addresses, so I'd suggest to additionally use them for matching, as Threema does. Users could decide which data they want to be found with.
My ideal vision of such a directory would not be a central server with all the data, but some distributed service that could be installed as a component on XMPP servers. If the data could be split among directories, this would be a considerable privacy gain: Every server would only host data about its users, no server would have the whole dataset. But distributed search doesn't scale, immediate answers couldn't be expected (but with daily sync, it wouldn't be that bad if answers take a few hours if users know about that). But there would have to be some intelligent search routing that ensures that not every server will receive all worldwide search requests (far too much for a small server). I guess this could be possible somehow, but it would be very far from easy and a whole software project of its own.
And if combinations with contact numbers are not possible, there is still the problem of how duplicate, old, and/or intentionally wrong entries can be handled. Maybe newer entries could simply be served first in a list, and entries deleted if not republished for some time. The problem that you never know if you are chatting to the right person is present in WhatsApp and all others, too, after all, if you have an old phone number in your address book. This must be solved using offline methods like QR code scanning. The question is whether it is acceptable to users that they may get wrong answers even if they do have the right phone number (and know that) because somebody may be impersonating their friends.
Of course, publishing one's JID in such a directory means giving it to the public and opening oneself to spam. Maybe a separate service (component) could mitigate that a bit, i.e. not real JIDs are published, but random numbers only the user's XMPP server can translate back to a real JID. If someone wants to connect to the JID, his client would send some message to that random JID, and the server would tell the real JID that some other JID tried to contact the random number, and the user can then decide if he wants to put that one on his roster. Too complex to be easily understandable by everyone, maybe? Not sure. If there is a message payload, it could again be used for spam, if there isn't, the contacted user might not know who is trying to contact him.
Very complicated stuff, all in all. :-/
To make federation easier, Kontalk could implement this protocol: https://xmpp.org/extensions/xep-0100.html#addressing-iqgateway even though not technically a "gateway" as it is XMPP-native, it would work. Ask for the phone number, and return a JID (after doing hashing, server selection, or whatever else may be needed now or in the future to get the right JID from only the phone number) -- this provides a nice user experience without requiring a particular technical implementation for the JIDs.
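As a rough illustration of what such an exchange could look like on the wire, the sketch below builds the request and parses a reply using the `jabber:iq:gateway` namespace with its `<prompt/>` and `<jid/>` elements from XEP-0100. The gateway address, the stanza id and the returned JID are made-up placeholders, and nothing here describes Kontalk's actual server behaviour.

```python
import xml.etree.ElementTree as ET

GATEWAY_NS = "jabber:iq:gateway"
CLIENT_NS = "jabber:client"


def build_lookup_request(gateway: str, phone_number: str, stanza_id: str) -> str:
    """Build an IQ-set that submits the phone number as the prompt value."""
    iq = ET.Element(
        f"{{{CLIENT_NS}}}iq", {"type": "set", "to": gateway, "id": stanza_id}
    )
    query = ET.SubElement(iq, f"{{{GATEWAY_NS}}}query")
    prompt = ET.SubElement(query, f"{{{GATEWAY_NS}}}prompt")
    prompt.text = phone_number
    return ET.tostring(iq, encoding="unicode")


def parse_lookup_result(stanza: str):
    """Return the JID from an IQ-result, or None if the reply carries none."""
    iq = ET.fromstring(stanza)
    if iq.get("type") != "result":
        return None
    jid = iq.find(f"{{{GATEWAY_NS}}}query/{{{GATEWAY_NS}}}jid")
    return jid.text if jid is not None else None


# Hypothetical gateway address and reply, for illustration only.
request = build_lookup_request("gateway.example.org", "+31612345678", "lookup1")
reply = (
    "<iq xmlns='jabber:client' type='result' id='lookup1'>"
    "<query xmlns='jabber:iq:gateway'><jid>a1b2c3@example.org</jid></query>"
    "</iq>"
)
resolved = parse_lookup_result(reply)
print(request)
print(resolved)
```

How the server gets from the number to the JID (hashing, a mapping table, server selection) stays hidden behind this exchange, which is the point of the suggestion.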
That is actually a good idea, thanks @singpolyma. And it wouldn't need any implementation for those clients that already implement XEP-0100. Also, this way we won't need to expose any algorithm or protocol for JID derivation: it's all done by the server and what is returned is a simple JID. I'm keen to use this suggestion and go ahead implementing it on Tigase, @abika any comments/thoughts?
Yes, it's definitely a good idea to move the {phoneNumber->JID} algorithm exclusively to the server side. It will be needed once the JID cannot be directly derived from the phone number anymore, anyway. And with XEP-0100 the gateway can also be used by non-Kontalk clients.
But I guess it cannot be used for the address book sync (?)
But I guess it cannot be used for the address book sync (?)
No, mainly because it needs a mass request and XEP-100 allows for one JID to be returned if I understand correctly.
I wrote a little program to 'unhash' (lookup) kontalk sha1 hashes to phone numbers: https://www.moparisthebest.com/phonehash/
You can link to specific hashes like so: https://www.moparisthebest.com/phonehash/#80808080ccdd107488bad45a74b3c5755c4bd108
It's all open source if anyone is interested: https://github.com/moparisthebest/phonehash
Just a fun side project to see how feasible something like that would be, and after about 89 hours of runtime, generating and sorting the file on slow hard drives, now any number can be looked up in around 2 seconds.
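For illustration, a lookup over a sorted table can be done with a binary search, as in the sketch below. This is not phonehash's actual file format or code: the 28-byte record layout, the function names and the "+"-prefixed input format are assumptions, and the index lives in memory here rather than on disk.

```python
import hashlib
import struct

# Assumed record layout: 20-byte SHA-1 digest + 8-byte big-endian number.
RECORD = struct.Struct(">20sQ")


def build_index(numbers) -> bytes:
    """Hash every number, sort by digest, and pack into fixed-width records."""
    records = sorted(
        (hashlib.sha1(f"+{n}".encode("ascii")).digest(), n) for n in numbers
    )
    return b"".join(RECORD.pack(digest, n) for digest, n in records)


def lookup(index: bytes, hex_hash: str):
    """Binary-search the sorted records for a hash; return the number or None."""
    target = bytes.fromhex(hex_hash)
    lo, hi = 0, len(index) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        digest, n = RECORD.unpack_from(index, mid * RECORD.size)
        if digest == target:
            return f"+{n}"
        if digest < target:
            lo = mid + 1
        else:
            hi = mid
    return None


# Small illustrative slice; a real table would cover whole number ranges.
index = build_index(range(31600000000, 31600005000))
wanted = hashlib.sha1(b"+31600004321").hexdigest()
found = lookup(index, wanted)
print(found)
```

With fixed-width records the same search works directly on a file or memory map, needing only a few dozen reads per lookup even for billions of entries.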
To avoid this I would suggest some type of per-user salt, which of course kind of defeats the purpose of the server not being involved in contact lookup, I liked some of the suggestions earlier but haven't had time to fully evaluate them.
Nice exercise! The problem is being able to match numbers from the server but not from clients. This could imply some sort of secret (e.g. the salt might be one example) known only to servers. But the Kontalk network is designed to be distributed (as in federated), so the secret should be shared among servers... It's complicated.
Yes I fully understand that, you are caught between a rock and a hard place:
I don't think there is a perfect solution to this.
By the way, if someone (must be a student) wants to create a generic phone number to JID lookup service (maybe roughly like I described it before), conversations.im is a mentoring organization in this year's Google Summer of Code and would like to mentor a project like this.
It has been a while, but here is my attempt at the phone number sync thing, which I believe to be a little bit more open and convenient. I have now introduced Quicksy, which is my attempt at providing phone number lookup in the Jabber world. Here are the slides from my introduction of this new spin-off. And here are recordings of the German talk I gave.
Hi @iNPUTmice
Maybe I'm wrong and I misunderstood your reason for this post.
But I think it's not really fair to promote Quicksy in this place.
As you know the idea behind Kontalk is a phone number based XMPP client.
In your slides you mentioned that Quicksy doesn't cannibalize Conversations. That's correct. But to me it looks like it cannibalizes Kontalk.
Wouldn't it be better if you helped each other (e.g. help to implement OMEMO in Kontalk to bring it closer to XMPP standards) instead of "fighting" against each other?
XMPP needs as many different but compatible clients as possible to become really famous again.
I believe the UI of Kontalk is closer to WhatsApp. And so it's easier to bring WhatsApp users to XMPP.
I'm only a normal user who refuses WhatsApp. And I know it's not always easy to bring WhatsApp users to a new messenger. So it's IMO easier if there is a familiar UI.
Please forgive me if I misunderstood your post.
@webratte If you believe that Kontalk is better and/or fits your needs better that’s fine and you should have nothing to worry about.
This issue was about Kontalk not hashing phone numbers - I just thought it would be interesting that for my approach I chose not to, to make it easier for regular XMPP users to add someone who is on Quicksy. Jabber users can just type +1555443522@quicksy.im to add someone.
Also, Jabber users can add their Jabber ID to the Quicksy directory to allow Quicksy users to auto-discover their JID. Thus regular Jabber users are not forced to use Quicksy while still allowing their friends to discover them easily when they use Quicksy.
If you don’t like to see it as advertisement see it as a feature request for Kontalk.
If you don’t like to see it as advertisement see it as a feature request for Kontalk.
In this case you should open a new issue and mark it as feature request.
You are a dev. You know how it works ;-)
XMPP needs as many different but compatible clients as possible to become really famous again.
I agree completely. Unfortunately Kontalk is the app that makes it hard for users to federate with other XMPP clients out there. I tried advertising Kontalk in the past exactly for the phone number convenience feature but using Kontalk in combination with non-Kontalk-XMPP-applications is really annoying and not convenient.
The issues are already there, for example: #496, #567 and this one of course. Nothing happening there as far as I see it.
So maybe this can be seen as an incentive for these issues.
But still it's bad style to promote an app in the repo of similar software ;-)
Isn't it?
@webratte I don't think the two apps are competitors. I look at them as complementary apps and it seemed natural to me. Both are Free Software apps.
Following the discussion at siacs/Conversations/issues/1273 I have to admit that phone number hashing is a major problem: it's actually meaningless to obfuscate the phone number with a hash if that can be reversed with brute force in a couple of hours. Even more, it gives users a false sense of privacy/security.
Knowing the hash means knowing the phone number - and the hash is now sent in plain text with every message. With "Snowden" and all, intelligence services surely love this.
I don't see any direct remedy for this (salt, key stretching, ...) and thus, to keep the current automatic user discovery functional as it is, can only call for one solution: don't use the hash. Instead, e.g., generate a random string.
The phone number lookup on client side should be kept as it is, but the server stores a separate (phone number)<->JID table for the comparison with registered users. This is also technically logical, as the lookup has nothing directly to do with XMPP.
And as a general benefit, better federation with standard XMPP clients comes out of the box, as Kontalk JIDs are much shorter now. Manually typing them in is now a valid option.
I know this is a big change again, but a necessary one.
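A minimal sketch of the random-string proposal, assuming an in-memory table and a 16-character hex local part (both choices, like the `Registry` name and the example domain, are made up for illustration and are not how any Kontalk server works):

```python
import secrets


class Registry:
    """Toy server-side (phone number) <-> JID table with random JIDs."""

    def __init__(self, domain: str):
        self.domain = domain
        self._jid_by_number = {}

    def register(self, number: str) -> str:
        """Assign a random JID on first registration; return it afterwards."""
        if number not in self._jid_by_number:
            local_part = secrets.token_hex(8)  # 16 hex characters
            self._jid_by_number[number] = f"{local_part}@{self.domain}"
        return self._jid_by_number[number]

    def lookup(self, numbers):
        """Address book sync: return JIDs only for numbers that are registered."""
        return {
            n: self._jid_by_number[n] for n in numbers if n in self._jid_by_number
        }


registry = Registry("example.org")
bob_jid = registry.register("+31611111111")

# The client still sends phone numbers; the JID itself reveals nothing.
matches = registry.lookup(["+31611111111", "+31699999999"])
print(matches)
```

The JID no longer carries any information about the number, at the cost of the server holding the cleartext mapping.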