nils-werner / zget

Filename based peer to peer file transfer
https://pypi.python.org/pypi/zget
MIT License
485 stars, 23 forks

Random code for reading out? #5

Closed takluyver closed 8 years ago

takluyver commented 9 years ago

I've looked at zget a few times, and I think it's a really neat idea. But it's always struck me as a bit awkward to read out filenames:

Hey Tom, I am zgetting you my underscore holiday underscore pictures dot zip!

It's also not obvious whether it's case sensitive, and if you have a filename with spaces in, the recipient has to mess around with quoting/escaping it, with no help from tab completion.

How about producing a short random token instead? E.g.

$ zput my_holiday_pictures.zip
my_holiday_pictures.zip is now available on the network
Ask your friend to 'zget 3FL6'

Or, with the receiver initiating:

$ zget
Ready to receive a file
Ask your friend to 'zput <filename> QNT8'

Using one of the base 32 alphabets, you can avoid similar looking characters like O/0 or I/1. With 4 characters, that gives you ~1 million possible codes.
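A minimal sketch of generating such a code, assuming Crockford's base 32 alphabet (which omits I, L, O and U). The thread does not say which alphabet was meant, and `make_token` and `normalize` are hypothetical names, not part of zget:

```python
import secrets

# Assumption: Crockford's base 32 alphabet (digits plus letters, without I, L, O, U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_token(length=4):
    """Return a random token of `length` symbols drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def normalize(token):
    """Accept sloppy input: ignore case and map look-alike letters to digits."""
    return token.upper().replace("O", "0").replace("I", "1").replace("L", "1")


print(make_token())
print(len(ALPHABET) ** 4)  # 1048576 possible 4-character codes
```

Normalizing on the receiving side is what makes the code safe to read aloud: "oh" and "zero" end up as the same symbol.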

To preserve the security model, I imagine that half the token would be advertised on the network, and the other half would be the authentication token that the second party (whichever didn't advertise) needs to send.
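One way the split could look, as a rough sketch. The function names are hypothetical, and the even split and constant-time comparison are assumptions rather than anything zget does:

```python
import hmac


def split_token(token):
    """Split a token into (advertised half, secret half)."""
    mid = len(token) // 2
    return token[:mid], token[mid:]


def check_secret(expected, supplied):
    """Compare the secret half without leaking timing information."""
    return hmac.compare_digest(expected.encode(), supplied.encode())


public, secret = split_token("3FL6")
print(public, secret)  # 3F L6
print(check_secret(secret, "L6"))  # True
```

Only `public` would go out on the network; the party that did not advertise has to present `secret` before the file is handed over.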

I think this would also reduce the risk of collisions - although the space of possible filenames is large, people don't sample that space at random, and in a busy office it's possible that two people could be zput-ing the same filename at the same time. With random codes, the risk of collisions is a function of the number of transfers being advertised. Advertising 2 base 32 digits as I propose means collisions are expected when 38 files are being advertised at once (if I've understood the birthday paradox correctly). If that's insufficient in large networks, the number of digits advertised could easily be increased by a config setting.
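The figure of 38 can be checked with a short calculation, reading "expected" as "at least even odds" (an assumption about what was meant). This sketch computes the exact birthday-problem probability for 32 × 32 = 1024 possible advertised codes:

```python
def collision_probability(n, space):
    """Probability that at least two of n random codes out of `space` coincide."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def transfers_for_even_odds(space):
    """Smallest number of simultaneous transfers with a collision chance >= 50%."""
    n = 0
    while collision_probability(n, space) < 0.5:
        n += 1
    return n


print(transfers_for_even_odds(32 ** 2))  # 38
print(transfers_for_even_odds(32 ** 3))
```

Each extra advertised digit multiplies the code space by 32, which raises the even-odds threshold by a factor of roughly √32 ≈ 5.7.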

Thanks for creating zget :-)

nils-werner commented 9 years ago

I agree that using filenames without tab completion can be really awkward in some situations and thought of using tokens, too. I went with only filenames for now because

However, I would definitely be willing to implement a token system if it played nicely with these principles and were an optional additional feature.

My first idea was to allow a second zput argument: zput my_pics.zip AFGD which would then be used with zget AFGD or zget AFGD foo.zip. That would have a nice CLI API (both commands work like cp) but it could be a bit nasty because

Just to give you a broad overview of how the Zeroconf peer finding works: There is no initiator. Both parties initiate their part of the procedure, no matter if the other already did theirs. After both have done the Zeroconf procedure, the recipient will at some point pick up the broadcast from the sender.

From this broadcast the recipient will extract the peer's IP address and port and simply make an http://<ip>:<port>/<filename> request.

This means that when you give either party the possibility to create a transfer token, you will potentially have a filename and two tokens to deal with (no problem, just something to be aware of).
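As an illustration of that last step, here is how the recipient might build the request URL once it has the address and port. `build_url` is a hypothetical helper and the address and port are made up; percent-encoding the filename is an assumption about how names with spaces would have to be handled:

```python
from urllib.parse import quote


def build_url(ip, port, filename):
    """Build the download URL from the peer details found in the broadcast."""
    return "http://{}:{}/{}".format(ip, port, quote(filename))


print(build_url("192.168.1.20", 8080, "my holiday pictures.zip"))
# http://192.168.1.20:8080/my%20holiday%20pictures.zip
```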

I agree that four characters (base 32, case insensitive) should be sufficiently safe and seem a pretty good fit for our use case. Generally though, I would prefer to advertise the SHA1 of filenames and tokens on the network. When a match is found, the recipient may then choose to request either a token (like http://<ip>:<port>/<token>) or a filename (like http://<ip>:<port>/<filename>) from the sender and then be handed the file.
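A sketch of what matching on hashes could look like. The function names are hypothetical, and UTF-8 encoding of the name is an assumption:

```python
import hashlib


def advertised_name(name):
    """SHA1 hex digest of a filename or token, as it would appear on the network."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def matches(advertised, wanted):
    """True if a broadcast hash corresponds to the name the recipient asked for."""
    return advertised == advertised_name(wanted)


# The sender advertises hashes of both the filename and the token.
broadcast = {advertised_name("my_pics.zip"), advertised_name("AFGD")}

print(any(matches(h, "AFGD") for h in broadcast))  # True
print(any(matches(h, "ZZZZ") for h in broadcast))  # False
```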

takluyver commented 9 years ago

Thanks, that makes sense. A couple of refinements to the idea:

My mistake on the implementation - looking at the examples, I assumed that whichever one you started first was creating an 'advertisement' of some kind which the second one responded to. On looking at the code some more, I realise that it's always the sender that advertises and the recipient that responds (by making an HTTP request). I like this design better than what I thought was going on :-).

I'm not convinced about using sha1 hashes as a security measure. A malicious actor on the network could easily generate a mapping of sha1 hashes to possible tokens or common filenames. When they see a broadcast of a sha1 hash, they could then quickly look up the corresponding token/filename and make a request for that. Security probably isn't a major design goal, as it's for trusted networks, but if there's a shared secret in the architecture, I'd rather it had no relationship to the advertised data at all.
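To make the concern concrete, here is a sketch of such a lookup table, assuming tokens drawn from a 32-symbol alphabet (Crockford's, as an illustration). It is built for three-character tokens to keep the demo light; four characters would only be 32 times more work:

```python
import hashlib
from itertools import product

# Assumption: tokens use Crockford's base 32 alphabet.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_lookup(length):
    """Map the SHA1 of every possible token of `length` back to the token."""
    table = {}
    for chars in product(ALPHABET, repeat=length):
        token = "".join(chars)
        table[sha1_hex(token)] = token
    return table


lookup = build_lookup(3)  # 32**3 = 32768 entries

# An eavesdropper sees only the hash, yet recovers the token immediately.
seen_on_network = sha1_hex("3FK")
print(lookup[seen_on_network])  # 3FK
```

Because the token space is tiny, hashing adds no real secrecy; a secret that is never derived from the advertised data avoids this problem.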

Would you be happy for me to have a go at implementing tokens, and make a pull request for it?

nils-werner commented 9 years ago

I've started working on a first draft that does

  1. zput file.jpeg asd will allow you to download using zget file.jpeg or zget asd. In case of zget asd it will still create a file file.jpeg (or file_n.jpeg if file.jpeg or file_n-1.jpeg already exist).
  2. zget file.jpeg will not overwrite an existing file.jpeg, however zget file.jpeg file.jpeg will.
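The non-overwriting behaviour in point 1 could be sketched like this. `unique_filename` is a hypothetical helper, not zget's actual code; the `exists` parameter is only there so the logic can be tried without touching the disk:

```python
import os


def unique_filename(filename, exists=os.path.exists):
    """Return `filename`, or the first free `stem_n.ext` if it is already taken."""
    if not exists(filename):
        return filename
    stem, ext = os.path.splitext(filename)
    n = 1
    while exists("{}_{}{}".format(stem, n, ext)):
        n += 1
    return "{}_{}{}".format(stem, n, ext)


taken = {"file.jpeg", "file_1.jpeg"}
print(unique_filename("file.jpeg", exists=taken.__contains__))  # file_2.jpeg
print(unique_filename("other.jpeg", exists=taken.__contains__))  # other.jpeg
```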

What do

takluyver commented 9 years ago

  1. I still think it's much more useful for the system to generate the alias, rather than requiring it from the user. As you said, users will be lazy, especially as it looks like there's no lower limit on length, so you could just zput file.jpeg a.
  2. I think the principle is good, though I'd do the interface with either an explicit option zget file.jpeg -o file.jpeg, or by writing to stdout zget file.jpeg > file.jpeg.

I'll have a go at implementing my idea, so we can have a play with it.

nils-werner commented 9 years ago

In case of a simple/short token I thought about showing a warning "insecure upload token, your transfer may be hijacked" but still allowing them. When the user didn't set one, we can provide one for them.

I was playing with the idea of allowing zget file.txt - to print the contents to stdout. I guess the -o option is more curl-style syntax, and the second filename more cp-style.

nils-werner commented 8 years ago

I have implemented a simple version of this in the latest release; you can upgrade using pip install -U zget.

We will think about security in a later release and a separate issue.

takluyver commented 8 years ago

I just saw a talk about a similar tool called magic wormhole - it creates a code made of randomly selected words from a list, and then it uses a neat algorithm called PAKE to turn that weak code into a strong encryption key which it can use to transfer data.

nils-werner commented 8 years ago

Very interesting, and not infeasible to implement here.