Random code for reading out?

takluyver commented 9 years ago

I've looked at zget a few times, and I think it's a really neat idea. But it's always struck me as a bit awkward to read out filenames:

Hey Tom, I am zgetting you my underscore holiday underscore pictures dot zip!

It's also not obvious whether it's case sensitive, and if you have a filename with spaces in, the recipient has to mess around with quoting/escaping it, with no help from tab completion.

How about producing a short random token instead? E.g.

$ zput my_holiday_pictures.zip
my_holiday_pictures.zip is now available on the network
Ask your friend to 'zget 3FL6'

Or, with the receiver initiating:

$ zget
Ready to receive a file
Ask your friend to 'zput <filename> QNT8'

Using one of the base-32 alphabets, you can avoid similar-looking characters like O/0 or I/1. With 4 characters, that gives you 32^4 = 1,048,576 (about 1 million) possible codes.
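A minimal Python sketch of the token idea. It assumes Crockford's base-32 alphabet, which is one of the base-32 alphabets mentioned but not something zget specifies. The helper names `make_token` and `normalize_token` are hypothetical. Typed input is upper-cased, and O, I and L are folded onto 0 and 1, so codes are case-insensitive and tolerant of the confusable characters:

```python
import secrets

# Crockford's base-32 alphabet (an assumption; the thread doesn't pick one):
# digits and capital letters minus I, L, O and U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Fold letters that are easily confused with digits back onto the alphabet.
_CONFUSABLES = str.maketrans({"O": "0", "I": "1", "L": "1"})


def make_token(length: int = 4) -> str:
    """Random token drawn with the secrets module (cryptographically secure)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def normalize_token(typed: str) -> str:
    """Parse what the recipient typed: case-insensitive, O/0 and I/L/1 tolerant."""
    token = typed.strip().upper().translate(_CONFUSABLES)
    if not token or any(c not in ALPHABET for c in token):
        raise ValueError(f"not a valid token: {typed!r}")
    return token


print(len(ALPHABET) ** 4)  # 1048576
```

Under this alphabet a typed L is read as 1, so a code like 3FL6 from the example above would never be generated.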

To preserve the security model, I imagine that half the token would be advertised on the network, and the other half would be the authentication token that the second party (whichever didn't advertise) needs to send.
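One way the split could look, as a sketch: the first half of the token goes into the Zeroconf advertisement, and the second half is only ever spoken aloud and then sent back by the recipient (for example in its HTTP request; how it is sent is an assumption). `new_transfer` and `is_authorized` are hypothetical names, and the alphabet is again assumed to be Crockford base 32:

```python
import hmac
import secrets
from typing import Tuple

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # assumed: Crockford base 32


def new_transfer(length: int = 4) -> Tuple[str, str]:
    """Return (advertised_half, secret_half) of a freshly generated token."""
    token = "".join(secrets.choice(ALPHABET) for _ in range(length))
    half = length // 2
    return token[:half], token[half:]


def is_authorized(expected_secret: str, presented_secret: str) -> bool:
    """Sender side: check the secret half presented by the recipient.

    hmac.compare_digest does not stop at the first mismatching character,
    so response timing doesn't reveal how much of a guess was right.
    """
    return hmac.compare_digest(expected_secret.encode(), presented_secret.encode())


advertised, secret = new_transfer()
spoken_code = advertised + secret  # the full code the sender reads out
```

Only `advertised` would ever appear on the network; the secret half never does.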

I think this would also reduce the risk of collisions. Although the space of possible filenames is large, people don't sample that space at random, and in a busy office two people could be zput-ing the same filename at the same time. With random codes, the risk of collisions depends only on the number of transfers being advertised. Advertising 2 base-32 digits, as I propose, means a collision becomes more likely than not once about 38 files are being advertised at once (if I've understood the birthday paradox correctly). If that's insufficient for large networks, the number of advertised digits could easily be increased via a config setting.
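The 38 figure can be checked directly. Reading "collisions are expected" as "at least a 50% chance that two adverts share a code", this sketch computes the exact birthday probability (1 minus the chance that all n codes are distinct) for N = 32^2 = 1024 possible advertised prefixes:

```python
def collision_probability(n: int, space: int) -> float:
    """Probability that at least two of n uniformly random codes coincide."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (space - k) / space
    return 1.0 - p_distinct


def files_for_even_odds(space: int) -> int:
    """Smallest number of simultaneous adverts with >= 50% collision risk."""
    n = 1
    while collision_probability(n, space) < 0.5:
        n += 1
    return n


print(files_for_even_odds(32 ** 2))  # 38 (2 advertised base-32 digits)
print(files_for_even_odds(32 ** 3))  # 3 advertised digits
```

Each extra advertised digit multiplies the space by 32, so the threshold grows by roughly sqrt(32), about 5.7, per digit.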

Thanks for creating zget :-)

nils-werner commented 9 years ago

I agree that using filenames without tab completion can be really awkward in some situations, and I thought of using tokens, too. I went with only filenames for now because:

It adheres nicely to the UNIX principle: there is no unnecessary output when there is no error, and it can easily be scripted (parsing the "Ask your friend" line might be annoying)
It makes it clear what you're actually requesting and whether you may be overwriting local files
It has a simple, cp/wget-like command-line interface
When given the choice, users might always pick tokens like aaaa, potentially compromising security
It was simple to implement :-)

However, I would definitely be willing to implement a token system if it played nicely with these principles and were an optional, additional feature.

My first idea was to allow a second zput argument: zput my_pics.zip AFGD, which would then be used with zget AFGD or zget AFGD foo.zip. That would give a nice CLI (both commands work like cp), but it could be a bit nasty because:

The recipient may receive any filename, unexpectedly overwriting local files (even when requesting by token, you want to get the original filename)
The sender may be lazy and always pick AAAA
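To make the proposed interface concrete, here is a sketch of the two command lines using Python's argparse. These parsers are illustrative only and are not zget's actual argument handling; the argument names are hypothetical:

```python
import argparse


def zput_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="zput")
    p.add_argument("filename", help="file to offer on the network")
    p.add_argument("token", nargs="?", help="optional alias, e.g. AFGD")
    return p


def zget_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="zget")
    p.add_argument("name", help="filename or token to request")
    p.add_argument("output", nargs="?", help="local filename to write to")
    return p


args = zget_parser().parse_args(["AFGD", "foo.zip"])
print(args.name, args.output)  # AFGD foo.zip
```

Both positional extras are optional, which is what keeps the plain zput file / zget file form working unchanged.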

Just to give you a broad overview of how the Zeroconf peer discovery works: there is no initiator. Both parties carry out their part of the procedure, regardless of whether the other has already done theirs. Once both have done so, the recipient will at some point pick up the broadcast from the sender.

From this broadcast, the recipient extracts the peer's IP address and port and simply makes an http://<ip>:<port>/<filename> request.
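As a sketch of the recipient's side of that request, assuming the IP address and port have already been taken from the Zeroconf broadcast (the discovery step itself is not shown, and the address below is made up). Percent-encoding the filename means spaces in names are not a problem on the wire:

```python
from urllib.parse import quote
from urllib.request import urlopen


def transfer_url(ip: str, port: int, filename: str) -> str:
    """Build http://<ip>:<port>/<filename>, percent-encoding the name."""
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{ip}]" if ":" in ip else ip
    return f"http://{host}:{port}/{quote(filename, safe='')}"


def fetch(ip: str, port: int, filename: str) -> bytes:
    """Download the whole file into memory (fine for a sketch)."""
    with urlopen(transfer_url(ip, port, filename)) as response:
        return response.read()


print(transfer_url("192.168.1.20", 8080, "my holiday pictures.zip"))
# http://192.168.1.20:8080/my%20holiday%20pictures.zip
```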

This means that if you let either party create a transfer token, you may end up with a filename and two tokens to deal with (not a problem, just something to be aware of).

I agree that four characters (base 32, case-insensitive) should be sufficiently safe and seem a pretty good fit for our use case. Generally, though, I would prefer to advertise the SHA1 hashes of filenames and tokens on the network. When a match is found, the recipient can then choose to request either a token (like http://<ip>:<port>/<token>) or a filename (like http://<ip>:<port>/<filename>) from the sender and be handed the file.
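A rough sketch of the hashed-advertisement idea, using Python's hashlib. The thread doesn't specify how the hashes would be carried in the Zeroconf broadcast, so the set below just stands in for it; all function names are hypothetical:

```python
import hashlib
from typing import Set


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def advertisement(filename: str, token: str) -> Set[str]:
    """What the sender would broadcast: hashes, not the plain names."""
    return {sha1_hex(filename), sha1_hex(token)}


def matches(advert: Set[str], wanted: str) -> bool:
    """Recipient side: does this broadcast offer what we asked for?"""
    return sha1_hex(wanted) in advert


advert = advertisement("file.jpeg", "AFGD")
print(matches(advert, "AFGD"), matches(advert, "other.jpeg"))  # True False
```

After a match, the recipient would request http://<ip>:<port>/AFGD (or the filename) as described above.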

takluyver commented 9 years ago

Thanks, that makes sense. A couple of refinements to the idea:

For scripting, add an option to produce some kind of machine-readable output. Git has --porcelain, for instance, and Jupyter has --json for some commands.
Alternatively, you could produce machine-readable output if stdout is a pipe rather than a tty. This is a de facto Unix tradition (even ls does it), though I prefer having an explicit flag for machine-readable output.
Never automatically overwrite local files. I would do it like browser downloads - if a file with that name already exists, add a suffix with a number, incrementing the number until it's unique - e.g. my_holiday_pics_1.zip
I totally agree that people will be lazy if they pick the token, and it's extra cognitive load to do so. So I'd stick to having the token autogenerated. It's not exactly like any other command line tool I can think of, but I think it's a simple enough model that people shouldn't have any trouble understanding it.
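A sketch of two of these refinements: browser-style renaming so local files are never overwritten, and choosing between human-readable and machine-readable output with an explicit flag, falling back to checking whether stdout is a tty. The JSON field names and the helper names are assumptions for illustration:

```python
import json
import sys
from pathlib import Path
from typing import TextIO


def unique_path(path: Path) -> Path:
    """Browser-style renaming: pics.zip, then pics_1.zip, pics_2.zip, ..."""
    candidate, n = path, 0
    while candidate.exists():
        n += 1
        candidate = path.with_name(f"{path.stem}_{n}{path.suffix}")
    return candidate


def use_machine_output(json_flag: bool, stream: TextIO = sys.stdout) -> bool:
    """An explicit flag wins; a non-tty stdout is the de facto fallback."""
    return json_flag or not stream.isatty()


def announce(filename: str, token: str, machine: bool) -> str:
    """What zput might print after advertising a file."""
    if machine:
        return json.dumps({"filename": filename, "token": token})
    return (f"{filename} is now available on the network\n"
            f"Ask your friend to 'zget {token}'")
```

Path.stem and Path.suffix only split off the last extension, so archive.tar.gz would become archive.tar_1.gz; a real implementation might want to special-case multi-part extensions.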

My mistake on the implementation - looking at the examples, I assumed that whichever one you started first was creating an 'advertisement' of some kind which the second one responded to. On looking at the code some more, I realise that it's always the sender that advertises and the recipient that responds (by making an HTTP request). I like this design better than what I thought was going on :-).

I'm not convinced about using sha1 hashes as a security measure. A malicious actor on the network could easily generate a mapping of sha1 hashes to possible tokens or common filenames. When they see a broadcast of a sha1 hash, they could then quickly look up the corresponding token/filename and make a request for that. Security probably isn't a major design goal, as it's for trusted networks, but if there's a shared secret in the architecture, I'd rather it had no relationship to the advertised data at all.
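To illustrate the concern: with 4 base-32 characters there are only 32^4 = 1,048,576 candidates, so anyone who sees the SHA1 of a token can recover it by brute force, with no precomputed table needed. A sketch, assuming the same hypothetical Crockford alphabet as before:

```python
import hashlib
import itertools
from typing import Optional

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # assumed base-32 alphabet


def invert_sha1(digest_hex: str, length: int = 4) -> Optional[str]:
    """Try every token of the given length until one hashes to digest_hex."""
    for chars in itertools.product(ALPHABET, repeat=length):
        candidate = "".join(chars)
        if hashlib.sha1(candidate.encode()).hexdigest() == digest_hex:
            return candidate
    return None  # no token of this length produces the digest


advertised = hashlib.sha1(b"QNT8").hexdigest()  # what the network sees
print(invert_sha1(advertised))  # QNT8
```

Searching the whole space takes on the order of a second or two in plain Python, which is the point: a hash of a low-entropy secret hides very little.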

Would you be happy for me to have a go at implementing tokens, and make a pull request for it?

nils-werner commented 9 years ago

I've started working on a first draft that does

zput file.jpeg asd will allow you to download using zget file.jpeg or zget asd. With zget asd, it will still create file.jpeg (or file_n.jpeg if file.jpeg or file_n-1.jpeg already exists).
zget file.jpeg will not overwrite an existing file.jpeg; however, zget file.jpeg file.jpeg will.

What do you think?

takluyver commented 9 years ago

I still think it's much more useful for the system to generate the alias, rather than requiring it from the user. As you said, users will be lazy, especially as it looks like there's no lower limit on length, so you could just zput file.jpeg a.
I think the principle is good, though I'd design the interface either with an explicit option (zget file.jpeg -o file.jpeg) or by writing to stdout (zget file.jpeg > file.jpeg).

I'll have a go at implementing my idea, so we can have a play with it.

nils-werner commented 9 years ago

For a simple/short token, I thought about showing a warning ("insecure upload token, your transfer may be hijacked") but still allowing it. When the user doesn't set one, we can provide one for them.
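A minimal sketch of that behaviour: generate a token when none is given, and warn about, but still accept, weak user-chosen ones. The minimum length and the "single repeated character" check are hypothetical heuristics, not zget's actual rules:

```python
import secrets
import warnings
from typing import Optional

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # assumed base-32 alphabet
MIN_LENGTH = 4                                  # hypothetical threshold


def choose_token(user_token: Optional[str] = None) -> str:
    """Generate a token if none was given; warn about (but allow) weak ones."""
    if user_token is None:
        return "".join(secrets.choice(ALPHABET) for _ in range(MIN_LENGTH))
    # Crude heuristics: too short, or one repeated character like "aaaa".
    if len(user_token) < MIN_LENGTH or len(set(user_token.upper())) == 1:
        warnings.warn("insecure upload token, your transfer may be hijacked")
    return user_token
```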

I was playing with the idea of allowing zget file.txt - to print the contents to stdout. I guess the -o option is more curl-style syntax, and the second filename more cp-style.
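The output variants discussed so far (no second argument, an explicit filename, and "-" for stdout) could be dispatched roughly like this. It is a sketch under assumptions: the function names are hypothetical, and stripping the sender's filename to its last path component is an extra safety measure not mentioned in the thread:

```python
import itertools
import sys
from pathlib import Path
from typing import BinaryIO, Optional


def free_name(remote_name: str) -> Path:
    """First of file.jpeg, file_1.jpeg, file_2.jpeg, ... that doesn't exist."""
    # Keep only the final path component so a sender can't write elsewhere.
    base = Path(Path(remote_name).name)
    for n in itertools.count():
        candidate = base if n == 0 else base.with_name(f"{base.stem}_{n}{base.suffix}")
        if not candidate.exists():
            return candidate
    raise AssertionError("unreachable")  # itertools.count() never ends


def open_destination(remote_name: str, output: Optional[str]) -> BinaryIO:
    """Hypothetical zget output handling:

    output is None -> save under the sender's filename, never overwriting
    output == "-"  -> write to stdout, like `zget file.txt > file.txt`
    anything else  -> the user named the file explicitly, so overwrite it
    """
    if output == "-":
        return sys.stdout.buffer
    if output is None:
        # "xb" fails instead of clobbering if the file appears meanwhile.
        return open(free_name(remote_name), "xb")
    return open(output, "wb")
```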

nils-werner commented 8 years ago

I have implemented a simple version of this in the latest release, you can upgrade it using pip install -U zget.

We will think about security in a later release and in a separate issue.

takluyver commented 8 years ago

I just saw a talk about a similar tool called magic wormhole. It creates a code made of randomly selected words from a list, and then uses a neat technique called PAKE (password-authenticated key exchange) to turn that weak code into a strong encryption key, which it can use to transfer data.
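For comparison, a sketch of just the code-generation part: a channel number followed by randomly chosen words, in the style of magic wormhole codes such as 7-crossover-clockwork. The wordlist below is a tiny hypothetical stand-in (magic wormhole uses the PGP word list), and the PAKE step, which is what makes such a short code safe, is not shown:

```python
import math
import secrets

# Tiny hypothetical wordlist; a real tool needs a much larger, curated list.
WORDS = ["apple", "banjo", "cactus", "dolphin", "ember", "falcon", "glacier", "harbor"]


def word_code(nameplate: int, n_words: int = 2) -> str:
    """Channel number plus random words, e.g. '7-ember-banjo'."""
    words = [secrets.choice(WORDS) for _ in range(n_words)]
    return "-".join([str(nameplate), *words])


def entropy_bits(n_words: int, list_size: int) -> float:
    """Bits of entropy in n_words independent picks from list_size words."""
    return n_words * math.log2(list_size)


print(entropy_bits(2, 256))  # 16.0
```

Two words from a 256-word list give only 16 bits, fewer than the 20 bits of 4 base-32 characters. PAKE compensates by limiting an attacker to one guess per connection attempt instead of letting them check guesses offline.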

nils-werner commented 8 years ago

Very interesting, and not infeasible to implement here.

nils-werner / zget, issue #5