File transfers - GitHub issues

Being able to transfer files would be a useful feature, to enable whistleblowers, etc.

Wow, this is an old issue. It's about time to get this done. Let's talk about file transfers. This is partly requesting feedback, partly trying to convince myself, and partly a braindump, but hopefully it's useful.

UX

I see two broad types of use cases for file transfers. The active users are sending and receiving files during conversations, and expect the file to be part of their conversation. File drops are for an important minority of Ricochet users who want to stay online and be able to receive files at any time, even if they're not immediately active.

Like most messaging applications, we want to do file transfers inline as part of the conversation, and they must not require using any other application (including a browser).

Drag and drop of a file into a conversation is the normal way to send a file. Arguably, it could be easy to accidentally send a file this way. That could be mitigated by dropping the file into the message input box and allowing typing a message to go with it before sending, or more easily with a countdown before sending. There should be another visible UI method for sending files, like a button in the conversation header or near the message input.

[Mockup images: ricochet-file-offer, ricochet-file-sending]

Offered files must expire after enough time to reasonably assume the user might've forgotten.

Ricochet's threat model prohibits storing conversation data (including files) automatically. The threat model also sees contacts as potential exploitation adversaries, so we want to reduce the set of actions a contact can trigger without user approval. These suggest that we should require explicit acceptance by default for file transfers. The user has to explicitly choose where to save each file, unless they configure a default location. This is fine for the active user use case, but for file drops we will need an option to automatically save files (which could also be per-contact).

[Mockup images: ricochet-file-offer-recipient, ricochet-file-receiving]

Active offers or transfers must be easily noticeable to both parties. In addition to showing progress on the transfer in the conversation, there should be progress in the conversation header (a transfers tab?) and/or global transfer status somewhere. This is particularly important because the conversation may move on while transfers are ongoing. When a transfer finishes, it should be moved to the end of the conversation.

[Mockup image: ricochet-file-conv-header]

Completed transfers are opened by clicking, with a warning similar to the browser warning.

Tor has been known to be unreliable, so the ability to resume will be important. We must reconnect and resume automatically whenever possible. If too much time has passed, or one of the clients has lost its history, resuming should still be possible by saving over an existing incomplete file.

Future UX

Displaying images inline with conversations is a standard messaging feature. This is easy to do once we have file transfer in place, except that I don't trust image decoders. Once we find a robust and memory-safe decoder, or cross-platform seccomp-style sandboxing, this is worth doing.

Someone pointed out that we could prefetch files, at least by building connections and buffering up to some in-memory limit, to make the transfer process seem faster.

It would be nice to allow transferring batches of files or entire folders up to some reasonable limit.

Having a way to automatically verify the hash of the downloaded file with the sending server would be nifty.
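A minimal sketch of what that check could look like on the receiving side. Nothing in this proposal defines how the sender would publish a digest, so the choice of SHA-256 and both function names here are assumptions for illustration only.

```python
import hashlib
import hmac
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash the file incrementally so it never has to fit in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def digest_matches(chunks: Iterable[bytes], expected_hex: str) -> bool:
    """Compare against a digest supplied by the sender (hypothetical field)."""
    actual = sha256_of_chunks(chunks).encode("ascii")
    expected = expected_hex.strip().lower().encode("utf-8")
    # Constant-time comparison; cheap to do and avoids timing surprises.
    return hmac.compare_digest(actual, expected)
```

The chunks could be the blocks as they arrive from the network, so the hash is ready the moment the transfer finishes.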

Protocol

Connections

File data can't be sent over Ricochet's primary protocol connection. Even though we packetize data, because of the buffering properties of Tor streams, very large amounts of data will sit in queues when flooding a hidden service connection. This causes extreme (often 30s or more) latency for that stream. Any other form of rate control would have too much impact on transfer speeds.

The simple answer is to open additional connections to the peer's service. These connections will be multiplexed by Tor onto the same circuit, but buffering behavior is significantly better. In casual testing there seems to not be significant impact on message latency when flooding data to another stream on the same circuit.

We could also use circuit isolation options to force Tor to build new circuits for transfers. It's unclear whether this would be useful for throughput or latency, and it's unclear whether building additional concurrent circuits would have significant anonymity or traffic analysis impacts.

Ricochet's protocol doesn't allow more than one authenticated connection per contact, because it would be ambiguous which connection should be used and could violate expectations on message ordering. If additional connections use Ricochet's protocol, they will need to authenticate differently to indicate that the connection is only used for data transfer.

Because data transfer is happening over secondary connections, we're not required to package data with Ricochet's protocol. It's worth thinking through the options here.

Option 1: Modified Ricochet protocol

My original thinking was to use Ricochet's protocol for file data transfer on additional connections. This isn't entirely straightforward.

Benefits:

Could easily send small files inline over the protocol connection
More consistent with the existing protocol
No new parser/server attack surfaces

Downsides:

There can only be one primary connection, so authentication ends up being weird
Unless we made deeper protocol modifications, data has to be broken up into max-65kb chunks with headers
The protocol design and implementation is difficult to get right
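To make the chunking downside concrete, here is a rough sketch of splitting file data into packets with small headers. The header layout (a 16-bit total size followed by a 16-bit channel id) and the 65535-byte ceiling are assumptions made for this example, not a statement of the actual wire format.

```python
import struct
from typing import List

MAX_PACKET = 65535             # assumed ceiling for one packet, header included
HEADER = struct.Struct(">HH")  # hypothetical header: total packet size, channel id


def packetize(data: bytes, channel: int) -> List[bytes]:
    """Split data into packets that each fit within MAX_PACKET bytes."""
    payload_max = MAX_PACKET - HEADER.size
    packets = []
    for offset in range(0, len(data), payload_max):
        payload = data[offset:offset + payload_max]
        packets.append(HEADER.pack(HEADER.size + len(payload), channel) + payload)
    return packets


def reassemble(packets: List[bytes]) -> bytes:
    """Strip the headers and join the payloads back together."""
    out = bytearray()
    for packet in packets:
        size, _channel = HEADER.unpack_from(packet)
        out += packet[HEADER.size:size]
    return bytes(out)
```

Every packet costs a header and a parse step, which is the overhead the second downside refers to.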

There's some older work on what this could look like from a WIP branch in FileTransferChannel.proto and FileTransferDataChannel.proto.

Option 2: HTTP (my current preference)

Ricochet clients could offer files over HTTP, using a simple internal server and unique URLs, similar to OnionShare. That server could be on a different (possibly randomized) port of the same hidden service, on new ephemeral services, or even share the same port using protocol detection.

Benefits:

More robust/standard/future-proof implementation
Recipient backwards compatibility: can fall back naturally to displaying a URL
Possible for recipients to download using browsers & other tools
Could be used to offer files to non-contacts also

Downsides:

Even a very minimal HTTP client and server are a new protocol attack surface for contacts
Unclear what C++/Qt implementation would be safe enough to use

Server/client implementation

There is no need for a 100% feature-complete and spec-behavior-compatible HTTP client in Ricochet, because the use case here is very minimal. To keep the potential for bugs as low as possible, I'd limit the implementation to features we need, and not use (e.g.) chunked transfer encoding or esoteric options. We could even force Connection: close if it's helpful. This ends up being a pretty small amount of network-exposed code, and is still generally compatible with other clients and servers.
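As a sketch of how small the server-side parsing could be, the following accepts only a plain GET and refuses anything it does not need. The size limit and the decision to reject any request that announces a body are assumptions for illustration; the real rules would need to be decided.

```python
MAX_HEAD = 8192  # assumed cap on request line plus headers


def parse_request(raw: bytes):
    """Return (path, headers) for a well-formed GET, or None to reject."""
    head, sep, _rest = raw.partition(b"\r\n\r\n")
    if not sep or len(head) > MAX_HEAD:
        return None  # incomplete or oversized request
    try:
        lines = head.decode("ascii").split("\r\n")
    except UnicodeDecodeError:
        return None
    request_line = lines[0].split(" ")
    if len(request_line) != 3:
        return None
    method, path, version = request_line
    if method != "GET" or version not in ("HTTP/1.0", "HTTP/1.1"):
        return None
    if not path.startswith("/"):
        return None
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon or not name or name != name.strip():
            return None
        headers[name.lower()] = value.strip()
    # No chunked encoding and no request bodies: a GET here never needs them.
    if "transfer-encoding" in headers or "content-length" in headers:
        return None
    return path, headers
```

Anything that returns None would get whatever generic rejection the server settles on.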

Server behavior and URL format

I favor putting the HTTP server on a randomized port under the same hidden service. Bringing up new services is sometimes slow or unreliable, and involves many new circuits and distinguishable network activity. This also means we can require the same .onion hostname, which prevents peers from being able to force a connection to an arbitrary .onion.

If the server is always running, it's possible for contacts and non-contacts to find at any time, though this shouldn't be of any value. If the server is only active when there are active file offers, that state may be detectable to contacts or non-contacts who can learn the port.

There is no point or need to try to disguise the server as anything other than a Ricochet client. All well-formed requests that are not valid file requests should be rejected with a generic 404 error.

I propose the following for download URLs:

http://[address].onion:[port]/ricochet/fetch/[uniqid]/[filename]

[address] must be the contact's ricochet connection address
[uniqid] is a large (>=128bit) random identifier
[filename] is the URL-encoded original name of the file

Only HTTP is permitted, no HTTPS. We do not want to require a TLS implementation, and .onion makes it unnecessary.

Clients may refuse transfer URLs without a /ricochet/fetch/ prefix.
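The rules above could be checked along these lines. This is only a sketch: the hex encoding of [uniqid] (32 or more hex digits for at least 128 bits) is an assumption, as are the function names, and the decoded filename would still need sanitizing before it touches the filesystem.

```python
import re
from urllib.parse import quote, unquote, urlsplit

PREFIX = "/ricochet/fetch/"
# Assumption: identifiers are lowercase hex, so 32 digits carry 128 bits.
UNIQID_RE = re.compile(r"[0-9a-f]{32,}")


def build_fetch_url(address: str, port: int, uniqid: str, filename: str) -> str:
    """Format a download URL; the filename is fully percent-encoded."""
    return f"http://{address}.onion:{port}{PREFIX}{uniqid}/{quote(filename, safe='')}"


def parse_fetch_url(url: str, contact_address: str):
    """Return (port, uniqid, filename) or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only plain HTTP is permitted")
    if parts.hostname != f"{contact_address.lower()}.onion":
        raise ValueError("host must be the contact's own .onion address")
    if parts.port is None:
        raise ValueError("an explicit port is required")
    if parts.query or parts.fragment or parts.username or parts.password:
        raise ValueError("unexpected URL components")
    if not parts.path.startswith(PREFIX):
        raise ValueError("missing /ricochet/fetch/ prefix")
    segments = parts.path[len(PREFIX):].split("/")
    if len(segments) != 2:
        raise ValueError("expected [uniqid]/[filename]")
    uniqid, encoded_name = segments
    if not UNIQID_RE.fullmatch(uniqid):
        raise ValueError("identifier too short or malformed")
    filename = unquote(encoded_name)
    if not filename:
        raise ValueError("empty filename")
    return parts.port, uniqid, filename
```

Checking the hostname against the contact's own address is what stops a peer from steering the client to an arbitrary .onion.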

File URLs are related to a specific transfer and are meant for one time use. Range requests must be supported to allow resume, and may be allowed for parallel downloading. Servers should stop offering a URL once they believe the recipient has a full copy.
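For resume, the server only needs the simplest form of range request. A sketch under that assumption, handling a single `bytes=START-` or `bytes=START-END` range and deliberately refusing multi-range and suffix forms (both function names are made up for this example):

```python
import re

RANGE_RE = re.compile(r"bytes=([0-9]+)-([0-9]*)")


def parse_range(header: str, file_size: int):
    """Return an inclusive (start, end) byte range, or None if unsupported."""
    match = RANGE_RE.fullmatch(header.strip())
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else file_size - 1
    end = min(end, file_size - 1)
    if start > end:
        return None  # also covers a start at or past the end of the file
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Value for the Content-Range header of a 206 response."""
    return f"bytes {start}-{end}/{file_size}"
```

A client resuming over an incomplete file would send `Range: bytes=N-` where N is the size of what it already has on disk.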

File offer protocol

We can package a file offer into an extended chat message:

message ChatMessage {
    required string message_text = 1;
    optional uint32 message_id = 2;
    optional int64 time_delta = 3;

    // Indicates a file transfer offer. message_text must begin with a valid
    // file transfer URL, terminated by the first whitespace or end of message.
    // The rest of message_text, if any, should be displayed as a user message
    // along with the file. 
    optional FileInfo file_info = 4;
}

message FileInfo {
    // required
    optional string file_name = 1;
    optional uint64 file_size = 2;
    // optional
    optional string content_type = 3;
}

message ChatAcknowledge {
    optional uint32 message_id = 1;
    optional bool accepted = 2 [default = true];
    optional bool file_received = 3;
    optional bool file_refused = 4;
}

Acknowledgement of this message also acknowledges the file offer. The URL may be immediately accessed to download the file. In addition to acknowledging the message normally, the recipient should send an additional ChatAcknowledge when the transfer has completed or if it is refused, with the appropriate field set. Senders should be prepared to consider a transfer completed based only on data transferred and not rely on ChatAcknowledge, to support older clients or alternative downloaders.
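The message_text rule in the ChatMessage comment (URL first, terminated by the first whitespace, remainder shown as the user's message) can be sketched as follows. The function name is hypothetical, and the URL returned here still has to be validated as a file transfer URL.

```python
def split_offer_text(message_text: str):
    """Split a file offer into (url, user_message)."""
    if not message_text or message_text[0].isspace():
        raise ValueError("message_text must begin with the transfer URL")
    parts = message_text.split(None, 1)  # split on the first run of whitespace
    url = parts[0]
    user_message = parts[1] if len(parts) > 1 else ""
    return url, user_message
```

A client that ignores file_info never calls this and simply shows message_text as-is, which is the fallback behavior described below.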

This has the neat property of being entirely compatible with clients that don't implement file transfers; the user will see the URL, and can download it in a browser. This isn't especially meaningful, though.

XXX There is no way defined here for the sender to indicate cancellation

Next steps

I'd like to move forward on this pretty quickly, so I'm going to be aiming for nailing down the protocol and major UX decisions very soon.

There's already a good chunk of code written, but it needs some fixing and will need changes based on the decisions here.

Any thoughts?

This looks good to me. I agree that the HTTP server direction feels like the right one.

A couple of thoughts - in no particular order or priority:

Without client authentication on the download, are there any attack vectors worth noting? I can't think of any that would be critical to the threat model...but for the record, here are some things which crossed my mind:
- Is there a potential for a DoS/slowloris-type attack where an attacker gets a victim to serve a file, and then the attacker (alone or together with multiple people) opens enough connections to fill up the connection pool?
- Is there an attribution attack where an attacker gets someone to serve a file and they can then prove to another person that they are doing so?
Filename & Content Type are likely prone to the same unicode issues identified in #338
Clients should reject HTTP Redirects and other attempts to hijack the HTTP stream.
I think I would argue that invalid URLs should just trigger closing the connection, rather than a 404.

This has the neat property of being entirely compatible with clients that don't implement file transfers; the user will see the URL, and can download it in a browser.

This opens up a few of the attack vectors noted above that the ricochet http client can avoid, but a browser can't. There are already warnings about opening links, but I wonder if the messaging and supported feature status of file transfers might open up a crack for phishing.

Without client authentication on the download, are there any attack vectors worth noting? I can't think of any that would be critical to the threat model...but for the record, here are some things which crossed my mind:

Well, there is authentication by using a URL unique to the recipient and file. The concern is that it's transferrable: this authentication doesn't identify the recipient to anyone other than the sender (and that non-cryptographically), and doesn't contain any secret the recipient may wish to protect.

Is there a potential for a DoS/slowloris-type attack where an attacker gets a victim to serve a file, and then the attacker (alone or together with multiple people) opens enough connections to fill up the connection pool?

I think there isn't a DoS option, as long as we're limiting the number of connections per offered file. Since the intent is to share one file with one person once (allowing for resume and other issues), we can be strict about it. It could also be a good idea to set a minimum transfer rate, for usability also.
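A rough sketch of the two limits mentioned here, kept per offered file. The class name, the default numbers, and the grace period before the rate check applies are all assumptions chosen for illustration.

```python
import time


class OfferLimiter:
    """Tracks connections for a single offered file."""

    def __init__(self, max_connections=2, min_bytes_per_sec=1024,
                 grace_seconds=30.0, clock=time.monotonic):
        self.max_connections = max_connections
        self.min_bytes_per_sec = min_bytes_per_sec
        self.grace_seconds = grace_seconds
        self._clock = clock
        self._active = {}  # connection id -> [start time, bytes sent]

    def try_open(self, conn_id) -> bool:
        """Admit a connection unless the per-offer cap is already reached."""
        if len(self._active) >= self.max_connections:
            return False
        self._active[conn_id] = [self._clock(), 0]
        return True

    def record(self, conn_id, nbytes: int) -> None:
        self._active[conn_id][1] += nbytes

    def close(self, conn_id) -> None:
        self._active.pop(conn_id, None)

    def too_slow(self, conn_id) -> bool:
        """True once a connection has had its grace period and is still slow."""
        started, sent = self._active[conn_id]
        elapsed = self._clock() - started
        if elapsed < self.grace_seconds:
            return False
        return sent / elapsed < self.min_bytes_per_sec
```

A server loop would call too_slow periodically and drop connections that fail it, which bounds how long a slowloris-style client can hold a slot.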

Is there an attribution attack where an attacker gets someone to serve a file and they can then prove to another person that they are doing so?

Interesting! I like this attack. There are weak defenses by changing the client authentication, but they really just discourage sharing URLs. I also like having these URLs not identify their intended recipient other than to the sender who generated them.

To actually remove the cryptographic attribution, we'd have to serve files on ephemeral services. I'm not sure whether it also requires one ephemeral service per contact. Technically, Bob can convince Alice to send a dangerous file, Bob shares the address with Carol, and then Carol can separately get Alice to send an innocuous file, so that Carol can confirm that the dangerous URL came from Alice.

The safest answer attribution-wise would be to use a unique ephemeral service per contact per session. My concerns with this are 1) service publication latency; 2) many more circuits required; 3) shows up more clearly to traffic analysis.

I think I might be okay with using one ephemeral service for all file transfer in a session. It's at least still cryptographically distinct, reduces all of those impacts, and the case where it fails is contrived.

Filename & Content Type are likely prone to the same unicode issues identified in #338

Content type is a MIME type, so no unicode issues there. For sanitizing filenames, I came up with some rules long ago, which will need some further thought. Must be very careful there.
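To give a sense of the kind of rules involved, here is an illustrative sketch. These are not the rules referred to above; every choice here (the categories dropped, the length cap, the fallback name) is an assumption and would need the same careful review.

```python
import unicodedata

MAX_NAME_LENGTH = 255  # assumed cap; real limits vary by filesystem


def sanitize_filename(name: str, fallback: str = "download") -> str:
    """Reduce a sender-supplied name to something safer to display and save."""
    name = unicodedata.normalize("NFC", name)
    # Keep only the final path component, whichever separator was used.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    # Drop control (Cc) and format (Cf) characters. Cf includes the
    # bidirectional overrides that can visually disguise an extension.
    name = "".join(
        ch for ch in name if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Leading dots hide files or form "." and ".."; trailing dots and
    # spaces cause trouble on Windows.
    name = name.strip(" .")
    return name[:MAX_NAME_LENGTH] or fallback
```

Truncating at a fixed length can cut off the extension, so a real implementation would probably want to shorten the stem instead.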

Clients should reject HTTP Redirects and other attempts to hijack the HTTP stream.

Agree.

I think I would argue that invalid URLs should just trigger closing the connection, rather than a 404.

Without any response? Hmm. This makes it slightly more ambiguous for a ricochet client to tell if there was a network failure or if a URL is no longer valid. Otherwise I have no problem with that, and I'd like not sending anything back to unauthorized clients.

This has the neat property of being entirely compatible with clients that don't implement file transfers; the user will see the URL, and can download it in a browser.

This opens up a few of the attack vectors noted above that the ricochet http client can avoid, but a browser can't. There are already warnings about opening links, but I wonder if the messaging and supported feature status of file transfers might open up a crack for phishing.

The precautions here would have to be the same as for opening any URL. I don't think this is necessarily a use case worth designing for -- I'm not sure it will end up staying in.

Unclear what C++/Qt implementation would be safe enough to use

Possibly already too large for this use case, but written with security in mind: https://github.com/reyk/httpd

message FileInfo {
    // required
    optional string file_name = 1;
    optional uint64 file_size = 2;
    // optional
    optional string content_type = 3;
}

I wonder why you would want to save an additional content-type apart from a user visible file name extension? Could that possibly lead to confusion on the receiver side either technically in which program to start, or non-technically in what a user expects?

XXX There is no way defined here for the sender to indicate cancellation

Isn't the bool file_refused flag in the ChatAcknowledge message a way to do that? Or do you mean after a file has been accepted and while in the middle of the transfer?

I think I would argue that invalid URLs should just trigger closing the connection, rather than a 404.

👍

Interesting! I like this attack. There are weak defenses by changing the client authentication, but they really just discourage sharing URLs. I also like having these URLs not identify their intended recipient other than to the sender who generated them.

At the cost of client compatibility, would it be feasible to encrypt the contents of the file with the recipient's public key or some kind of session ID?

I wonder why you would want to save an additional content-type apart from a user visible file name extension? Could that possibly lead to confusion on the receiver side either technically in which program to start, or non-technically in what a user expects?

Hmm. I have two uses in mind:

1. When we implement inline images, we'll need a way to know what files are images
2. It could be nice to be able to show a different icon for some file types (image/, video/, etc)

Detecting these based on extension is at least as unreliable as having a (possibly wrong) content-type. You're right that there would need to be care on #2 to make sure that we don't show something as being an image when it will actually open as an executable. For that reason alone, maybe it's better to remove content-type and only detect by extension. Hmm..

XXX There is no way defined here for the sender to indicate cancellation

Isn't the bool file_refused flag in the ChatAcknowledge message a way to do that? Or do you mean after a file has been accepted and while in the middle of the transfer?

It is only valid to send ChatAcknowledge for messages you've received -- it makes no sense to acknowledge your own messages. So file_refused provides a way for the recipient to cancel, but I didn't define an equivalent for the sender to say "I'm not offering this file anymore" yet.

At the cost of client compatibility, would it be feasible to encrypt the contents of the file with the recipient's public key or some kind of session ID?

Encrypting the file doesn't solve @s-rah's attribution attack: it only means that you need to provide a decryption key along with the URL. Common ways of encrypting to the recipient's public key have the same problem, because you're generally just wrapping the symmetric encryption key used for file data.

It would be more useful to require the recipient to give up their identity private key in order to demonstrate that the sender is offering a file. For that, we just need to authenticate the connection with the recipient's public key prior to serving files -- but this doesn't map well into HTTP (no, no TLS). That would be a point in favor of using another protocol.

A different approach entirely is to have the recipient host the server, with the sender acting as a client to upload data. This could work with HTTP or anything else, but it has some downsides in flexibility. There would be no attribution problem in that case.

When we implement inline images, we'll need a way to know what files are images

I'd like to express that I totally agree with your earlier statement: I don't trust image decoders.

it makes no sense to acknowledge your own messages.

My bad, I read "receiver" while it actually states "sender" :)

It would be more useful to require the recipient to give up their identity private key in order to demonstrate that the sender is offering a file. For that, we just need to authenticate the connection with the recipient's public key prior to serving files -- but this doesn't map well into HTTP (no, no TLS).

Sounds interesting. Do you have examples of other protocols doing this, or of how it could be set up? I guess the Noise framework could help here, as noted in https://github.com/ricochet-im/ricochet/issues/72#issuecomment-258894126.

In case anyone needs this ASAP: you can use OnionShare alongside Ricochet.

John Brooks:

Only HTTP is permitted, no HTTPS. We do not want to require a TLS implementation, and .onion makes it unnecessary.

It's unclear to me whether this is part of Ricochet's threat model, but in environments like Whonix, the Tor client can read the contents of .onion traffic but not TLS traffic (since the TLS is decrypted in a different VM).

Possible workaround: At least on Windows and Linux there's gpg4usb (gpg4usb.org), a self-contained GPG program that saves all files encrypted as text files:

Interface: [screenshot: encrypt file dialog]

Output (open data.docx.asc in notepad): [screenshot: data.docx.asc in notepad]

Pitfalls:

Cut off? I sent some VERY long text bits over Ricochet but I can imagine it truncates them at some point. May need to send files in chunks.
Files are a little larger than the originals (encryption increases size and text is less efficient than binary)
Few people use GPG (excluding @JeremyRand above) so some time would go into explaining how it works, exchanging keys, etc.

Update: This program uses an outdated version of GPG so while it's likely still functional as a workaround, it may not function with other recently updated GPG-compatible tools.

1) The file transfer feature is essential in a real working environment. For work reasons, I am often mobile and have to exchange messages and files with colleagues (rarely images, mostly .docx, .xls and other file types). Of course we need to be sure these move over a secure and RESERVED channel. It is absolutely imperative (no. 1 request) that the file cannot be intercepted or taken by anyone other than the intended recipient. So I am definitely and totally against any use of public URLs (even if scrambled or anything like that) that are visible in any way, or easily derivable (and transferable to others). The content of the conversation, as well as the file, must remain strictly private between sender and receiver (pure P2P and not sharing at all). One type of "attack" you have to consider if using 'public' URLs comes from the recipient itself.

Think of a not-so-loyal employee who knows of this mechanism and passes the URL to a third party over a different channel (maybe even a second Ricochet IM chat, without leaving any trace...) as soon as he receives it... The third party downloads the file, the employee downloads it too and pretends to have done nothing wrong... In a system without any public/visible/copyable links, in which the only option he gets in the chat window is to download the file to his PC or refuse it, he is the only person other than the sender with access to that file, and if it leaks, there is no excuse that it was due to a 'public' HTTP URL...

2) Have you looked at other chat solutions that offer P2P file transfer to get ideas from something already working? While waiting for this option to appear, we are using QTox in the field, and it has a nice, working file transfer option fully embedded within the chat. (In the past I have also used uVNC, and it did easy file transfers over secure P2P tunnels.)

3) My old-style gut feeling would be to drop the idea of HTTP entirely and start working on efficient packetisation of files into the existing audited, secure and trusted Ricochet protocol (with the necessary small modifications and maybe an optional additional layer of simple encryption), as described in your first option. Even simply wrapping the file with Mime64 plus light scrambling (and doing the reverse on reception) would be enough as a wrapper. I prefer security and certainty that there is no risk of my files getting into competitors' hands over a fancier and/or faster transfer mechanism.

4) Don't forget most people still use email (unencrypted MIME-64 encoding of attachments) to send files around... slow, inefficient and definitely not secure. Top speed is not the critical part of the transfer in a real-world mobile environment (and some mobile connections used remotely are not even capable of supporting high speeds anyway...). To me it is more critical to have an indicator of the transfer status to my peers, the ability to pause/resume the sending of long files, and a mechanism for handling lost connections (which occur a lot in the field) during ongoing transfers.

There is a small, lightweight Tor Hidden Service based file sharing application called Onionize by @nogoest (written in Golang).

I'm curious whether there is a possibility of making a nice IM + file sharing offspring ;)

ricochet-im / ricochet

File transfers #15