Channel file transfer implementation

Fahrradkette commented 6 years ago

I'd like to push #channel file transfer, so a spec can be derived.

Users:

Alice: registered user, permissions to post files in the channel
Bob: registered user, trusting Alice (i.e. having Alice's Cert locally)
Charlie: unregistered user, trusting nobody
Dave: same as Alice
Story:
Alice drops a file in the chat window (or clicks "post file")
Bob clicks on the appearing link in the window and chooses a location for the file on his machine.
Charlie sees the file, when clicking on the link is prompted to trust Alice
Dave drops a file in the Chat window but hole punching/upnp fails, he sees an error message and is pointed to an external service

dividing it up

- ACL's / permissions
- Certificates / encryption handling
- initiating connection / hole punching
- transfer handling / error recovering
- GUI

I did some research on the hole punching part and found that there are some solutions like RakNet and its forks. For UPnP there is a C library called miniupnpc (which is used by RakNet, I believe). Apparently hole punching works better with UDP; I have no experience with it since my router supports UPnP.

Do you guys agree with this way of breaking it up into parts? I'd be most delighted to read about your ideas and mental models of how it should work and how it should be laid out/divided up.

I'd love to get this going.

ghost commented 6 years ago

Thanks for your interest in this feature, @Fahrradkette. My first impression, though, is that this seems like an overly complicated implementation. My thoughts on how this could be implemented were the following:

A particular channel has two new ACL permissions which users can be granted: file read and file write
Users with the file write permission can upload files to the server that are attached to a particular channel
Users with the file read permission can view and download files that are attached to a particular channel
The file data is transferred over Mumble's control protocol
Files are stored in the database along with all other blobs
(Need to think about file size limits)

Of course, this means that there is no direct user-to-user file transfer, but perhaps that could be added in the future.

I also think client-to-client communication is the wrong way to go; everything should flow through the server.

mkrautz commented 6 years ago

@bontibon asked me to voice my concerns about using the control channel for file transfers here. (I suggested perhaps piggy-backing on HTTP instead, if/when we do that, for WebSockets).

Basically, as hacst said here: https://github.com/mumble-voip/mumble/issues/1307#issuecomment-76296989, "no stream multiplexing on the control channel".

There's nothing prohibiting us from creating a BlobTransfer (name TBD) message and using that to perform file transfers. But do we want to clog up Murmur's main thread with all these file transfers? Note that all servers' control channels share the main thread. And on the client side, the official desktop client's control channel handling is also tied to the main thread.

Fahrradkette commented 6 years ago

So there'd be an extra thread on both the server and the client? About storing the blobs, usually people don't need the files anymore, shortly after they shared them. Would it make sense to only keep them in memory, i.e. in a fifo with a "decay" timer?
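A minimal sketch of what such an in-memory FIFO with a decay timer could look like. Everything here is hypothetical (the class name `DecayingFileStore`, the `ttl_seconds` and `max_bytes` parameters); it only illustrates the idea and is not taken from Mumble's code.

```python
import time
from collections import OrderedDict


class DecayingFileStore:
    """In-memory FIFO of shared files; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, max_bytes, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_bytes = max_bytes
        self.clock = clock            # injectable so expiry can be tested
        self._files = OrderedDict()   # file_id -> (stored_at, data), oldest first
        self._used = 0

    def _evict_oldest(self):
        _, (_, data) = self._files.popitem(last=False)
        self._used -= len(data)

    def _expire(self):
        # Entries are kept in insertion order, so stop at the first live one.
        now = self.clock()
        while self._files:
            stored_at, _ = next(iter(self._files.values()))
            if now - stored_at < self.ttl_seconds:
                break
            self._evict_oldest()

    def put(self, file_id, data):
        if len(data) > self.max_bytes:
            raise ValueError("file exceeds the store's total capacity")
        self._expire()
        if file_id in self._files:
            _, old = self._files.pop(file_id)
            self._used -= len(old)
        # FIFO behaviour: drop the oldest files until the new one fits.
        while self._used + len(data) > self.max_bytes:
            self._evict_oldest()
        self._files[file_id] = (self.clock(), data)
        self._used += len(data)

    def get(self, file_id):
        self._expire()
        entry = self._files.get(file_id)
        return entry[1] if entry else None
```

The obvious trade-off is that nothing survives a server restart, which only matters if files are expected to be permanent.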

mkrautz commented 6 years ago

> About storing the blobs, usually people don't need the files anymore, shortly after they shared them. Would it make sense to only keep them in memory, i.e. in a fifo with a "decay" timer?

I'm not sure that's the general perception.

I'd actually think that most of the files would be permanent.

But I guess temporary files might make sense... How do other systems do this?

Fahrradkette commented 6 years ago

> I'm not sure that's the general perception.

You're right, it's quite likely that most people want the files to be permanent.

> How do other systems do this?

I don't know about Discord, but I read that TeamSpeak saves them as plain files on the server's file system.

What do you think would be a good solution?

hacst commented 6 years ago

If this is meant to scale to large hosters with big servers, I don't think there is any reasonable way to implement this inside of murmur, much less over the control channel as it works today. The bandwidth and storage requirements for proper file download feel incompatible with the low-latency voice tasks a murmur server performs.

If the file upload/download portion is split out I think trying hard to use something http based makes a lot more sense. If designed carefully you can take your pick of servers and CDNs that will allow you to serve any kind of traffic and meet any storage needs.

What complicates such a design is the requirement that access be mediated through murmur's ACL. This is definitely something that we want, or a file-upload-enabled murmur will turn into a public warez server very quickly. One way to implement this without constant chatter between murmur and some custom server middleware would be to use some standardized access-token-based approach like JWT. This would allow us to sign a set of session-identifying information that the HTTP server can validate independently of the murmur server. I think we could probably get away with guarding access by IP and aggressive token expiry. The nice thing about this is that this is stuff that a CDN like Fastly can validate in the edge node. Looking at the S3 documentation, similar support could be achieved there using presigned URLs with custom policies. This would work for upload and download. For big installations, dedupe could be achieved using content-addressable storage strategies (e.g. sha256 as filename, usual privacy drawbacks apply).
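To make the dedupe idea concrete, here is a rough sketch of content-addressable storage on a plain file system, assuming the SHA-256 hex digest is used as the filename. The function names `store_blob` and `load_blob` and the two-character directory fan-out are illustrative assumptions, not part of any proposed design.

```python
import hashlib
import os
import tempfile


def store_blob(root, data):
    """Store data under its SHA-256 hex digest; return (digest, was_new)."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out into subdirectories so no single directory grows too large.
    directory = os.path.join(root, digest[:2])
    path = os.path.join(directory, digest)
    if os.path.exists(path):
        return digest, False  # identical content is already stored
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file and rename, so readers never see a partial blob.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp_path, path)
    return digest, True


def load_blob(root, digest):
    """Read a blob back by its digest."""
    with open(os.path.join(root, digest[:2], digest), "rb") as handle:
        return handle.read()
```

Two uploads of the same bytes end up as one file on disk. The privacy drawback mentioned above follows directly: anyone who can test for the existence of a digest learns whether that exact file has been uploaded before.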

With such a design murmur would be responsible for remembering what belongs to which channel and who is meant to have access. Upload would be:

1. Client asks murmur to upload a file with a given size to channel X
2. Murmur validates against ACL and quotas. If any are violated, the request is rejected. Otherwise murmur generates a (unique) POST URL containing a signed token whose validity is bound to IP and expiration and sends that to the client
3. The client uses normal HTTP functionality to upload the file to the given URL
4. The client reports upload completion to murmur, which validates it (could be as simple as a HEAD request). After validation the file will become visible to other clients

Download could then be:

1. Client asks murmur for a file
2. Murmur validates against ACL and quotas. If any are violated, the request is rejected. Otherwise murmur generates a (unique) GET URL containing a signed token whose validity is bound to IP and expiration and sends that to the client
3. The client uses normal HTTP functionality to download the file from the given URL
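Both flows hinge on a signed token that the file server can check without asking murmur. The sketch below is not JWT; it is a simplified HMAC-signed token built from the Python standard library, used only to illustrate binding a token to method, file, IP and expiry. The claim names, the shared secret, and both function names are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret, method, file_id, client_ip, ttl_seconds, now=None):
    """Murmur side: sign the claims the file server should enforce."""
    now = time.time() if now is None else now
    claims = {"m": method, "f": file_id, "ip": client_ip,
              "exp": int(now + ttl_seconds)}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(secret, token, method, file_id, client_ip, now=None):
    """File server side: validate the token without contacting murmur."""
    now = time.time() if now is None else now
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    return (claims["m"] == method and claims["f"] == file_id
            and claims["ip"] == client_ip and now < claims["exp"])
```

A real deployment would more likely use an established JWT library or the presigned-URL mechanism of the chosen storage service; the point here is only that the check needs nothing but the shared secret and the request itself.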

Imho this could be a powerful and flexible approach. The loose coupling definitely has drawbacks, though: murmur and the storage not being in sync, quota enforcement being tricky, expiration and so on. Also we have to make sure this cannot be abused by servers to make clients access random servers. Upload step 4 will probably also require hooks for triggering additional file validation like virus scans before making the file available for download (S3 doesn't seem to allow restricting overwrites, which could be a problem here).

Another issue with this approach is that for small installations requiring an additional server is kind of a pain. Murmur having a small webserver embedded could work around that but that is quite the complexity jump on our end. Maybe providing some PHP scripts you can throw on a normal apache+modphp or other hoster to provide the file upload target would be simple enough?

Note that this is basically a braindump on my end and I haven't thought too deeply about implementation details or security considerations. It is quite possible that this isn't such a good idea after all for some reason.

Fahrradkette commented 6 years ago

Yesterday Natenom and I talked on mumble and I drafted an example of a GUI. The idea is that each channel has files attached to it. People who uploaded files are able to delete their own (Bob in this case). Deleting other people's files should only be allowed for server admins.

[Image: filebrowsergui]

I also drew an overview how I imagine murmur, mumble and the file server could communicate.

Murmur cares about ACLs while the file server cares about per-user/channel/server quotas and file expiration. I think we should design it so we don't even have to trust the mumble client; as you said, we don't want it to turn into a warez server.

Edit: In my head the communication between murmur and the file server goes over gRPC

File Upload

[Image: mumblefileserveruploadprotocol]

File Download

[Image: mumblefileserverdownloadprotocol]

ghost commented 6 years ago

@Fahrradkette: thanks for the mockup and flowcharts.

A few notes:

- I think that the murmur<->file server communication should just be standard HTTP, no gRPC. This allows people to run a file server on pretty dumb hosts.
- Step 6 would probably have to be combined with step 3 in the file upload process in order for this to work
- How long does the nonce last? I think it would be a good idea to handle network errors gracefully, and automatically renegotiate the nonce so that downloads can be resumed on network failure.

> the file server cares about per-user/channel/server quotas and file expiration

What representation of the mumble tree does the file server get?
How does the file server handle the files of a user who gets unregistered from a server?
What does the UI look like for managing quotas? What do the endpoints look like on the file server?

Avamander commented 6 years ago

With file streams/sharing maybe it's worth considering other types of sharing like video?

Fahrradkette commented 6 years ago

> be standard HTTP, no gRPC. This allows people to run a file server on pretty dumb hosts.

Can we define the minimal capabilities of such dumb hosts? For instance, if it can't save state, we'd have to extend the murmur<->file server communication.

> How long does the nonce last?

Guess something rather short like 5-20sec would be ok, I have no experience in that though.

> What representation of the mumble tree does the file server get?

I thought of just a list of all channel IDs. It's purely for channel-wide quotas.

> How does the file server handle the files of a user who gets unregistered from a server?

When the admin unregisters a user, they can choose to delete that user's files or change the user ID on the files to a special ID like -1, which indicates "unknown".

> What do the endpoints look like on the file server?

I don't know exactly yet, it kinda depends on how smart it will be, especially on if it handles quotas by itself.

If we decide to have a somewhat smart server, I can set up a prototype/mock in Python. Once we decide on the murmur<->file server protocol, it'd actually be able to run :)

Fahrradkette commented 6 years ago

@Avamander: Do you mean pointing the mumble client to a video stream, i.e. implementing video functionality on the client? For now the idea, as far as I understand it, is to download files from a to-be-defined file server via HTTPS.

hacst commented 6 years ago

gRPC isn't exactly a good protocol for file transfer. Also no browser can speak it currently so web-based clients would be SOL without a gateway. Imho HTTPS is more suitable and we can choose from tons of great highly performant server implementations willing to pump the bytes. As mentioned before, with a bit of care you might even be able to use plain S3 or be compatible with some of the CDNs out there. The more stupid and boring you can make the server the better.

E.g. it isn't quite clear to me what is gained by the file-server knowing about channel structures. Murmur knows it and hopefully it also knows what and how much was uploaded for each channel and what the quota should be.

Avamander commented 6 years ago

@Fahrradkette For example yeah, but I was thinking about screen and webcam video sharing too. Just mentioning this so that if the functionality isn't instantly implemented at least the possibility is considered.

ghost commented 6 years ago

@Avamander thanks for mentioning it, but I think video support will be completely separate from the file sharing implementation.

Fahrradkette commented 6 years ago

The dumbest file server would only share file ID and file size (for quotas) with murmur. File expiration, quota and channel-association logic would all happen in murmur.

A question remains on how we do the authorization against the file server.

Also not clear to me is the process of having mumble accept the file server's SSL certificate. Somehow the file server should let murmur know about it so it can forward it to mumble.

Edit: the checksum only of course. That would probably not work with browser-based clients though.

Natenom commented 6 years ago

It is important that the new ACL for file upload is a global permission that can only be changed in the root channel. If this was a channel permission then everyone who owns a permanent channel on a server or is able to create a temporary channel would be able to control file upload.

ElfEars commented 6 years ago

Er, maybe that's going too far.

I'd recommend splitting it into 2 permissions separate from the current channel creation permissions:

Ability to create channels with file upload. (Allows you to allow file upload in any channels you create, can turn off upload in any channels you created at will)

Ability to edit other's file upload settings (Mostly for mods, allows you to edit if other people's channels have file upload)

Fahrradkette commented 6 years ago

@Natenom @ElfEars Thanks for bringing up the ACL points. What new permission flags are needed in ACL.h? I think we need at least "FileRead" and "FileWrite" for the channels.

Since it seems that having murmur handle the quotas is the right way, we need at least a "SetQuota"/"ManageFiles" permission on channel scope.

That permission would allow channel admins to clean up the channels, i.e. renaming/deleting files. It could also allow channel admins to set quota policies like "prevent upload, delete oldest file, delete biggest file"
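A rough sketch of how the three quota policies named above could be evaluated before an upload is accepted. The names `QuotaPolicy`, `StoredFile` and `plan_upload` are made up for this example; nothing like them exists in murmur today.

```python
from dataclasses import dataclass
from enum import Enum


class QuotaPolicy(Enum):
    PREVENT_UPLOAD = "prevent upload"
    DELETE_OLDEST = "delete oldest file"
    DELETE_BIGGEST = "delete biggest file"


@dataclass
class StoredFile:
    file_id: int
    size: int
    uploaded_at: float


def plan_upload(files, quota_bytes, new_size, policy):
    """Return (allowed, files_to_delete) for an upload of new_size bytes."""
    if new_size > quota_bytes:
        return False, []  # can never fit, whatever gets deleted
    used = sum(f.size for f in files)
    if used + new_size <= quota_bytes:
        return True, []
    if policy is QuotaPolicy.PREVENT_UPLOAD:
        return False, []
    if policy is QuotaPolicy.DELETE_OLDEST:
        candidates = sorted(files, key=lambda f: f.uploaded_at)
    else:  # QuotaPolicy.DELETE_BIGGEST
        candidates = sorted(files, key=lambda f: f.size, reverse=True)
    to_delete = []
    for victim in candidates:
        if used + new_size <= quota_bytes:
            break
        to_delete.append(victim)
        used -= victim.size
    return True, to_delete
```

The function only plans the deletions; actually removing the files from the file server would be a separate step, so a failed upload does not have to cost anyone their existing files.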

I think we might also need a server-wide permission for those admin tasks, though we could use an existing one like "Register".

Fahrradkette commented 6 years ago

Sorry for the long absence, but here is the requested mockup for the Quota Editor.

Edit: the right window is to pick files to be dropped on another channel to the left (equivalent to file moving on a desktop). Please post your ideas about how you'd like to interact with the files, such as the content of context menus, additional delete/move/whatever buttons, and key modifiers (e.g. when dropping a file on another channel while holding ctrl, it copies it).

I'd also like to write down how I think about the roles of the people involved.

Server Admin:

Set Quota: all Channels
Move/Rename Files: all Files in all Channels
Delete Files: all Files in all Channels
Upload Files: all Files in all Channels
Download Files: all Files in all Channels

Channel Admin:
Set Quota: allowed Channels
Move/Rename Files: all Files in allowed Channels
Delete Files: all Files in allowed Channels
Upload Files: all Files in allowed Channels
Download Files: all Files in allowed Channels

Upload User:
Set Quota: none
Move/Rename Files: own Files in allowed Channels
Delete Files: own Files in allowed Channels
Upload Files: to allowed Channels
Download Files: from allowed Channels

Download User:
Set Quota: none
Move/Rename Files: none
Delete Files: none
Upload Files: none
Download Files: from allowed Channels

Guest User:
Set Quota: none
Move/Rename Files: none
Delete Files: none
Upload Files: none
Download Files: none

To cover these roles we need a set of additional channel-based permissions (read, write, rename, move, delete). The question is if we want to extend the existing permissions enum or if we should add a new file-specific one?
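One way to picture a separate file-specific permission set is a small flag enum plus role presets. The sketch below is an assumption-laden illustration: the bit values, the extra `MANAGE_OTHERS` flag (used to model "own files" versus "all files"), and the role names are all invented here and do not correspond to the existing flags in ACL.h.

```python
from enum import IntFlag


class FilePermission(IntFlag):
    READ = 0x01
    WRITE = 0x02
    RENAME = 0x04
    MOVE = 0x08
    DELETE = 0x10
    SET_QUOTA = 0x20
    MANAGE_OTHERS = 0x40  # hypothetical: may modify files uploaded by others


P = FilePermission
MODIFYING = P.RENAME | P.MOVE | P.DELETE

# Role presets within an allowed channel, following the role list above.
ROLE_PERMISSIONS = {
    "channel_admin": P.READ | P.WRITE | MODIFYING | P.SET_QUOTA | P.MANAGE_OTHERS,
    "upload_user": P.READ | P.WRITE | MODIFYING,
    "download_user": P.READ,
    "guest": P(0),
}


def is_allowed(granted, action, owns_file=True):
    """Check one action against the permissions granted in a channel."""
    if (granted & action) != action:
        return False
    # Rename/move/delete on someone else's file needs the extra flag.
    if (action & MODIFYING) and not owns_file:
        return (granted & P.MANAGE_OTHERS) == P.MANAGE_OTHERS
    return True
```

A server admin would simply hold the channel admin preset in every channel. Whether these bits live in the existing enum or a new one is exactly the open question; the check itself looks the same either way.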

Since we're already covering some of the implementation questions, I think these additions/extensions of mumble-server are needed:

File Fields: (own class)

ID
Size
displayed name
uploader(client name or ID)

additional channel fields: (extend the existing one)

List of File IDs

maybe:
- WebServer handle if there is a need for multiple FileServers per Murmur (channel based FileServer)

additional server settings: (extend)

WebServer handle (server name, murmur-access token)
upload timeout
download timeout
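The fields listed above could be modelled roughly as follows. This is only a sketch in Python for readability (murmur itself is C++), and all class and attribute names are placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileRecord:
    file_id: int
    size: int
    display_name: str
    uploader_id: int


@dataclass
class WebServerHandle:
    server_name: str
    access_token: str  # the murmur-access token


@dataclass
class ChannelFiles:
    channel_id: int
    file_ids: List[int] = field(default_factory=list)
    # Optional per-channel file server, only needed if multiple
    # file servers per murmur are supported.
    web_server: Optional[WebServerHandle] = None


@dataclass
class FileServerSettings:
    web_server: WebServerHandle
    upload_timeout_s: int
    download_timeout_s: int


def web_server_for(channel, settings):
    """Pick the channel's own file server, else the server-wide one."""
    if channel.web_server is not None:
        return channel.web_server
    return settings.web_server
```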

The control channel also needs some new "words" in its vocabulary:

additional control messages:

Client -> Server:

File list request (channel)
Upload request (channel, displayed name, size)
Download request (channel, displayed name)
cancel Upload (channel, displayed name)
cancel Download (channel, displayed name)

maybe (we shouldn't rely on or trust the client):
- Upload finished (channel, displayed name)
- Download finished (channel, displayed name)

Server -> Client:
Upload allowed (fileserver URL, token)
Upload denied (reason)
Download allowed (fileserver URL, token)
Download denied (reason)
File list allowed (list of displayed names)
File list denied (reason)
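As an illustration of the request/response pairing, here is a sketch of how the server might answer an upload request with either "Upload allowed" or "Upload denied". The message classes, the `handle_upload_request` function, and its inputs (`writable_channels`, `free_bytes`) are hypothetical stand-ins; the real messages would be protobuf definitions on the control channel.

```python
import secrets
from dataclasses import dataclass


@dataclass
class UploadRequest:
    channel_id: int
    display_name: str
    size: int


@dataclass
class UploadAllowed:
    url: str
    token: str


@dataclass
class UploadDenied:
    reason: str


def handle_upload_request(request, writable_channels, free_bytes, base_url):
    """Server side: decide on an upload request from one client.

    writable_channels: channel IDs this client may upload to (from the ACL).
    free_bytes: remaining quota per channel ID.
    """
    if request.channel_id not in writable_channels:
        return UploadDenied("permission denied")
    if request.size > free_bytes.get(request.channel_id, 0):
        return UploadDenied("quota exceeded")
    file_id = secrets.token_hex(8)      # placeholder for a real file ID
    token = secrets.token_urlsafe(16)   # placeholder for a signed token
    return UploadAllowed(url=f"{base_url}/files/{file_id}", token=token)
```

Download and file list requests would follow the same allowed/denied shape.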

We also need to define how the Web/FileServer talks with murmur(server) and mumble(client).

Webserver Endpoints:

Server -> WebServer:

Start upload transfer (URL, client token) POST
Stop upload transfer (URL, client token) POST
upload transfer status (response to be defined) GET
List active upload transfers (response to be defined) GET
Start download transfer (URL, client token) POST
Stop download transfer (URL, client token) POST
download transfer status (response to be defined) GET
List active download transfers (response to be defined) GET

Client -> Webserver:
upload (URL, token) POST
download (URL, token) GET

The idea is to use Qt's QNetworkAccessManager on murmur to talk to the WebServer.

The files would be served at https://WebServerDomain.tld/files/FileID while the commands would go to .../control.

The client token could either be a custom header or we could use basic authentication. On apache with mod_authn_dbd "Start [upload | download] transfer" would tell it to add the user & password to its database, "Stop [upload | download] transfer" would remove that user/pw entry.

Since we don't know when the transfer is finished, we need to run a timeout on murmur to stop them. By doing so we don't require the WebServer to issue a callback which would allow "dumber" WebServers.
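A minimal sketch of such a murmur-side timeout, assuming murmur periodically calls a `tick()` method and that `stop_transfer` is whatever issues the "Stop transfer" request to the WebServer. Both names, and the `TransferWatchdog` class itself, are invented for this example.

```python
import time


class TransferWatchdog:
    """Tracks running transfers and stops those that exceed the timeout."""

    def __init__(self, timeout_seconds, stop_transfer, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.stop_transfer = stop_transfer  # callback taking a client token
        self.clock = clock                  # injectable for testing
        self._deadlines = {}                # client token -> deadline

    def start(self, token):
        """Call right after 'Start transfer' was sent to the WebServer."""
        self._deadlines[token] = self.clock() + self.timeout_seconds

    def finish(self, token):
        """Call if the transfer is known to be done or was cancelled."""
        self._deadlines.pop(token, None)

    def tick(self):
        """Stop every transfer whose deadline has passed; return their tokens."""
        now = self.clock()
        expired = [t for t, deadline in self._deadlines.items() if deadline <= now]
        for token in expired:
            del self._deadlines[token]
            self.stop_transfer(token)
        return expired
```

Because the WebServer never has to call back, the timeout has to be generous enough for the largest allowed file on a slow connection, which ties it to the file size limit.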

I also read some docs about S3, they seem to have some sort of timed out access, but the minimum time of 15 minutes is quite high imo.

Some Questions...

HTTP:

Is that communication layout sound?
Does it make sense to target the LAMP stack for WebServer?
For the Client<->WebServer auth, do you guys favor a custom header or basic_auth?
How about Murmur<->WebServer?
What kind of status do we require from the WebServer? (canceled, pending, finished transfers)

Adding and extending internal objects:

Could you point me to the parts which need to be touched? I don't yet have the codebase internalized.

Roles

Is there a need for the role of Channel Admin (like a clan leader/moderator)? A description of the structure of a running server would help tremendously.

Edit: what should be the next step? Should I issue a PR, if so, what part is thought through enough for having a go?

gettyhub commented 6 years ago

Putting file transfer over control channel would most likely tie up operations for sure, even if you were siphoning bytes over to just another user. uMurmur would probably go to town if you tried to run file transfer through it. Just running pictures through chat would be enough to bog down everything as it is. And does that come over the control channel?

Maybe it's just easier to leave file transfer up to an IRC session that has already implemented it. Mostly IRC can DCC the files directly once you have the IPs, but uMurmur exposes those where Murmur by default doesn't (only to admins).

peylight commented 4 years ago

It's a good idea. Any new decision about this feature?

Krzmbrzl commented 4 years ago

Not afaik

Avamander commented 4 years ago

I get the feeling more and more that Mumble should either be integrated into an existing IM solution with these features or integrate an existing IM solution into it.

Building Yet Another Chat Platform™ sounds too time consuming when that time could be spent at improving the core functionality of Mumble.

TerryGeng commented 3 years ago

This idea (creating a file host in the murmur binary) just doesn't sound very mumble to me, I'm afraid...

What I think we can try is to cook up some user-to-user file transmission mechanism, just relaying things without storing them... Then we can create a bot to store those files, which can be written in pymumble, or does the new plugin framework proposed by @Krzmbrzl provide convenience in making bots?

Or if people are unhappy about the concept of a bot, maybe we can just invent some mechanism to hide a bot in a channel, and use plugins to wrap up the interaction with that bot?

Krzmbrzl commented 3 years ago

Relying on bots for such a functionality does not seem applicable (from a usability point of view). If this is implemented, then it should be implemented in the server in such a way that every server supports it (provided it is new enough) out of the box without any additional requirements.

> or does the new plugin framework proposed by Krzmbrzl provide convenience in making bots?

No - it would only allow for a bot-like agent to be active as long as you (the user) are running Mumble (and only on the server you are currently connected to).

peylight commented 3 years ago

Maybe it is possible to use https://github.com/sprinfall/webcc

gitgrub commented 3 years ago

Please keep Mumble as a plain voice tool. I think file transfer is over its scope. Users can just post an URL to a free file hoster of their choice so that others can download a file.

Next thing users will ask for is a virus check for the files. Mumble server hosters might have a sort of responsibility about the contents or trustworthiness of a file. This does not sound good.

While I like some concepts posted here, I don't think it's a job for Mumble.

Avamander commented 3 years ago

I honestly really really think that Mumble should look into integrating/iframe'ing a Matrix client and delegate building that wheel to a separate team. It shouldn't be too hard to either iframe Hydrogen or integrate Quaternion.

basilbowman commented 1 year ago

I would LOVE to have filesharing functionality in Mumble - being able to centralize comms would be incredibly helpful for our usecase (lightweight PTT intercom system for virtual events)
