Video feature discussion

davidebeatrici commented 4 years ago

In the roadmap for almost a decade, but with no progress on it whatsoever.

I believe it's time we work on it and hopefully make it available with 1.5.

For encoding/decoding we can use libavcodec and then allow the user to choose the preferred codec.

As for video input, I came up with 3 different solutions for the implementation:

QCamera

https://doc.qt.io/qt-5/qcamera.html

Advantages

Cross-platform, supposedly working out of the box.

Disadvantages

Dependency on the Qt framework. Right now it's not an issue because most of our code depends on it, but we're considering replacing most of it with modern C++ and POCO libraries while keeping Qt only for the GUI.

v4l2 & DirectShow & AV Foundation

Advantages

Low-level control of the camera devices.

Disadvantages

Separate code for each platform.
Probably quite long code required.
Hard to maintain.

OBS integration

Advantages

OBS is cross-platform, thus we would only need to write different code to interface with it on different platforms, similarly to what we do with the Link plugin.
OBS allows a user to manipulate the video input as desired (camera settings included).
We don't have to maintain the video input code ourselves.

Disadvantages

Reliance on an external program.

Krzmbrzl commented 4 years ago

Well I think we have other priorities right now than to integrate such a completely new feature that probably implies a huge amount of work.

That being said, to me the OBS solution sounds like the best considering we plan to make the whole program working without Qt as well.

davidebeatrici commented 4 years ago

Indeed.

Avatat commented 4 years ago

In my opinion, we should focus on fixing bugs, improving the core features and UX. We have a lot of work to do to make the best VoIP communicator in the world! :)

But, I see nothing wrong in adding a video feature to the "to-do list"!

streaps commented 4 years ago

I don't like the idea of having a specific external application as a dependency. This might be cool for a few use cases, but the majority of users want a simple solution. Switching video on and off easily within Mumble -- not having to use a full-blown broadcast solution with Mumble as a streaming back end.

OBS uses Qt, so you would still have no video solution without Qt dependencies.

I think it would be very useful to have a very simple video implementation: low-bandwidth, low-resolution video. It doesn't have to be the next Zoom, it doesn't need desktop sharing and stuff. Only some visual feedback.

"the best VoIP communicator in the world" doesn't really sell any more. Ignoring video today will push the feature another 5 years into the future.

Just my opinion, of course it's up to the developers on what you want to focus.

felix91gr commented 4 years ago

I have problems with Jitsi all the time because WebRTC is still in its infancy right now.

And Zoom is a no-go for me, because of the hostility and surveillancey practices they've already shown towards users. It's also closed-source.

For all of this, I would love, LOVE something with the level of quality Mumble has that was open source. I didn't imagine it could be Mumble, but hey, if it seems possible and the audio engine won't suffer because of it... then yeah, why tf no :)

PS: if you DO go for this, ping me. I don't have much money but I can make a donation for this :)

felix91gr commented 4 years ago

Also, regarding this disadvantage of QCamera:

will definitely result in larger static binaries.

I don't think that's much of an issue.

People are tending towards making larger static binaries because dynamic linking is really hard to get right, and its original reasons for existing are not really that important anymore. As far as I understand, those reasons were lack of good source code package managers like npm and cargo to add dependencies to code, and lack of enough disk space to put all of the big binaries in place.

I don't think that now, in 2020, large static binaries are such a big deal.

If this is the alternative you have to follow to make life easier on the maintainers, then... why not? I do think you guys deserve a break.

davidebeatrici commented 4 years ago

Thank you very much for the feedback!

@streaps We could implement video input through OBS first (less work required) and then add our own. That would allow a user to select either OBS or the direct webcam as input.

@felix91gr We only provide static client binaries for Windows and macOS (which don't provide a package manager out of the box). In future we will probably provide an AppImage file for each release.

Right now I don't know by how much our binaries' size would increase by adding Qt Multimedia, but it's definitely going to stay below 100 MB.

I removed the disadvantage from the list.

streaps commented 4 years ago

Requires a plugin that may not be present to be shipped in official distribution packages; for example, the Debian one doesn't seem to provide it.

I wonder which plugin that is?

davidebeatrici commented 4 years ago

There's at least one plugin for each platform; the one for Linux provides GStreamer integration.

I also just found out the plugin is provided by https://packages.debian.org/stable/libqt5multimedia5-plugins.

I will test https://doc.qt.io/qt-5/qtmultimedia-multimediawidgets-camera-example.html and report back.

streaps commented 4 years ago

What about libVLC? I have no experience with it and don't know if it can be used for this purpose.

davidebeatrici commented 4 years ago

Looks nice, but it's strictly focused on media player capabilities rather than encoding/decoding.

I also just found out the plugin is provided by https://packages.debian.org/stable/libqt5multimedia5-plugins.

I will test https://doc.qt.io/qt-5/qtmultimedia-multimediawidgets-camera-example.html and report back.

Turns out I already have that package installed.

It was my fault, I ran the binary from file manager forgetting about firejail.

Running the example in a non-restricted environment works fine, I updated the first message.

ghost commented 4 years ago

In regard to the OBS integration method: a virtual cam plugin exists (https://obsproject.com/forum/resources/obs-virtualcam.949) for Windows and Linux.

trudnorx commented 4 years ago

I'd rather not have video in Mumble. Simplicity is a virtue. Programs being too bloated is a big problem.

I disagree with the idea that it could be implemented in a non-obtrusive way.

davidebeatrici commented 4 years ago

@Reikion I was aware of it, however a direct integration would be more efficient (Mumble <-> OBS rather than Mumble <-> V4L2/DirectShow <-> OBS).

@trudnorx What makes you think that video cannot be implemented in a non-obtrusive way?

TerryGeng commented 4 years ago

Hmmmm....

What about openCV? https://stackoverflow.com/questions/2570359/cross-platform-camera-api/3909731 https://stackoverflow.com/questions/278112/webcam-library-for-c-on-linux I know some people use OpenCV for camera capture with Python.

hhirtz commented 4 years ago

To solve the scaling issue[1], mumble could merge incoming video streams into one, like so: https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fi.ytimg.com%2Fvi%2FVnyitUU4DUY%2Fhqdefault.jpg&f=1&nofb=1

If the room gets bigger, it could show whoever is talking in a large view, with small versions of the others' video streams: https://enterprise.verizon.com/content/dam/img/products/business-communications/zoom-wii-sm.png

This way tx bandwidth is reduced from O(n^2) to O(n)

[1] https://wiki.mumble.info/wiki/Planned_Features#Video

streaps commented 4 years ago

What about openCV?

it's huge, but very powerful

What about Gstreamer?

streaps commented 4 years ago

This way tx bandwidth is reduced from O(n^2) to O(n)

How would that work?

hhirtz commented 4 years ago

How would that work?

Instead of sending all video streams to each client, the server merges them into one and then sends it to all clients. The way it is merged could be a setting, or could depend on the number of clients in the room.

For example, if you have fewer than four people in the video conference, Mumble could make a split-screen-like merge. If there are more people, it would be better to have the one talking shown large and the others on the side of the video, like you can see in the images from my previous comment.
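A minimal sketch of the bandwidth arithmetic behind the O(n^2) to O(n) claim, assuming every client sends one stream at the same bitrate and the server either forwards all other streams to each client or sends each client one merged stream. The function names and the 500 kbps figure are made up for illustration; the sketch counts server upload only and says nothing about the CPU cost of merging.

```python
def server_tx_forwarding(n_clients: int, stream_kbps: int) -> int:
    """Server upload when every client receives every other client's stream."""
    return n_clients * (n_clients - 1) * stream_kbps


def server_tx_merged(n_clients: int, merged_kbps: int) -> int:
    """Server upload when each client receives a single merged stream."""
    return n_clients * merged_kbps


if __name__ == "__main__":
    # Print both totals for a few room sizes (500 kbps is an arbitrary example bitrate).
    for n in (2, 4, 8, 16):
        print(n, server_tx_forwarding(n, 500), server_tx_merged(n, 500))
```

Doubling the room size roughly quadruples the forwarding total but only doubles the merged total.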

streaps commented 4 years ago

But how would that work?

Re-encoding everything? That wouldn't scale.

mirh commented 4 years ago

Look, this is your program and all. But there is already jitsi for everyone's favorite open source meeting. There's no point in duplicating features and market positions (also somewhat forfeiting the core demographics of mumble, which is gamers AFAICT). If you want to expand yourself, you should target the discord community. And I don't really think videos are important (or even desirable) there.

felix91gr commented 4 years ago

But there is already jitsi for everyone's favorite open source meeting.

Eh, I dunno about that. I've used jitsi extensively these past months and lemme tell you, there are some parts of it that really really suck.

In fact jitsi has broken our server's port 443 and so far nobody in StackOverflow seems to know what the heck is going on 😑

They also have lipsync problems that they've stopped testing for since 2-3 years ago. The other day, a friend and I managed to separate the video from the audio for about 80 seconds. Seconds!! My audio was behind the video by about 80 seconds!

Anywho. That's my rant about jitsi. It kinda sucks to use it atm, and it sucks doubly because it's more or less our only (for >2 people) open source alternative.

seniorm0ment commented 4 years ago

But there is already jitsi for everyone's favorite open source meeting.

I actually came here to suggest maybe implementing a plugin for Jitsi to be used in the backend on Mumble. I say plugin because I think it should be optional, on both the client and server side, whether they want to incorporate video chat at all. I do see some controversy over it here though, and I know WebRTC can be a privacy concern depending on its implementation.

Personally I am a huge +1 for adding video chat, given it's stable, does not have any insane compression like Discord (lossless would be amazing for connections that can support it), is simple to use, is well integrated and is fully encrypted.

I think the best way to do this would be, by default include it in Mumble client-side, and offer the ability to uninstall it easily without breaking anything. I am generally for adding onto programs instead of removing features when it comes to minimalism, but I think in this case it makes it easier for less technical people who generally don't read installers carefully or who may miss a checkbox, or whatever the case may be.. to have video chat "just werk" instead of having to worry about going into settings, finding the option to enable.

As for the server side, I believe it should be entirely optional. On the client side, the video chat icons should not even show up if the server the user is connected to doesn't support video chat (and this should also be shown in the server's information).

streaps commented 4 years ago

Just use plain jitsi meet then. Create a link and post the URL in the text chat. I don't see any advantages in integrating it into Mumble.

seniorm0ment commented 4 years ago

Just use plain jitsi meet then. Create a link and post the URL in the text chat. I don't see any advantages in integrating it into Mumble.

It would be very nice to have a seamless integration, which is why I recommended a plugin for Jitsi for a click of a button. I feel there may be some complications to this. I do also agree with the above that Jitsi needs some work still, mainly client side stability.

I would be totally for a more minimal introduction of video chat directly through Mumble if possible though. Maybe something like FFmpeg for video capture? Not entirely sure, just throwing stuff out there.

seniorm0ment commented 4 years ago

I want to add onto that, even if it wasn't Jitsi being used on the backend, having our own would be great. Just something. If there's one complaint I hear trying to switch people over to Mumble, it's the lack of screen-share capability. I can tell them to download Jitsi but it would require them installing and setting up another program, and that doesn't sound as good as a seamless all in one solution built right into Mumble.

5trongthany commented 3 years ago

100% for this being added, and I would happily help in any way I can to achieve it.

Green-Sky commented 3 years ago

Just want to throw in my personal take on the matter: Video streams are becoming a kind of hard requirement for a lot of people, and to stay competitive we need this feature sooner or later. But we should also not discard a feature I see as Mumble's strong point: lightweight to host. Mixing, resizing and/or re-encoding on the server are kind of the opposite... I see a problem though: unlike Opus, there is no de facto video codec for low bandwidth with hardware acceleration, which I see as necessary for video. So the protocol should be designed to be extensible. Also, decoupling video from the client with a plugin-like system would be a great fit, I think...

davidebeatrici commented 3 years ago

Rendering definitely has to be done client-side, especially due to VPS usually not providing hardware encoders.

By supporting multiple codecs we would also guarantee flexibility when it comes to the encoders available on the system.

For example: H.265 and H.264 are probably less efficient than AV1, but hardware encoders for them are widespread.

If the CPU is strong, using the most efficient codec is desirable. If saving CPU is a priority, using one of the codecs supported by the hardware encoder is the way to go.
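The trade-off above could be sketched roughly like this. Everything here is hypothetical: `pick_codec`, its parameters and the codec list are illustrative names, not part of Mumble or any library, and the ordering simply follows the comment above (AV1 ahead of H.265 and H.264).

```python
# Codecs ordered from most to least compression-efficient.
# Assumption for illustration only; real efficiency depends on encoder and settings.
EFFICIENCY_ORDER = ["AV1", "H.265", "H.264"]


def pick_codec(hardware_encoders: set[str], prefer_low_cpu: bool) -> str:
    """Pick a codec name from EFFICIENCY_ORDER.

    hardware_encoders: codecs the local hardware encoder supports (hypothetical input).
    prefer_low_cpu: True when saving CPU matters more than bitrate efficiency.
    """
    if prefer_low_cpu:
        # Take the most efficient codec that the hardware can encode.
        for codec in EFFICIENCY_ORDER:
            if codec in hardware_encoders:
                return codec
    # Strong CPU, or no usable hardware encoder: use the most efficient codec in software.
    return EFFICIENCY_ORDER[0]
```

For example, a machine whose GPU only encodes H.264 would get H.264 when saving CPU is the priority, and AV1 otherwise.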

seniorm0ment commented 3 years ago

FFmpeg does recording I believe, is minimal, and is very extensible. Not sure how it would work in a situation like this, but it sounds like it might work well to me.

2xB commented 3 years ago

New idea to make video scalable: Let the clients send lower-resolution video streams the more people participate, i.e. let the clients scale their outgoing streams so that their total combined bandwidth never exceeds a value set by the host. Possibly allow one video stream at a time to be high-resolution, so that whoever speaks can be seen clearly no matter how many people are in there. The total bandwidth is therefore always equal to or lower than the given combined bandwidth for all low-res streams plus one high-res stream.

This should scale similarly to audio for the host and worst-case O(n) for the client, one always sees everyone and can interact non-verbally, the speaker is always clear and the host does no re-encoding.
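A rough sketch of that budget rule, assuming the host sets a combined budget for all low-res streams and one speaker streams in high resolution. The function names, the per-stream cap and the example numbers are hypothetical, chosen only to show that the total stays bounded as people join.

```python
def low_res_kbps(n_low_res: int, budget_kbps: int, cap_kbps: int) -> int:
    """Bitrate each low-res sender may use so that their sum stays within budget_kbps.

    cap_kbps is the most a single low-res stream would ever use (hypothetical setting).
    """
    if n_low_res <= 0:
        return 0
    return min(cap_kbps, budget_kbps // n_low_res)


def total_kbps(n_participants: int, budget_kbps: int, cap_kbps: int, speaker_kbps: int) -> int:
    """Total video bitrate: one high-res speaker stream plus everyone else in low-res."""
    if n_participants <= 0:
        return 0
    n_low_res = n_participants - 1
    return speaker_kbps + n_low_res * low_res_kbps(n_low_res, budget_kbps, cap_kbps)


if __name__ == "__main__":
    # Example numbers only: 2000 kbps low-res budget, 800 kbps cap, 1500 kbps speaker.
    for n in (2, 5, 10, 25):
        print(n, low_res_kbps(n - 1, 2000, 800), total_kbps(n, 2000, 800, 1500))
```

However many participants join, the total never exceeds the low-res budget plus the one high-res stream; only the per-stream bitrate shrinks.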

Green-Sky commented 3 years ago

Let the clients send lower-resolution video streams the more people participate

I don't think that's the way to go.

How about the client telling the server to reduce the frame count of the streams of the others? While everyone still sends the full stream to the server, the server can then send only every Xth frame (or key-frame). It is not ideal, but you would not have to do variable-bandwidth or multi-bandwidth compression. Only works ofc if the key-frame interval is relatively low...

2xB commented 3 years ago

Changing the transmitted frame size on the fly should definitely be feasible, as this only occurs if participants join/leave. Changing this on the server therefore implies heavy and avoidable requirements on the server and its network connection.

The screen of each participant has a certain fixed size, and the more participants there are, the fewer pixels per participant can be shown on said screen. My suggestion just means that no vast number of pixels is transmitted that can't even be shown on most screens.

Dropping from 30 fps to 2 fps when 15 people logged in makes non-verbal interaction rather difficult, but dropping from 1080p to 360p per image would be perfectly fine: 1/3 the image height and width implies 1/9 the pixel count. And even if this is about being able to look at individual people: note that the quality settings of basically every dynamic web video player in the world (e.g. YouTube) allow changing the resolution but not the framerate in real time.

Another issue with any frame dropping: The framerate frequently does not scale linearly with bandwidth. This is because many codecs mainly transmit the difference between frames and not the frames themselves, and the differences occurring after 1/2 s are larger than after 1/30 s, especially for flickering laptop cameras (e.g. requiring more i-frames). So the size of a 2 fps stream might well still be larger than 1/15th of a 30 fps stream. This behavior of codecs also makes server-side frame dropping non-trivial, since the server needs to re-encode every stream of which it drops frames.
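To illustrate the dependency problem, here is a deliberately simplified model, assuming each frame is either a self-contained key-frame or a delta-frame that needs the frame immediately before it. Real codecs have more complex reference structures; `Frame` and `decodable_after_drop` are hypothetical names for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    is_key: bool  # True for a self-contained key-frame (I-frame)


def decodable_after_drop(frames: list[Frame], kept: set[int]) -> list[int]:
    """Return indices of forwarded frames that a receiver can still decode.

    Simplified model (assumption): a delta-frame is decodable only if the
    immediately preceding frame was forwarded and decodable.
    """
    decodable = []
    prev_ok = False
    for frame in frames:
        if frame.index not in kept:
            prev_ok = False  # a gap breaks the chain until the next key-frame
            continue
        if frame.is_key or prev_ok:
            decodable.append(frame.index)
            prev_ok = True
        else:
            prev_ok = False
    return decodable


# 12 frames with a key-frame every 6 frames.
stream = [Frame(i, i % 6 == 0) for i in range(12)]
every_third = decodable_after_drop(stream, {0, 3, 6, 9})
keys_only = decodable_after_drop(stream, {0, 6})
```

In this model, forwarding every third frame leaves the receiver with only the key-frames anyway, which is why dropping to key-frames needs no re-encoding while any other drop pattern does.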

All in all, let's please discuss this on an argument-based level. This does not have to be a pure matter of opinion (or just of disliking a post?).

kermorgant commented 3 years ago

I'm totally new to Mumble and was attracted here by its low-latency performance. So while I miss the video feature, I'm a bit afraid of the impact it could have on this competitive advantage.

With that in mind, would it make sense to limit it to 1 video stream at a time? Or just screen sharing? After all, many Zoom/Google Meet meetings happen with one person presenting slides while the others have their webcams off...

seniorm0ment commented 3 years ago

competitive advantage

what do you mean by this? it seems like you're asking if a video stream happening will cause performance/bandwidth issues for users who don't want to partake in the video call?

kermorgant commented 3 years ago

competitive advantage

what do you mean by this?

I was referring to Mumble's low latency, which I've heard is first class.

it seems like you're asking if a video stream happening will cause performance/bandwidth issues for users who don't want to partake in the video call?

I'm wondering if it would make sense to limit a meeting with n participants to only one video stream at a time (having been in many Google Meet meetings with almost everyone's webcam shut off, I can see the use case). Thus, I'm raising the question of whether implementing this video feature in such a restricted way would help with the performance/bandwidth/scalability issue.

2xB commented 3 years ago

Of course, limiting the number of concurrently streamed videos also solves the bandwidth problem. And there definitely are applications for single-video solutions (e.g. the meetings mentioned). On the other hand, there are applications that work significantly better when everyone can stream video simultaneously (e.g. talking with friends), and for the latter, multiple solutions were also presented. I still like the solution I presented in my last two posts, for the reasons elaborated there.

So since the main theoretical challenge is solved for both applications, it is my belief that the issue is no longer how to make video work conceptually, but it is a matter of someone programming a proof of concept that can be tested, discussed, refined and included.

Kissaki commented 3 years ago

Not sure if I misunderstood, but it sounds like OBS would be running alongside, and set up the video source as a scene?

I don’t think adding OBS as an external, runtime application dependency for this feature would be a good idea.

May be fine for those who already use OBS, who could use a separate profile for this use case
Impossible to use it for streaming or recording while providing a webcam source for Mumble at the same time
A huge barrier to entry for less experienced users, leading to a support burden and big user confusion and frustration burden

OBS is not a very simple program. It has some setup effort, a huge array of advanced settings, and technical concepts and scene editing to set up. Even the simplest use case, feeding a webcam, would entail selecting and adding a source, and adequately positioning the source, and potentially encoding settings, and adequate mode of operation/linking to Mumble.

If this is not about integrating OBS output as a source as a whole I don’t think the description is very clear. I could see using the OBS webcam source, but wonder how easy it is to extract and use that selectively.

seniorm0ment commented 3 years ago

Latency on Mumble still depends heavily on each user's connection, where they are in the world, and how good the server host's connection is. It really shines on LAN. That being said, I don't agree with the idea of restricting the user, and instead feel that should be the choice of the server host.

Also, as far as OBS goes, I fail to see its use case here. I believe there are ways to do it with ffmpeg, which would be significantly more minimal and easier.

Krzmbrzl commented 3 years ago

This is now tracked in our new "Big idea collector" #5237

streaps commented 3 years ago

Why? Where should it be discussed now?

Green-Sky commented 3 years ago

I guess still here, since the "Big idea collector" is locked for colabs...

Krzmbrzl commented 3 years ago

Green-Sky is correct. That's also what it says in the first post in the "Big idea collector" ;)

streaps commented 3 years ago

But this issue is closed now. Yes, I can still add comments, but it's a bit weird to have a discussion in a closed issue. Maybe it could be moved to Discussions?

Krzmbrzl commented 3 years ago

I don't see a reason for that. The status of the issue does not restrict the discussion in here in any way 🤷

seniorm0ment commented 3 years ago

to be fair, i agree with both points, but closing the issue makes it harder for people to see it and potentially offer suggestions or even contribute to help. i see no reason to close it, it just makes it so less people see it which is probably the opposite of what we want.

seniorm0ment commented 3 years ago

that being said, has any work been considered or started or are we just focusing on other things right now, or still unsure how to implement it? like where are we at with this becoming a viable feature that could potentially be implemented in the near future?

i have suggested just using ffmpeg if that could work, but i know there has been another discussion for xmpp chat implementation previously, maybe it could make more sense to implement xmpp for chat and then also use xmpp for video? just an idea.

Krzmbrzl commented 3 years ago

but closing the issue makes it harder for people to see it and potentially offer suggestions or even contribute to help.

I don't agree with that. With >500 open issues, openness is not a criterion for people to find an issue. They will have to search for the topic they are looking for. And (as of now) if you enter "video" in the issue search list, the "Big idea collector" is the first one to show up, in which this issue is linked and it is also explained why it is closed.

i see no reason to close it, it just makes it so less people see it which is probably the opposite of what we want.

The reason is also explained in the "Big idea collector": This is one of those feature requests that can't really be implemented unless someone steps up and wants to sink a huge chunk of time and work into it in order to make the necessary changes to the existing code to even enable such a feature to exist. Therefore the request is not actionable for us and is therefore closed. In order to not lose track of it though, it is added to the "Big idea collector".

Thus in short: It helps to avoid essentially dead open issues.

With this I would kindly ask everyone to leave it at this or if there is more need for discussion, open a separate discussion here on GitHub for it since this discussion is OT for the topic of this issue.

Krzmbrzl commented 3 years ago

that being said, has any work been considered or started or are we just focusing on other things right now, or still unsure how to implement it? like where are we at with this becoming a viable feature that could potentially be implemented in the near future?

Nothing has been done nor is anything planned to be done for it afaik.

JobberRT commented 2 years ago

Hey, I don't know if I should or could comment on this idea, because I have no commits or contributions to this project (though I'm trying to write a file browser or file transfer function for Mumble), but here's what I think about this feature, as a user.

Background

I'm a gamer with a job, not an e-sports player, and my favorite thing in a day is to get off work and play some games with friends. But my friends live all over the country, so it's important to have good VoIP software that can handle our real-time, multi-user chatting.

I've tried Discord, TeamSpeak, Skype and others. They have many advantages like a web client, file transfer, better noise cancelling and so on, but in the end I chose Mumble for one reason only: what I need is chat software and nothing else.

Mumble is VoIP software, and I HOPE Mumble does not grow too many abilities other than VOICE. Here's why:

In the last couple of years, communication software has been developing in the direction of Voice & Video & Message & File (kind of all-in-one), and there is a lot of software similar to Mumble, but none is cleaner than Mumble.
From the beginning, Mumble hasn't supported video, and so it became the most popular voice chat choice for gamers, because all we need is hearing, not seeing.
There is already plenty of Voice & Video software in this world; some of it even has a simple YAML config file, or is even just a web page using jq or other stuff. It's true that for conferences it's important to have video. But for VoIP, I hope there is only one V.
I think maybe most people prefer "what I need is what I get; when I need more, I go get more" over "give me everything; when I need something, I will already have it".
There are only two programs I know of that are only for voice: Mumble and TeamSpeak (after TS5 it will not be only for voice). And PLEASE keep Mumble what it is.

Mumble and TeamSpeak are old and don't have the good looks (Mumble is better than TeamSpeak lol) of software based on Electron or something similar. But it's exactly what we need; we don't need more computer resources being used by other stuff like UI or non-voice things. We use Mumble to hear people's voices, not to see their faces, Mumble's UI, or anything else.

Finally, I have to say I'm being a little bit emotional, but in my own opinion, there is no need to add video to Mumble.

Compared to a video feature, I think a `file transfer` feature would be more realistic, because no matter what kind of user is using Mumble, they all have a need to share files. Here are some examples:

Gamer: sharing map file, mod file or many other stuff
Worker: Sharing work file like .doc or .ppt
School in-door teaching: Sharing digital homework or submitting homework
And so on.
