mumble-voip / mumble

Mumble is an open-source, low-latency, high quality voice chat software.
https://www.mumble.info

PTT Voice Recorder #4570

Open perezreina opened 3 years ago

perezreina commented 3 years ago

Hello! We have used Mumble to build an analog/VoIP radio gateway, and that part works perfectly. It would be very useful if the audio recorder offered one more recording mode in addition to the existing multichannel and mix modes: saving a new file, named with the date and time, each time a user speaks. Currently, multichannel mode saves everything a given user says into a single file. Thank you very much in advance.

Krzmbrzl commented 3 years ago

What is the immediate benefit of such an approach vs. taking the recording of a user and splitting it up afterwards? To me it seems like this is a task that could be easily automated with a script and something like ffmpeg and/or audacity :thinking:

perezreina commented 3 years ago

Cheers! To give some background: I belong to a forest defense group in Catalonia. We are volunteers who help extinguish forest fires and conserve and preserve forests. As I explained in my previous message, we have used Mumble to create an analog-to-VoIP radio gateway, and it would be ideal if it could also record the communications. We think this would be a good feature because we will be recording for long periods and may need to look up a particular communication while recording is still in progress. I hope I have explained myself well. Thank you very much for your time, and congratulations on the project.

Krzmbrzl commented 3 years ago

Okay I see.

I think though that this is a rather specific need, so I doubt that it'll be integrated by one of us main developers... If someone was to create a PR with the needed changes, we'd probably accept it though :point_up:

One thing to keep in mind for something like this is that disk I/O shouldn't become excessive. There should probably be a minimum period of silence that has to elapse before a new file gets started.
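The minimum-silence idea can be sketched as follows. This is only an illustration of the rule, not Mumble code: the class name `UtteranceSegmenter`, the two-second default, the one-segmenter-per-user design and the file naming pattern are all assumptions made for the example.

```python
from datetime import datetime, timedelta


class UtteranceSegmenter:
    """Pick the output file for incoming speech, merging short pauses.

    Hypothetical helper: one instance is meant to track one user.
    """

    def __init__(self, min_silence=timedelta(seconds=2)):
        self.min_silence = min_silence  # assumed default, tune as needed
        self.current_name = None
        self.last_speech = None

    def file_for(self, user, now):
        """Return the file name for audio from `user` arriving at `now`."""
        gap_elapsed = (
            self.last_speech is None
            or now - self.last_speech >= self.min_silence
        )
        if gap_elapsed:
            # The pause was long enough: start a new timestamped file.
            self.current_name = f"{user}_{now:%Y%m%d_%H%M%S}.wav"
        # Otherwise keep appending to the file that is already open.
        self.last_speech = now
        return self.current_name
```

A recorder would call `file_for` for every chunk of speech and only open a new file when the returned name changes, so brief pauses between words do not produce extra files.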

mk-pmb commented 3 years ago

I thought about all the triggers and hooks that would be nice to have in mumble, like writing custom template messages to files in append-write mode, or to unix domain sockets, or to pipes, or launch scripts, or send network messages, … and they would also be useful in a lot of other places for other actions, basically whenever we flip a UI-related boolean. It's a bottomless pit of rarely-used features.

I also have a lot of ideas for what kinds of mumble bots could help with that kind of radio gateway scenario, but I'd need to know more details. And if @perezreina had the programmers to implement them, they'd probably already have built it.

So the most pragmatic solution for the use case above, with current Mumble, seems to me to be having a third-party recording program react to the same button that Mumble uses for PTT. If the people talking use Linux and can accept a fraction of a second of delay between PTT and the start of recording, this could easily be done with xinput, sox and a small shell script to connect them.

mk-pmb commented 3 years ago

If we do add a self-record feature (not limited to PTT), I suggest we let users configure exactly one command to be run as a child process, which receives a copy of the originally recorded audio on its stdin. This way, we can leave it to users and their tools to decide where to transmit and/or save the audio.

We should use au file format because it allows streaming raw audio samples without prior knowledge of length. (For WAV we'd have to write the duration in the header.)
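To illustrate why AU suits streaming, here is a minimal sketch of a header writer. The AU header is six big-endian 32-bit words, and the data-size word may be set to `0xFFFFFFFF` to mean "unknown". The function names are made up for this example, and 16-bit linear PCM is an assumed choice of encoding.

```python
import struct

AU_MAGIC = 0x2E736E64          # the ASCII bytes ".snd"
AU_UNKNOWN_SIZE = 0xFFFFFFFF   # "length not known", usable for streams
AU_ENCODING_PCM16 = 3          # 16-bit linear PCM (assumed encoding)


def au_header(sample_rate, channels):
    """Build a 24-byte AU header for a stream of unknown length."""
    header_size = 24  # offset at which the audio data begins
    return struct.pack(
        ">6I",  # six big-endian unsigned 32-bit integers
        AU_MAGIC,
        header_size,
        AU_UNKNOWN_SIZE,
        AU_ENCODING_PCM16,
        sample_rate,
        channels,
    )


def pcm16_to_au_bytes(samples):
    """Encode signed 16-bit samples as big-endian bytes, as AU expects."""
    return struct.pack(f">{len(samples)}h", *samples)
```

A writer would emit `au_header(48000, 1)` once (the sample rate and channel count here are placeholders) and then keep appending the output of `pcm16_to_au_bytes` for as long as audio arrives, without ever going back to patch the header.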

We'd try and keep that program listening at all times, and even send audio headers as soon as we know them (probably at server connect), so the child program can initialize and will (hopefully) be ready for actual data as soon as the user starts speaking.
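A rough sketch of the long-lived child process, using Python's `subprocess` module purely for illustration (Mumble itself is not written in Python). The `AudioSink` class and the `demo` function are hypothetical; the demo child is a tiny Python one-liner that merely counts the bytes it receives and reports the count as its exit code.

```python
import subprocess
import sys


class AudioSink:
    """Keep one child process alive and stream audio bytes to its stdin."""

    def __init__(self, command, header=b""):
        self.proc = subprocess.Popen(command, stdin=subprocess.PIPE)
        if header:
            # Send the headers early so the child can initialize
            # before the user starts speaking.
            self.write(header)

    def write(self, chunk):
        self.proc.stdin.write(chunk)
        self.proc.stdin.flush()  # avoid holding audio back in our buffer

    def close(self):
        """Signal end of input and return the child's exit code."""
        self.proc.stdin.close()
        return self.proc.wait()


def demo():
    # Stand-in for a user-configured command: exits with the byte count.
    child = [
        sys.executable,
        "-c",
        "import sys; sys.exit(len(sys.stdin.buffer.read()))",
    ]
    sink = AudioSink(child, header=b"HEAD")
    sink.write(b"abcdef")
    return sink.close()
```

In real use the command would be whatever the user configured, for example a script that writes the stream to disk or forwards it over the network.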

For cases where an immediate reaction to the end of speech is desirable, or where time-based detection of the end of speech is otherwise impractical, we should allow selecting a file with user-supplied magic bytes that Mumble sends as the end-of-data marker. The separator file should usually be small enough that we can cache it in RAM and read it only on server connect and when a file is selected in the options. An empty file should be accepted, because it's easier for scripts to manage the content of that file than to fiddle with Mumble's options. There should also be an option for whether to send the headers again when speech resumes, or only once per child process (re)start, or whether to close the child's stdin and start a new child when speech stops. The last choice should perhaps carry a warning that it may cause lost audio if launching the child process takes too long.

A twin option, independent of the former, would be nice for feeding a child process with the compressed audio in exactly the format we use to send it to the server. I'm not sure whether that format already includes a speech stop marker, but we should offer the user-defined end-of-data file independently, so that for just saving the audio away, users can use simple programs that may not even know about audio, let alone our codecs.

mk-pmb commented 3 years ago

Maybe the custom end marker file stuff might be too much new UI for the first version of this. I think it would be acceptable to instead make the end marker be 8 pseudo-samples that encode as consecutive newline characters, and whenever we encounter a half-as-long streak of audio samples that would encode as just newlines, we add +1 to the last of them.
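The escaping rule can be sketched like this, assuming 16-bit samples so that the sample value `0x0A0A` is the one that encodes as two newline bytes. The sample width, the byte order and all names below are assumptions for illustration.

```python
NEWLINE_SAMPLE = 0x0A0A           # 16-bit sample whose bytes are both "\n"
MARKER_LEN = 8                    # pseudo-samples in the end marker
MAX_STREAK = MARKER_LEN // 2      # the "half-as-long" streak that gets broken

END_MARKER = b"\n" * (2 * MARKER_LEN)


def escape_samples(samples):
    """Add 1 to the last sample of every 4-long run of newline samples."""
    out = []
    streak = 0
    for sample in samples:
        if sample == NEWLINE_SAMPLE:
            streak += 1
            if streak == MAX_STREAK:
                sample += 1   # 0x0A0A -> 0x0A0B, one quantization step
                streak = 0
        else:
            streak = 0
        out.append(sample)
    return out


def encode(samples):
    """Escape the samples, then serialize them as big-endian 16-bit PCM."""
    return b"".join(
        s.to_bytes(2, "big", signed=True) for s in escape_samples(samples)
    )
```

Because real audio can then never contain more than three consecutive newline samples, the 16-byte `END_MARKER` cannot appear inside the audio data, and a reader can treat it as an unambiguous end-of-speech marker.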

Krzmbrzl commented 3 years ago

> I thought about all the triggers and hooks that would be nice to have in mumble, like writing custom template messages to files in append-write mode, or to unix domain sockets, or to pipes, or launch scripts, or send network messages, … and they would also be useful in a lot of other places for other actions, basically whenever we flip a UI-related boolean. It's a bottomless pit of rarely-used features.

I agree. But in the future stuff like this could be implemented as a plugin to Mumble (see #3743). That way some external devs can (more or less) easily implement this niche functionality and share it with everyone interested :point_up:

@mk-pmb I'm not responding to the rest of what you wrote because I only partially understood it (I don't have much background knowledge in audio processing) and I currently don't have the time needed to work through it. It's definitely not because I want to dismiss it out of hand, though ;)

hbeni commented 3 years ago

Another idea would be to make the recordings server-side using a Lua bot. I do this for my plugin and it works well: https://github.com/hbeni/fgcom-mumble/blob/master/server/fgcom-radio-recorder.bot.lua The bot code needs to be adjusted a bit so that it does not depend on the fgcom location/PTT stuff, but basically it should be enough to tweak the OnUserStartSpeaking, OnUserSpeak and OnUserStopSpeaking Lua hooks so that it records to a new file for every user and every speaking session. The recorded file format may need changing too (the bot currently uses FGCS, which records raw samples), but the overall idea is there.