vidify / old-audiosync

First implementation of the audio synchronization feature for Vidify, now obsolete
GNU Lesser General Public License v3.0
17 stars 3 forks source link

Suggestion: Use dedicated pulseaudio monitor #22

Closed Nowa-Ammerlaan closed 4 years ago

Nowa-Ammerlaan commented 4 years ago

Hello,

I've been checking out the audiosync feature, and I've gotta say it is absolutely awesome. It works pretty well, though not always on videos with a very long intro/outro. But overall it's pretty impressive how well this works.

I do have one suggestion though. I noticed that you are using 'Monitor of {default device}' to record output. Now I've been experimenting a bit, and something as simple as a Notification Sound or some other audio stream can make the process fail: INFO: Audiosync module failed to return the lag. Also, for non-standard pulseaudio setups (like mine) this requires some manual-pavucontrol-work to get it working. Furthermore, pulseaudio's loopback module can introduce white noise in the default output device which also leads to problems.

[My setup: I use Pulseeffects for fancy effects, and I use the loopback module to forward input from my PC's line-in to line-out (--> speakers). Pulseeffects makes the setup for AudioSync non-trivial, I have to set AudioSync to `Monitor of PulseEffects.Apps. The line-in loopback introduces some white noise that is usually inaudible but breaks audiosync.]

To circumvent all these problems, and make the audiosync process a bit more rigid, I would suggest the following:

1) Create virtual sink and loopback:

pactl load-module module-null-sink sink_name=AudioSync sink_properties=device.description="AudioSync"
pactl load-module module-loopback source=AudioSync.monitor

2) Find spotify (or other supported player): pacmd list-sink-inputs find index of player named spotify (or other supported player), use some grep magic or something on this output (in this case 135):

index: 135
driver: <protocol-native.c>
flags: 
state: RUNNING
sink: 1 <PulseEffects_apps>
volume: front-left: 65536 / 100% / 0.00 dB,   front-right: 65536 / 100% / 0.00 dB
balance 0.00
muted: no
current latency: 405.37 ms
requested latency: 50.00 ms
sample spec: s32le 2ch 44100Hz
channel map: front-left,front-right
Stereo
resample method: speex-float-5
module: 9
client: 99 <ALSA plug-in [spotify]>
properties:
media.name = "ALSA Playback"
application.name = "ALSA plug-in [spotify]"
native-protocol.peer = "UNIX socket client"
native-protocol.version = "33"
application.process.id = "4044"
application.process.user = "andrew"
application.process.host = "andrew-gentoo-pc"
application.process.binary = "spotify"
window.x11.display = ":0"
application.language = "en_GB.UTF-8"
application.process.machine_id = "204b505c963976da92752bb45b5adc9c"
application.process.session_id = "2"
application.icon_name = "spotify-client"
module-stream-restore.id = "sink-input-by-application-name:ALSA plug-in [spotify]"

[EDIT] This grep magic does the trick: echo ${$(pacmd list-sink-inputs | grep -B 20 spotify | grep index)#"index:"} Though we should probably find a way to deal with the case that there are multiple spotify clients playing (unlikely, but we don't want everything to crash if that does happen).

3) Move player to new virtual sink: pacmd move-sink-input 135 AudioSync 4) Vidify-audiosync monitors AudioSync.monitor instead of $DEFAULT_SINK.monitor and does it's thing.

This is more robust, and will work on more complex setups without manual configuration. Because the loopback module creates an audio stream that acts as if it is a regular application. And there will be no more need to manually move audiosync to the correct monitor because the correct monitor will always be AudioSync.monitor. In short, this will make audiosync work 'out-of-the-box'. Furthermore, audiosync will record audio from spotify and only spotify, thus audiosync will work even when another stream or notification sound is playing, or when there is other noise on the default output.

Proof of principle:

Screenshot_20200107_131145 Spotify outputs into AudioSync

Screenshot_20200107_131325 AudioSync.monitor is looped back to the default device and vidify-audiosync records from AudioSync.monitor

Note:

Spotify starts a new stream whenever a new song is played (Actually, this happens whenever the play button is pressed), this means that whenever a new song is started this has to be run again: pacmd move-sink-input 135 AudioSync However the pactl load-module commands only have to be run once, at the start of vidify --audiosync

EDIT: In summary

Untested patch

This should work to set ffmpeg to use AudioSync.monitor instead of ${DEFAULT_SINK}.monitor. And move spotify streams to the AudioSync sink.

diff --git a/dev/getaudio.sh b/dev/getaudio.sh
index 1b691e8..109a475 100644
--- a/dev/getaudio.sh
+++ b/dev/getaudio.sh
@@ -1,13 +1,32 @@
 #!/bin/bash

 FFMPEG_FLAGS="-ac 1 -r 48000"
+VIRT_SINK_NAME="AudioSync"
+STREAMS="spotify"
+
+ # Execute this once
+function setupsink() {
+    # Setup the virtual AudioSync sink
+    if [ ! $(pacmd list-modules | grep null-sink -A 1 | grep $VIRT_SINK_NAME -q)] ; then
+        pactl load-module module-null-sink sink_name=$VIRT_SINK_NAME sink_properties=device.description="$VIRT_SINK_NAME"
+    fi
+    
+    # And setup the loopback from AudioSync.monitor back to default sink
+    if [ ! $(pacmd list-modules | grep loopback -A 1 | grep $VIRT_SINK_NAME -q)] ; then
+        pactl load-module module-loopback source="$VIRT_SINK_NAME".monitor latency_msec=1
+    fi
+}
+
+ # Execute this whenever spotify (re)starts a stream, (whenever play is pressed)
+function movestreams() {
+    # Move spotify streams to output to new virtual AudioSync device
+    streams_to_move=${$(pacmd list-sink-inputs | grep -B 15 ${STREAMS} | grep index)#"index:"}
+    pacmd move-sink-input $streams_to_move $VIRT_SINK_NAME
+}

 function record() {
     echo "Recording audio into $1..."
-    # Getting the default sink with pacmd list-sinks because the -i default
-    # option for ffmpeg sucks.
-    defaultSink=$(pacmd list-sinks | grep '*' | awk '/index/ {print $3}')
-    ffmpeg -f pulse -i "$defaultSink" $FFMPEG_FLAGS "$1"
+    # Find index of virtual sink and set ffmpeg to record from there
+    Sink=${$(pacmd list-sinks | grep -B 1 $VIRT_SINK_NAME | grep index)#"index:"}
+    ffmpeg -f pulse -i "$Sink" $FFMPEG_FLAGS "$1"
 }
marioortizmanero commented 4 years ago

I've been checking out the audiosync feature, and I've gotta say it is absolutely awesome. It works pretty well, though not always on videos with a very long intro/outro. But overall it's pretty impressive how well this works.

Thanks a bunch! It really is awesome. It's very imprecise for songs with noisy intros and outros, but for videos that are just delayed it's perfect. I currently don't really know how to solve this problem for videos with intros to be honest.

The solution you propose was what I was going to do initially, but I would've had to learn how to use the pulseaudio libraries so I left it for the future. The script to obtain the audio data in dev/getaudio.bash was only used in the first versions of the program, when I hadn't implemented the real-time downloading & recording, because I just downloaded the files beforehand and I read them inside the module. It serves as a good "sketch" of what should be happening inside the module, so I just left this after I wasn't using it anymore. But I can't directly use bash in this module because my data has to be in a C array to run the algorithm periodically with it from the main thread.

I don't know how deep you've looked into the code, but I currently get the audio data from ffmpeg with a pipe, in src/ffmpeg_pipe.c. This way, ffmpeg already handles all the conversions and all I have to do is read the pipe from the parent process. The conversions are important because both the downloaded data and the recorded data must be in the same format for the algorithm to work. I currently use 48,000Hz, mono and double-precision floating point frames (double type, in little endian). These are a lot of conversions to have in mind, since the original data can come in any format.

The current solution is of course worse than what you propose: I have to read the data byte by byte (very slow), and I don't have much control over it. I also can't scope specifically the Spotify app like you did, so I have to record the entire desktop. With a pulseaudio module I could also find a more suitable intermediate format that both audio files can be converted to, rather than using only what I mentioned earler. That way I would also avoid doing unnecessary conversions (if both original formats are in 41000Hz, I'd just leave them like that, rather than converting both to 48000Hz).

I appreciate your post because I have an out-of-the-box pulseaudio setup and I don't know how it actually works for more advanced users. I will use your bash sketch when I start improving this part of the code. I'm currently more focused on the multi-threading part, and being able to control ffmpeg from outside the module. That way, if the Spotify music is paused, ffmpeg won't keep recording. Also for a better synchronization between threads overall.

I currently can't work on what you suggest because it's going to take me a lot of time and I currently don't have it. Also, I'm still a student so my C code is really messy and I want to get it right, which takes time. I'm starting finals soon so there won't be much activity in here until I'm done (by mid-February). I'll still be checking the repos regularly for any issues/PRs made, though.

Thanks again for your help!!

Nowa-Ammerlaan commented 4 years ago

The solution you propose was what I was going to do initially, but I would've had to learn how to use the pulseaudio libraries so I left it for the future.

:O I didn't know that you could do pulseaudio things directly from C, that is indeed a lot better (and faster) then my bash script.

I don't know how deep you've looked into the code, but I currently get the audio data from ffmpeg with a pipe, in src/ffmpeg_pipe.c. This way, ffmpeg already handles all the conversions and all I have to do is read the pipe from the parent process. The conversions are important because both the downloaded data and the recorded data must be in the same format for the algorithm to work. I currently use 48,000Hz, mono and double-precision floating point frames (double type, in little endian). These are a lot of conversions to have in mind, since the original data can come in any format.

I didn't go very deep, my C is not nearly as good as my python :(

Something like this could work, if executed (by vidify's python side) every time spotify starts playing. I have added a part that moves the ffmpeg recording stream to the right monitor at the end. It's entirely bash, so no need to adjust any C code. It doesn't touch the actual audio streams, it just moves them around. This script works for me if I execute it just after vidify starts. However the thing is that as soon as I pause the music the spotify stream gets reset :(

# Execute this once
function setupsink() {
    # Setup the virtual AudioSync sink
    if [ ! $(pacmd list-modules | grep null-sink -A 1 | grep $VIRT_SINK_NAME -q)] ; then
        pactl load-module module-null-sink sink_name=$VIRT_SINK_NAME sink_properties=device.description="$VIRT_SINK_NAME"
    fi

    # And setup the loopback from AudioSync.monitor back to default sink
    if [ ! $(pacmd list-modules | grep loopback -A 1 | grep $VIRT_SINK_NAME -q)] ; then
        pactl load-module module-loopback source="$VIRT_SINK_NAME".monitor latency_msec=1
    fi
}

 # Execute this whenever spotify (re)starts a stream, (whenever play is pressed)
function movestreams() {
    # Move spotify streams to output to new virtual AudioSync device
    streams_to_move=${$(pacmd list-sink-inputs | grep -B 15 ${STREAMS} | grep index)#"index:"}
    pacmd move-sink-input $streams_to_move $VIRT_SINK_NAME

   # Move ffmpeg to record from AudioSync.monitor
    ffmpeg_to_move=${$(pacmd list-source-outputs | grep -B 15 Lavf | grep index)#"index:"}
    pacmd move-source-output $ffmpeg_to_move $VIRT_SINK_NAME
}

But as you said, using the libraries would be a lot better.

I currently can't work on what you suggest because it's going to take me a lot of time and I currently don't have it. Also, I'm still a student so my C code is really messy and I want to get it right, which takes time. I'm starting finals soon so there won't be much activity in here until I'm done (by mid-February). I'll still be checking the repos regularly for any issues/PRs made, though.

Thanks again for your help!!

No rush :) , I'm just dumping ideas here and there, otherwise I'll forget them later. I'm a student too, but I study Physics (meaning I basically only know (ba)sh and python), so your C code is bound to be a lot better then mine :P

Good luck with your finals :D

marioortizmanero commented 4 years ago

Something like this could work, if executed (by vidify's python side) every time spotify starts playing. I have added a part that moves the ffmpeg recording stream to the right monitor at the end. It's entirely bash, so no need to adjust any C code. It doesn't touch the actual audio streams, it just moves them around. This script works for me if I execute it just after vidify starts. However the thing is that as soon as I pause the music the spotify stream gets reset :(

Ohhh so I kinda misunderstood your initial comment. You want to run setupsink once at the Vidify initialization, and then movestreams before this module is executed, right? You had modified getaudio.bash in the first comment so I thought you completely wanted to replace the recording part with the bash script. This is much easier then, and can be done from inside the C module (better than from Vidify).

It actually doesn't seem that hard to implement directly in C, so after I have more time I'll first try to do it with the pulseaudio libraries (or ALSA). The hard part is the recording itself because I need to do lots of conversions. But for now, I could just run this script translated to C and afterwards record the audio with ffmpeg (if that's possible). Maybe the issue you have when Spotify is paused can be fixed with this lower level code.

No rush :) , I'm just dumping ideas here and there, otherwise I'll forget them later. I'm a student too, but I study Physics (meaning I basically only know (ba)sh and python), so your C code is bound to be a lot better then mine :P

Great! Keep the ideas coming, I love it. I'll keep reading the repos in case someone opens new issues/PRs. Also, thanks! Good luck to you whenever you start finals, too :)

Nowa-Ammerlaan commented 4 years ago

You want to run setupsink once at the Vidify initialization

Exactly, this only has to be done once, when the virtual sink and loopback are available they will stay available until the system reboots.

and then movestreams before this module is executed, right?

Indeed, as soon as we want to analyse spotify's audio output we should move that audio stream to the virtual sink. Maybe it is even possible to get the name of the player that is playing from vidify/(py)dbus and use that to look up the index of the audio stream that needs moving. That way there is no need to hardcode it to spotify.

Getting ffmpeg to record from the correct monitor is the easy part, just replace 'default' with the monitor of the virtual sink:

diff --git a/src/capture/linux_capture.c b/src/capture/linux_capture.c
index 0a89b94..fe64b59 100644
--- a/src/capture/linux_capture.c
+++ b/src/capture/linux_capture.c
    // Finally starting to record the audio with ffmpeg.
     char *args[] = {
-        "ffmpeg", "-y", "-to", MAX_SECONDS_STR, "-f", "pulse", "-i", "default",
+        "ffmpeg", "-y", "-to", MAX_SECONDS_STR, "-f", "pulse", "-i", "AudioSync.monitor",
         "-ac", NUM_CHANNELS_STR, "-r", SAMPLE_RATE_STR, "-f", "f64le",
         "pipe:1", NULL
     };

It actually doesn't seem that hard to implement directly in C, so after I have more time I'll first try to do it with the pulseaudio libraries (or ALSA). The hard part is the recording itself because I need to do lots of conversions. But for now, I could just run this script translated to C and afterwards record the audio with ffmpeg (if that's possible). Maybe the issue you have when Spotify is paused can be fixed with this lower level code.

That would be awesome, using ALSA would probably be even faster because spotify uses ALSA (all spotify streams are always named "ALSA plug-in [spotify]"). That being said Pulseaudio is probably a lot easier then ALSA.

The hard part is the recording itself because I need to do lots of conversions.

Spotify always outputs audio at 44100 Hz, irrespective of the rate pulseaudio is using (the format is variable though, either s16le or s32le depending on pulseaduio settings). If you use a dedicated monitor you can exploit this and save a bit of CPU. This can be done by hard coding the null sink to match spotify's rate (add the rate=44100option, leave format and channels as is):

# Setup the virtual AudioSync sink
    if [ ! $(pacmd list-modules | grep null-sink -A 1 | grep $VIRT_SINK_NAME -q)] ; then
        pactl load-module module-null-sink sink_name=$VIRT_SINK_NAME sink_properties=device.description="$VIRT_SINK_NAME" rate=44100
    fi

    # And setup the loopback from AudioSync.monitor back to default sink
    if [ ! $(pacmd list-modules | grep loopback -A 1 | grep $VIRT_SINK_NAME -q)] ; then
        pactl load-module module-loopback source="$VIRT_SINK_NAME".monitor latency_msec=1 rate=44100
    fi

This prevents pulseaudio from resampling the spotify stream, and as added benefit you now know for certain what the rate of the input stream from the .monitor virtual device is (44100). Now you only have to resample the stream from youtube to match 44100 (and make sure the formats match, that would be the difficult part). Pulseaudio will handle any resampling from the loopback stream to the default sink (if necessary).

Here's a more concrete schematic example ($DEF_SAMP_RATE is pulseaudio's 'Default Sample Rate' from /etc/pulse/daemon.conf) Currently (without audiosync)

spotify(s32le 44100) --> resampling (s32le $DEF_SAMP_RATE) --> ouput device (s32le $DEF_SAMP_RATE)

Currently (with audiosync)

spotify(s32le 44100) --> resampling (s32le $DEF_SAMP_RATE) --> ouput device (s32le $DEF_SAMP_RATE)
                                                               --> ouput device monitor (s32le $DEF_SAMP_RATE) --> ffmpeg 

With dedicated sink/monitor (without hardcoding rate=44100)

spotify(s32le 44100) --> resampling (s32le $DEF_SAMP_RATE) --> virtual-sink (s32le $DEF_SAMP_RATE) --> loopback (s32le $DEF_SAMP_RATE) --> output device (s32le $DEF_SAMP_RATE)
                                                                 --> virtual sink monitor (s32le $DEF_SAMP_RATE) --> ffmpeg

With dedicated sink/monitor (hardcoding rate=44100)

spotify(s32le 44100) --> virtual-sink (s32le 44100) --> loopback (s32le 44100) --> resampling (s32le 48000) --> output device (s32le $DEF_SAMP_RATE)
                       --> virtual sink monitor (s32le 44100) --> ffmpeg

In the last example the resampling to $DEF_SAMP_RATE only takes place at the very end. This means that you can exploit the fact that one of ffmpeg's inputs is guaranteed to be 44100Hz. And it would also reduce the latency between spotify and ffmpeg a bit because the resampling doesn't take place between spotify and ffmpeg.

In short, i think it would be a good idea to hardcode rate=44100 to the virtual-sink module and loopback module.

Nowa-Ammerlaan commented 4 years ago

I had some spare time just now, and I brushed up my C-skills a bit and got this "'working'" (sort of).

Something like this is very ugly, but it works excellent: pulse-patch.zip

Just before the audiosync module starts recoding, this will:

This makes the audiosync feature work pretty well inside my non-trivial pulseaudio setup :smile: . However, as I said, my C-skills are very rusty, and my patch is very ugly (sorry) (it's mostly bash code inside C :disappointed: ). But maybe it will be useful as a starting point.

[EDIT] Here's a log:

Virtual sink module already loaded
Loopback module already loaded
Moving stream 76 to AudioSync

audiosync: obtained youtube-dl URL
audiosync: Next interval (0): cap=165460 down=144011
audiosync: 3327 frames of delay with a confidence of 0.994927
audiosync: read ABORT_ST, quitting...
audiosync: read ABORT_ST, quitting...
marioortizmanero commented 4 years ago

Hey Andrew, I'm back.

Thanks a lot for your contributions. The script is super useful because I'm not too experienced with PulseAudio and elaborate set-ups, so it'll serve as a great guide when I translate it to C. Your solution seems to work well, but I really would like to take more time to do it correctly with the ALSA or PulseAudio libraries.

Your first comment about hardcoding rate=44100 cleared up a bit how I should do it, so thanks for that, too. I'll have to investigate more and I'll come back to this soon!

marioortizmanero commented 4 years ago

So I've started to work on this a bit in the pulseaudio branch and here are a couple things I've noticed:

Nowa-Ammerlaan commented 4 years ago

So I've started to work on this a bit in the pulseaudio branch and here are a couple things I've noticed:

On my computer I don't have to move the sink for every song? Not sure if I'm doing it incorrectly, but calling pacmd move-sink-input works until spotify is closed. I'd still have to run it always to avoid errors when spotify is restarted.

Interesting, i have the same behaviour on my PC when I tried it just now. One thing I noticed is that the spotify stream is now named "Spotify" instead of "Alsa Plugin [Spotify]" (see screenshot in my first post). Maybe spotify got updated or something else changed in my setup. In any case spotify seems to be using pulseaudio directly now, which makes the streams behave more nicely. (Which is good, I'm just confused about what was going on before with the "Alsa Plugin [Spotify]" stuff)

In any case pacmd list-sink-inputs lists for each stream which sink(/device) it is sending audio to:

andrew@andrew-gentoo-pc ~ % pacmd list-sink-inputs | grep spotify -A 15 -B 25 | grep sink
sink: 4 <AudioSync>
module-stream-restore.id = "sink-input-by-media-role:music"

Using this we could create some if statement to only move the sink if it is not already on the correct sink. Because it should still be moved once after spotify has started. (and if spotify is restarted)

Executing such an if statement whenever a (new) song starts should make sure that audiosync keeps working at all times.

The global audio is actually delayed in comparison to the created sink... ? Only by like half a second. Not sure if what's delayed is the global audio sink, or the new one.

I hadn't noticed this before. In any case the latency is small on my PC (it fluctuates between 0.1ms and 2 ms): Screenshot_20200214_141121

Please note that the loopback module's default latency is 200ms, this is why I specified latency_msec=1 in my earlier posts. I haven't checked your code in depth, but in case you haven't specified the latency of the loopback module, I'd recommend you do that.

As a side note: The specified latency is not guaranteed to be the actual latency (see screenshot: set lattency is 1ms, yet actual is 2ms). The actual lower limit of the latency is determined by the hardware (in my case apparently 2ms). Setting it to something higher will make pulseaudio (artificially) increase the latency to match the set latency. Setting the latency lower then what your hardware can do has no effect, pulseaudio will just use the lowest latency possible. As far as I know, this has no negative side effects, other then maybe using slightly more CPU.

As another side note: The main use of the loopback module is looping back microphone/Line-In inputs back to speakers/headphones. In such a use case you do want latency, to prevent a feedback loop. I suspect this is the reason that the default latency is 200ms. In our case there is no risk of a feedback loop so we can just set the latency as low as possible. (Which I think is 1ms, I'm not sure what happens if you set it to 0ms)

[EDIT]: I took a more in depth look at the code and noticed:

pactl load-module module-loopback source="$SINKNAME.monitor"  # latency_msec=1 rate=44100

The rate is indeed optional (though it should in theory speed things up) but without specifying latency_msec=1 you will get latency_msec=200 which is the default. I suspect this is what is causing the delay you are observing.

There is a small but noticeable cut in the audio when moving the sinks. Not sure if that would annoy some... It doesn't always happen, though.

I don't get this on my PC (or at least I don't hear it), I can switch the spotify stream between virtual sinks more or less seamlessly without hearing any kind of interruption. I suspect that what you are observing is a symptom of the half a second delay you noticed in the previous point. Eliminating the delay (as much as possible) should stop (or at least reduce) this.

In any case, since it doesn't seem necessary any more to switch the stream at the beginning of every song. I don't think it will be that annoying.

Only one of these virtual sinks can be working at once: not a problem, but good to know.

You can create as many virtual sinks/loopbacks as you want. But every stream can only output into 1 sink. If you'd like to have a stream output to multiple sinks, you can use the 'module-combine-sink' module. It is a virtual sink that sends it's input to another sink, and a 'slave sink'.

marioortizmanero commented 4 years ago

Interesting, i have the same behaviour on my PC when I tried it just now. One thing I noticed is that the spotify stream is now named "Spotify" instead of "Alsa Plugin [Spotify]" (see screenshot in my first post). Maybe spotify got updated or something else changed in my setup. In any case spotify seems to be using pulseaudio directly now, which makes the streams behave more nicely. (Which is good, I'm just confused about what was going on before with the "Alsa Plugin [Spotify]" stuff)

I should investigate more on this, but I'm not sure where exactly their changelog can be found. I use a rolling-release distro, so I'm not sure at what version this happened. If it wasn't too long ago it's possible that those with older versions will experience issues.

Anyway, I don't think checking the current sinks whenever a new song starts will worsen the performance. Using a pipe like right now is already quite slow, I imagine.

Please note that the loopback module's default latency is 200ms, this is why I specified latency_msec=1 in my earlier posts. I haven't checked your code in depth, but in case you haven't specified the latency of the loopback module, I'd recommend you do that.

Ah yes. I commented it out because I wanted to use the default config before anything else. So thanks for explaining it more clearly. I'll use the latency_msec option as 1 for now. If I use 0, this is what happens:

❯ pactl load-module module-loopback source="$SINKNAME.monitor" latency_msec=0
Failure: Module initialization failed

I'll have to investigate a bit more about using 44100Hz as the sample rate, though. Many other parts of the program use 4.8kHz. Also, I'm not sure if it will worsen the audiosync's precision.

I don't get this on my PC (or at least I don't hear it), I can switch the spotify stream between virtual sinks more or less seamlessly without hearing any kind of interruption. I suspect that what you are observing is a symptom of the half a second delay you noticed in the previous point. Eliminating the delay (as much as possible) should stop (or at least reduce) this.

Yes, this is probably what you suggested. I haven't heard it again.

You can create as many virtual sinks/loopbacks as you want. But every stream can only output into 1 sink. If you'd like to have a stream output to multiple sinks, you can use the 'module-combine-sink' module. It is a virtual sink that sends it's input to another sink, and a 'slave sink'.

Thanks! For now I don't think it's needed, but it's good to know.

marioortizmanero commented 4 years ago

Update: the pulseaudio libs are almost unusable... Terrible docs, no real examples, complex compilation... So I'll try to come up with the best solution I can find to use the CLI if I end up being unable to use C...

There seems to be a big difference between the pulseaudio API needed to run CLI commands, and the simple API to record sinks ans such. The former is way too low level and undocumented, but I still have hope for the latter.

Nowa-Ammerlaan commented 4 years ago

I should investigate more on this, but I'm not sure where exactly their changelog can be found. I use a rolling-release distro, so I'm not sure at what version this happened. If it wasn't too long ago it's possible that those with older versions will experience issues.

I'm on spotify-1.1.10-r1 at the moment, the git logs show that this version was added 2019-08-08. So I don't think any spotify update is the cause of this change after all. Perhaps it was something else. In any case it would be a good idea to check whether the spotify stream is still set to the correct sink whenever audiosync starts recording anyway.

I'll have to investigate a bit more about using 44100Hz as the sample rate, though. Many other parts of the program use 4.8kHz. Also, I'm not sure if it will worsen the audiosync's precision.

I did some more tests, and it doesn't really seem to matter. This is without rate=44100: Screenshot_20200215_092646

And this is with rate=44100 Screenshot_20200215_093807

The main difference is that the "Resampler" of the spotify stream changes to "copy" instead of "speex-float-5" (which is the default resample-method on my system). This is to be expected, in fact that was the whole point of the rate=44100 in the first place. However, the latency doesn't really change significantly, it seems slightly better with rate=44100. However it fluctuates a lot, and any improvement is overshadowed by this fluctuation, so I dare not draw any hard conclusions. That being said, in theory rate=44100 should still reduce latency, and I would expect to observe a larger change on low-end hardware. Though maybe not as larger as I initially expected.

Also, this whole rate=44100 story is only relevant for the spotify client, if you add support for other clients in the future you might not want rate=44100 after all. Because as far as I know spotify is the only client that uses a rate of 44100Hz all the time. Other (music)-clients might use 48000Hz if available, though I have no examples at the moment.

[EDIT] If you'd like to test this yourself: https://github.com/wwmm/pulseeffects It's main use is applying fancy effects to audio streams, but it works as a 'stream/sinks properties'-analyser as well. And it's a bit nicer then using watch "pacmd list-sink-inputs | grep spotify -B 15 -A 15"

Update: the pulseaudio libs are almost unusable... Terrible docs, no real examples, complex compilation... So I'll try to come up with the best solution I can find to use the CLI if I end up being unable to use C...

If doing this in C ends up to be too much of a nightmare, you might be interested in https://pypi.org/project/pulsectl . I haven't explored the code here in great detail, but this might just be able to do what needs to be done in python. The advantage would be that we could use the 'Power of Python' (with a capital P because python is awesome :D ), instead of having to use these 'hacky' grep commands. (I have this feeling that using grep might not be a very future-proof solution, because it might break if pactl/pacmd changes the way it outputs things.)

marioortizmanero commented 4 years ago

I'm on spotify-1.1.10-r1 at the moment, the git logs show that this version was added 2019-08-08. So I don't think any spotify update is the cause of this change after all. Perhaps it was something else. In any case it would be a good idea to check whether the spotify stream is still set to the correct sink whenever audiosync starts recording anyway.

Maybe it was a pulseaudio update? Did you try the scripts on something other than spotify?

Also, this whole rate=44100 story is only relevant for the spotify client, if you add support for other clients in the future you might not want rate=44100 after all. Because as far as I know spotify is the only client that uses a rate of 44100Hz all the time. Other (music)-clients might use 48000Hz if available, though I have no examples at the moment.

Yeah, the thing is that other media players are also supported, so it can't be limited to Spotify. Using 44100Hz is still interesting, though. It might save up memory and CPU in some cases. But I'll leave it for the future (#15).

If doing this in C ends up to be too much of a nightmare, you might be interested in https://pypi.org/project/pulsectl . I haven't explored the code here in great detail, but this might just be able to do what needs to be done in python. The advantage would be that we could use the 'Power of Python' (with a capital P because python is awesome :D ), instead of having to use these 'hacky' grep commands. (I have this feeling that using grep might not be a very future-proof solution, because it might break if pactl/pacmd changes the way it outputs things.)

Thanks for the recommendation! I actually decided to give it another try today and I did get it working! Here's a program you can test for now. The docs are better than I thought, once I understood how pulseaudio handles requests and such.

Nowa-Ammerlaan commented 4 years ago

Maybe it was a pulseaudio update? Did you try the scripts on something other than spotify?

Latest pulseaudio update was 2019-09-17 (to 13.0), So I don't think that was it either. I vaguely remember switching around default_samplerate=96000 and alternate_samplerate=48000, but I'm not sure.

Thanks for the recommendation! I actually decided to give it another try today and I did get it working! Here's a program you can test for now. The docs are better than I thought, once I understood how pulseaudio handles requests and such.

Awesome, I just tested it, it works well :)

One thing I did notice is that if you run ./pulse_setup Spotify a second time, audio will get distorted (slightly echo-ish and very flat). If you run it a third time the distortion gets worse, etc. I think this is because executing the command multiple times, creates multiple sinks/loopbacks. It's not really a problem though if this script is guaranteed to not run if the modules already exist. It might become problematic though when you want to switch to a different media player, e.g first ./pulse_setup Spotify and then ./pulse_setup otherplayer will get you two audiosync sinks/loopbacks and will result in the audio distortion that I observed. One way to fix it would be to check if the modules are already loaded before hand. Or maybe the pulseaudio C functions already have a nice option for this (e.g an option for the load-module function, that makes it only do it's thing if it a module with the same name doesn't already exist). Another way to fix it would be to name the streams stream specific, e.g name the modules "audiosync-${stream_name}" or "audiosync-${stream_id}.

Another thing I found is that I now seem to be experiencing the half-a-second-delay you described earlier when executing ./pulse_setup Spotify (slightly annoying, but it's only once so I suppose I could live with it). Furthermore, if I switch the spotify stream's sink from audiosync to default and back using pavucontrol I find that is no longer seamless (initially). There is a noticeable 'hiccup' in the playback. I checked 'pacmd list-modules' and the loopback module is correctly loaded with the latency_msec=1 option. So I'm not sure what is causing this.

One thing I did notice is that just after running ./pulse_setup Spotify the size of the buffer of the newly created loopback module is huge (~1000ms). After a while it comes back down to sane values (~7.5ms) and I can once again switch the spotify stream seamlessly between the default and audiosync sink. I strongly suspect that this is related, because it kinda makes sense that there is a half-a-second-delay if this huge buffer has to be filled. I'm not sure what to do about it though (and why this occurs in C but not in bash), but this might be helpful: https://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/Developer/Clients/LatencyControl/

[EDIT]:

index: 52
name: <module-null-sink>
argument: <sink_name=audiosync sink_properties=device.description=audiosync>
used: 2
load once: no
properties:
module.author = "Lennart Poettering"
module.description = "Clocked NULL sink"
module.version = "13.0"
index: 53
name: <module-loopback>
argument: <source=audiosync.monitor latency_msec=1>
used: -1
load once: no
properties:
module.author = "Pierre-Louis Bossart"
module.description = "Loopback from source to sink"
module.version = "13.0"

Maybe the 'load once' property might provide an easy way to prevent the modules from being loaded a second time. Though I can not find on google how it should be set.

marioortizmanero commented 4 years ago

One thing I did notice is that if you run ./pulse_setup Spotify a second time, audio will get distorted (slightly echo-ish and very flat). If you run it a third time the distortion gets worse, etc. I think this is because executing the command multiple times, creates multiple sinks/loopbacks. It's not really a problem though if this script is guaranteed to not run if the modules already exist. It might become problematic though when you want to switch to a different media player, e.g first ./pulse_setup Spotify and then ./pulse_setup otherplayer will get you two audiosync sinks/loopbacks and will result in the audio distortion that I observed.

Yes I'm aware. For now it should only be used once, because currently it doesn't check for previous sinks. It's just a sketch of how it should look like, after all. The thing is that I'm still thinking about how I could integrate it into the main program. First of all, the Vidify usage of audiosync would have to be modified to also pass as a parameter the PulseAudio application.name field. This can be tricky because audiosync is only available for Linux MPRIS Players and I don't want to over-complicate the Vidify code too much.

The new version would have to run this set-up either as a standalone function (something like audiosync.setup()) that should only be called once, or maybe do it by default when audiosync.run() is first launched.

Also, now that you mentioned it, Vidify doesn't currently support multiple MPRIS players in the same session, so technically, if you were to use a different player, it wouldn't be detected. I just opened a new issue to track that: https://github.com/vidify/vidify/issues/59.

One way to fix it would be to check if the modules are already loaded before hand. Or maybe the pulseaudio C functions already have a nice option for this (e.g an option for the load-module function, that makes it only do it's thing if it a module with the same name doesn't already exist). Another way to fix it would be to name the streams stream specific, e.g name the modules "audiosync-${stream_name}" or "audiosync-${stream_id}.

This unfortunately doesn't exist AFAIK, but the second idea is definitely a good option to consider.

Another thing I found is that I now seem to be experiencing the half-a-second-delay you described earlier when executing ./pulse_setup Spotify (slightly annoying, but it's only once so I suppose I could live with it). Furthermore, if I switch the spotify stream's sink from audiosync to default and back using pavucontrol I find that is no longer seamless (initially). There is a noticeable 'hiccup' in the playback. I checked 'pacmd list-modules' and the loopback module is correctly loaded with the latency_msec=1 option. So I'm not sure what is causing this.

Same! For some reason, the C program seems to cause this lag, but the script doesn't? Not sure why exactly this happens. It only happens the first time, too. Thanks for the LatencyControl link you provided, I'll read it in detail soon. I should also take note of the differences between what the source code of pacmd does in comparison to this program. There's probably lots to be improved, since I just learned how the libraries work. I'll be back with more info.

marioortizmanero commented 4 years ago

Update:

Nowa-Ammerlaan commented 4 years ago

format=float32le should be used when creating the new sink. That way, reading its data doesn't require a type conversion. channels=1 would simplify it, too. But this would be noticeable, since it would modify the output to be mono. The format shouldn't make a difference, though.

Doesn't that require conversion if your system's format is not float32? e.g. on my system it is set to s32le, and if I remember correctly the default format is s16le. Or is this a ffmpeg/fftw thing?

The page you linked seems to be for applications that produce or record a stream. In this case, the latency is already set to the loopback with latency_ms=1. Not sure what you meant with One thing I did notice is that just after running ./pulse_setup Spotify the size of the buffer of the newly created loopback module is huge (~1000ms).

What I meant is the number that is shown left of 'latency' in my screenshots. I'm not sure how 'buffer' relates to 'latency' but I do know that more buffer means more initial audio delay. The sinks/streams buffer is filled before the audio is sent through, this explains the slight delay we observe when switching audio sinks. The size of the buffer seems variable, just after running your script it is huge, and then it starts decreasing to more reasonable values. It might be possible to force the buffer to smaller values at all times, but I don't know how to do that.

marioortizmanero commented 4 years ago

Doesn't that require conversion if your system's format is not float32? e.g. on my system it is set to s32le, and if I remember correctly the default format is s16le. Or is this a ffmpeg/fftw thing?

Yes, it still requires a conversion, but for what I understand, it's already handled by pulseaudio... Which is probably easier & faster to handle? To be honest I'm not really sure.

There is still a conversion being performed: s16le -> f32le -> f64le. But I guess the second one is easier and faster to perform. But I don't have a reliable source, I should look it up.

What I meant is the number that is shown left of 'latency' in my screenshots. I'm not sure how 'buffer' relates to 'latency' but I do know that more buffer means more initial audio delay. The sinks/streams buffer is filled before the audio is sent through, this explains the slight delay we observe when switching audio sinks. The size of the buffer seems variable, just after running your script it is huge, and then it starts decreasing to more reasonable values. It might be possible to force the buffer to smaller values at all times, but I don't know how to do that.

I don't use PulseEffects myself, but the latency can also be checked by listing the inputs with pacmd. And in my case, I can't reproduce it consistently. It just happens sometimes on either of them.

As I was developing this, I've come to the conclusion that this method won't always work. There are so many possible stream names, and they're so unreliable, that the previous method should still be available in case there was an error while trying to set-up the sink.

So by default, audiosync will have default as the recorded audiosync monitor. If audiosync.setup() was called and it ran successfully, it will then use audiosync.monitor.

Nowa-Ammerlaan commented 4 years ago

I don't use PulseEffects myself, but the latency can also be checked by listing the inputs with pacmd. And in my case, I can't reproduce it consistently. It just happens sometimes on either of them.

I just checked the output of pacmd list-sinks and it only lists device.buffering.buffer_size for physical sinks (which is a constant). So, I'm now a bit confused about what that 'buffer' number is in pulseeffects actually represents. It does have an effect on this delay we observe when switching sinks, when 'buffer' is large, the delay is very noticeable, when it drops down to a couple of milliseconds it is more or less seamless. There is definitely correlation, so for this use case it should be as small as possible. Yet because now I'm not sure what it represents, I'm completely lost on how to control it.

Perhaps, this is just a unavoidable pulseaudio thing :(

As I was developing this, I've come to the conclusion that this method won't always work. There are so many possible stream names, and they're so unreliable, that the previous method should still be available in case there was an error while trying to set-up the sink.

So by default, audiosync will have default as the recorded audiosync monitor. If audiosync.setup() was called and it ran successfully, it will then use audiosync.monitor.

That seems like a good fail-safe. Another idea I just thought of is to determine the index of the stream we want to move not by it's name: I'm not sure if this is possible at all, but maybe MPRIS/dbus can return the index of the stream we want. After all, it gives us information on title/artist/etc from the player, it might also be able to return the index of the stream that this metadata belongs to.

I couldn't find anything about this in the MPRIS specifications though, so maybe it is not possible to determine the index like this :(

marioortizmanero commented 4 years ago

I just checked the output of pacmd list-sinks and it only lists device.buffering.buffer_size for physical sinks (which is a constant). So, I'm now a bit confused about what that 'buffer' number is in pulseeffects actually represents. It does have an effect on this delay we observe when switching sinks, when 'buffer' is large, the delay is very noticeable, when it drops down to a couple of milliseconds it is more or less seamless. There is definitely correlation, so for this use case it should be as small as possible. Yet because now I'm not sure what it represents, I'm completely lost on how to control it.

I'll have to ask in their mailing lists for advice because I'm clueless. I'll update when/if they answer.

That seems like a good fail-safe. Another idea I just thought of is to determine the index of the stream we want to move not by it's name: I'm not sure if this is possible at all, but maybe MPRIS/dbus can return the index of the stream we want. After all, it gives us information on title/artist/etc from the player, it might also be able to return the index of the stream that this metadata belongs to. I couldn't find anything about this in the MPRIS specifications though, so maybe it is not possible to determine the index like this :(

Yes, that's what I was going to do. I'd always call audiosync.setup() inside Vidify with the MPRIS name. But I doubt that the DBus' MPRIS name is always going to be the same as the PulseAudio stream name, so it won't always work. I could also make a list of known exceptions of stream names depending on the provided MPRIS name. That way, even if they weren't the same, it would check the list and try to use the known stream name.


Update: the setup sketch can now be run more than once! Here's a quick benchmark:

❯ time bash pulseaudio_setup_sketch.bash
Loading virtual sink module as 24
Loading loopback module as 25
Moving stream 0 to audiosync
bash pulseaudio_setup_sketch.bash  0.03s user 0.01s system 99% cpu 0.040 total
❯ time ./pulse_setup "Spotify"
audiosync: no custom monitor found, creating a new one
audiosync: new sink created, loading loopback
audiosync: looking for the provided stream
audiosync: moving the found stream
audiosync: setup complete
./pulse_setup "Spotify"  0.01s user 0.00s system 55% cpu 0.017 total

Not only C is faster, it's fully integrated in the code, and more customizable. Although it's way more complex. Soon I will remove the C sketch and turn it into a test. The bash script is enough as a sketch.

marioortizmanero commented 4 years ago

This is pretty much finished. All it needs is a couple tweaks and fixes, but that can be done in master. I still ended up using ffmpeg to read the sink's audio because it's much simpler, and downloading the audio still uses ffmpeg, so it's not like the dependency could be dropped.

I should also improve the documentation.

I still have to update Vidify to support this new version, too, but I'll open up a new issue for that.