mediar-ai / screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind
https://screenpi.pe
MIT License
9.96k stars 593 forks

[bug] Screenpipe only records audio input on linux #450

Open twilwa opened 1 month ago

twilwa commented 1 month ago

describe the bug: Screenpipe captures audio, but no matter which source is selected (pipewire, jack, pulse, or any device option), the captured audio is always microphone input rather than loopback device audio.

Secondarily, 'save and restart' in the Settings menu only closes the application and doesn't relaunch it (possibly because I built from source, but thought I'd mention it).

to reproduce: Enable any number of audio sources labeled (output) or (input). I also tested with all input or all output sources removed; the behavior is the same.

expected behavior: screenpipe creates two files, input and output; one records loopback audio and the other records the microphone.

system information:

(rtcqs) anon@pop-os:~/repos/rtcqs$ fastfetch
             /////////////                 anon@pop-os
         /////////////////////             -----------
      ///////*767////////////////          OS: Pop!_OS jammy 22.04 x86_64
    //////7676767676*//////////////        Kernel: Linux 6.10.4-061004-generic
   /////76767//7676767//////////////       Uptime: 2 hours, 52 mins
  /////767676///*76767///////////////      Packages: 2843 (dpkg), 45 (flatpak-user), 9 (snap)
 ///////767676///76767.///7676*///////     Shell: bash 5.1.16
/////////767676//76767///767676////////    Display (CB242Y): 1920x1080 @ 60 Hz in 24" [External] *
//////////76767676767////76767/////////    Display (VZ239): 1920x1080 @ 60 Hz in 23" [External]
///////////76767676//////7676//////////    DE: GNOME 42.9
////////////,7676,///////767///////////    WM: Mutter (X11)
/////////////*7676///////76////////////    WM Theme: Pop-dark
///////////////7676////////////////////    Theme: Pop-dark [GTK2/3/4]
 ///////////////7676///767////////////     Icons: Pop [GTK2/3/4]
  //////////////////////'////////////      Font: Fira Sans (10pt, SemiLight) [GTK2/3/4]
   //////.7676767676767676767,//////       Cursor: Pop (24px)
    /////767676767676767676767/////        Terminal: zellij 0.40.1
      ///////////////////////////          CPU: AMD Ryzen 5 3600X (12) @ 4.76 GHz
         /////////////////////             GPU: AMD Radeon RX 580 Series [Discrete]
             /////////////                 Memory: 12.70 GiB / 31.27 GiB (41%)
                                           Swap: 512.00 KiB / 20.00 GiB (0%)

additional context: rtcqs output (I believe most of these are Pop!_OS defaults). I know Pop!_OS also uses pipewire-pulse as its primary audio driver. I looked into troubleshooting a bit and found this: https://www.reddit.com/r/pop_os/comments/13yjaky/improve_jackpipewire_performance_on_pop_os/

and ran rtcqs before implementing any changes:

 + pysimplegui==4.60.5.0
 + rtcqs==0.6.2
(rtcqs) anon@pop-os:~/repos/rtcqs$ rtcqs
rtcqs - version 0.6.2

Root User
=========
[ OK ] Not running as root.

Group Limits
============
[ WARNING ] User anon is currently not member of a group that has sufficient rtprio (0) and memlock (4197437440) set. Add yourself to a group with sufficent limits set, i.e. audio or rea

CPU Frequency Scaling
=====================
[ OK ] The scaling governor of all CPUs is set to performance.

Kernel Configuration
====================
[ OK ] Valid kernel configuration found.

High Resolution Timers
======================
[ OK ] High resolution timers are enabled.

Tickless Kernel
===============
[ OK ] System is using a tickless kernel.

Preempt RT
==========
[ WARNING ] Kernel 6.10.4-061004-generic without 'threadirqs' parameter or real-time capabilities found. See also https://wiki.linuxaudio.org/wiki/system_configuration#do_i_really_need_a

Spectre/Meltdown Mitigations
============================
[ WARNING ] Kernel with Spectre/Meltdown mitigations found. This could have a negative impact on the performance of your system. See also https://wiki.linuxaudio.org/wiki/system_configuration#disabling_spectre_and_meltdown_mitigations

RT Priorities
=============
[ WARNING ] Could not assign a 80 rtprio SCHED_FIFO value due to the following error: [Errno 1] Operation not permitted. Set up limits.conf. See also https://wiki.linuxaudio.org/wiki/system_configuration#limitsconfaudioconf

Swappiness
==========
[ WARNING ] vm.swappiness is set to 180 which is too high. Set swappiness to a lower value by adding 'vm.swappiness=10' to /etc/sysctl.conf and run 'sysctl --system'. See also https://wiki.linuxaudio.org/wiki/system_configuration#sysctlconf

Filesystems
===========
[ OK ] The following mounts can be used for audio purposes: /

IRQs
====
[ OK ] USB port xhci_hcd with IRQ 63 does not share its IRQ.
[ OK ] Soundcard snd_hda_intel:card2 with IRQ 88 does not share its IRQ.
[ OK ] Soundcard snd_hda_intel:card0 with IRQ 86 does not share its IRQ.
[ OK ] USB port xhci_hcd with IRQ 72 does not share its IRQ.

Power Management
================
[ WARNING ] Power management can't be controlled from user space, the device node /dev/cpu_dma_latency can't be accessed by your user. This prohibits DAWs like Ardour and Reaper to set CPU DMA latency which could help prevent xruns. For enabling access see https://wiki.linuxaudio.org/wiki/system_configuration#quality_of_service_interface

My experience with audio on linux has been 'finicky at best', having attempted to rig whisper.cpp in WSL once or twice and I think barely-succeeded, but it's been a while since I've been on my windows partition. As such, let me know if there's any particular audio device information that would be helpful for setting up sane linux defaults. I can test this on my Ubuntu machine later as well if that's helpful.

linear[bot] commented 1 month ago

MED-167 [bug] Screenpipe only records audio output on linux

louis030195 commented 1 month ago

@twilwa can you try running

screenpipe --list-audio-devices

then play some video with voice, and then run

screenpipe --audio-device <the name of it> --disable-vision

just to confirm that audio output does not work?

twilwa commented 1 month ago

Maybe a new issue needed, or perhaps expected behavior -- when building from source, screenpipe isn't located in /usr/local/bin or added to $PATH. I can run the binary okay from the installation location, just as a heads up. Audio devices output:

anon@pop-os:~/repos/screenpipe/target/release$ ./screenpipe --list-audio-devices
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dmix.c:972:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:540:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
available audio devices:
  jack (input)
  pipewire (input)
  pulse (input)
  default (input)
  hw:CARD=Device,DEV=0 (input)
  plughw:CARD=Device,DEV=0 (input)
  sysdefault:CARD=Device (input)
  front:CARD=Device,DEV=0 (input)
  surround40:CARD=Device,DEV=0 (input)
  iec958:CARD=Device,DEV=0 (input)
  dsnoop:CARD=Device,DEV=0 (input)
  hw:CARD=Generic,DEV=0 (input)
  hw:CARD=Generic,DEV=2 (input)
  plughw:CARD=Generic,DEV=0 (input)
  plughw:CARD=Generic,DEV=2 (input)
  sysdefault:CARD=Generic (input)
  front:CARD=Generic,DEV=0 (input)
  surround40:CARD=Generic,DEV=0 (input)
  surround51:CARD=Generic,DEV=0 (input)
  surround71:CARD=Generic,DEV=0 (input)
  dsnoop:CARD=Generic,DEV=0 (input)
  dsnoop:CARD=Generic,DEV=2 (input)
  hw:CARD=V11,DEV=0 (input)
  plughw:CARD=V11,DEV=0 (input)
  sysdefault:CARD=V11 (input)
  front:CARD=V11,DEV=0 (input)
  dsnoop:CARD=V11,DEV=0 (input)
  jack (output)
  pipewire (output)
  pulse (output)
  default (output)
  hw:CARD=HDMI,DEV=3 (output)
  hw:CARD=HDMI,DEV=7 (output)
  hw:CARD=HDMI,DEV=8 (output)
  hw:CARD=HDMI,DEV=9 (output)
  hw:CARD=HDMI,DEV=10 (output)
  hw:CARD=HDMI,DEV=11 (output)
  plughw:CARD=HDMI,DEV=3 (output)
  plughw:CARD=HDMI,DEV=7 (output)
  plughw:CARD=HDMI,DEV=8 (output)
  plughw:CARD=HDMI,DEV=9 (output)
  plughw:CARD=HDMI,DEV=10 (output)
  plughw:CARD=HDMI,DEV=11 (output)
  hdmi:CARD=HDMI,DEV=0 (output)
  hdmi:CARD=HDMI,DEV=1 (output)
  hdmi:CARD=HDMI,DEV=2 (output)
  hdmi:CARD=HDMI,DEV=3 (output)
  hdmi:CARD=HDMI,DEV=4 (output)
  hdmi:CARD=HDMI,DEV=5 (output)
  dmix:CARD=HDMI,DEV=3 (output)
  dmix:CARD=HDMI,DEV=7 (output)
  dmix:CARD=HDMI,DEV=8 (output)
  dmix:CARD=HDMI,DEV=9 (output)
  dmix:CARD=HDMI,DEV=10 (output)
  dmix:CARD=HDMI,DEV=11 (output)
  hw:CARD=Device,DEV=0 (output)
  plughw:CARD=Device,DEV=0 (output)
  sysdefault:CARD=Device (output)
  front:CARD=Device,DEV=0 (output)
  surround40:CARD=Device,DEV=0 (output)
  iec958:CARD=Device,DEV=0 (output)
  dmix:CARD=Device,DEV=0 (output)
  hw:CARD=Generic,DEV=0 (output)
  hw:CARD=Generic,DEV=1 (output)
  plughw:CARD=Generic,DEV=0 (output)
  plughw:CARD=Generic,DEV=1 (output)
  sysdefault:CARD=Generic (output)
  front:CARD=Generic,DEV=0 (output)
  surround40:CARD=Generic,DEV=0 (output)
  surround51:CARD=Generic,DEV=0 (output)
  surround71:CARD=Generic,DEV=0 (output)
  iec958:CARD=Generic,DEV=0 (output)
  dmix:CARD=Generic,DEV=0 (output)
  dmix:CARD=Generic,DEV=1 (output)
  surround21:CARD=Device,DEV=0 (output)
  surround41:CARD=Device,DEV=0 (output)
  surround50:CARD=Device,DEV=0 (output)
  surround51:CARD=Device,DEV=0 (output)
  surround71:CARD=Device,DEV=0 (output)

The same behavior occurs when selecting device 'pipewire (output)' or 'pipewire (input)' -- they both capture mic input, not loopback audio. Any particular device you'd be keen to test?

louis030195 commented 1 month ago

@twilwa

indeed for path: #421

can you try to change this part of the code

https://github.com/mediar-ai/screenpipe/blob/77d56a1b43e144ec528c4fdb085abb382d58c057/screenpipe-audio/src/core.rs#L150

to just

let config = cpal_audio_device.default_input_config()?;

then build and test again with an output device? I'm wondering if the behaviour is different on Linux.

twilwa commented 1 month ago

My hunch is that the resampling reported in this log line could be causing some issues:

2024-10-09T02:25:39.770707Z INFO screenpipe_audio::stt: device: pipewire (input), resampling from 44100 Hz to 16000 Hz

pipewire (and presumably pipewire-pulse) comes with a default rate of, I believe, 44100 Hz.

possibly useful information, pw-metadata -n settings:

anon@pop-os:~/repos/screenpipe/target/release$ pw-metadata -n settings
Found "settings" metadata 31
update: id:0 key:'log.level' value:'2' type:''
update: id:0 key:'clock.rate' value:'48000' type:''
update: id:0 key:'clock.allowed-rates' value:'[ 44100, 48000, 88200, 96000, 176400, 192000, 352800, 384000 ]' type:''
update: id:0 key:'clock.quantum' value:'1024' type:''
update: id:0 key:'clock.min-quantum' value:'32' type:''
update: id:0 key:'clock.max-quantum' value:'2048' type:''
update: id:0 key:'clock.force-quantum' value:'0' type:''
update: id:0 key:'clock.force-rate' value:'0' type:''

pw-metadata settings:

anon@pop-os:~/repos/screenpipe/target/release$ pw-metadata settings
Found "default" metadata 36
update: id:0 key:'default.configured.audio.sink' value:'{"name":"alsa_output.usb-0c76_USB_PnP_Audio_Device-00.analog-stereo"}' type:'Spa:String:JSON'
update: id:0 key:'default.configured.audio.source' value:'{"name":"alsa_input.usb-0c76_USB_PnP_Audio_Device-00.mono-fallback"}' type:'Spa:String:JSON'
update: id:0 key:'default.audio.sink' value:'{"name":"alsa_output.usb-0c76_USB_PnP_Audio_Device-00.analog-stereo"}' type:'Spa:String:JSON'
update: id:0 key:'default.audio.source' value:'{"name":"alsa_input.usb-0c76_USB_PnP_Audio_Device-00.mono-fallback"}' type:'Spa:String:JSON'

louis030195 commented 1 month ago

@twilwa what is happening exactly? No transcriptions, or something else? If you play something through the audio output, do you hear it when listening to the .mp4 saved to disk?

It's supposed to work with 96 kHz, 48 kHz, 44.1 kHz, etc.

twilwa commented 1 month ago

It records my microphone input but never the loopback audio output. Will try the code change after this restart.

My OBS just started segfaulting after I changed a few settings, but it's been acting up today. Planning to install JACK so I have something a little more robust to work with -- might resolve the screenpipe issues as well. That said, I think most distros don't ship with it, so pipewire/pulse would probably be the best targets for making sure things generally work on common distros (Ubuntu, Debian, etc.)

louis030195 commented 1 month ago

can you do

cargo build --release # can add --features mkl 

./target/release/screenpipe --audio-device "pipewire (output)"
# or
./target/release/screenpipe --audio-device "jack (output)" 
# and play some voice audio

twilwa commented 1 month ago

Would you prefer that with or without the change described in https://github.com/mediar-ai/screenpipe/issues/450#issuecomment-2401147071, or does it not matter?

At the moment I'm building without the code change; behavior is the same on pipewire (output) so far.

twilwa commented 1 month ago

With jack (after installing actual JACK rather than pipewire-pulse emulating JACK, which I believe is the default). Cadence does pick up the audio levels, so I'm presuming the JACK server is live:

2024-10-09T03:29:45.705238Z  INFO screenpipe_audio::stt: device: jack (output), resampling from 48000 Hz to 16000 Hz
2024-10-09T03:29:45.707298Z  INFO screenpipe_audio::stt: device: jack (output), total audio frames processed: 0, frames that include speech: 0, speech duration: 0ms, speech ratio: NaN, min required ratio: 0.02
2024-10-09T03:29:45.720278Z  INFO screenpipe_audio::core: Recording jack (output) for 30 seconds
cannot connect system:capture_2 to alsa-jack.jackC.38170.11:in_001
2024-10-09T03:29:45.721176Z ERROR screenpipe_audio::core: Failed to build input stream: A backend-specific error has occurred: ALSA function 'snd_pcm_hw_params' failed with error 'I/O error (5)'
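
A note on the `speech ratio: NaN` in the log above: that is what a floating-point division produces when both counts are zero, which fits `total audio frames processed: 0`. The sketch below assumes the ratio is computed as speech frames over total frames; the actual screenpipe code is not quoted in this thread, and the function names here are hypothetical.

```rust
/// Naive ratio: with zero total frames this is 0.0 / 0.0, which is NaN.
pub fn speech_ratio(speech_frames: usize, total_frames: usize) -> f32 {
    speech_frames as f32 / total_frames as f32
}

/// Any comparison against NaN is false, so a NaN ratio never passes.
pub fn passes_threshold(ratio: f32, min_ratio: f32) -> bool {
    ratio >= min_ratio
}

/// Guarded variant: returns None when no frames were processed,
/// so "nothing was captured" can be told apart from "no speech detected".
pub fn speech_ratio_checked(speech_frames: usize, total_frames: usize) -> Option<f32> {
    if total_frames == 0 {
        None
    } else {
        Some(speech_frames as f32 / total_frames as f32)
    }
}
```

Read this way, the NaN is a symptom of the stream delivering no frames at all rather than a separate bug.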

As a side note, when doing some OBS troubleshooting, I found that running it with 'sudo' prevents the segfault. I could try the same with screenpipe.

Tested a little, no change in behavior (for jack, no audio files are output at all) but sudo does change the error message:

2024-10-09T03:36:56.814768Z ERROR screenpipe_server::core: Error in record_and_transcribe for device jack (output) (iteration 1): Audio device not found, stopping thread

When running with sudo, the server-based audio devices don't show up, so I tried sysdefault:CARD=Generic (output), which doesn't error but doesn't pick anything up either.

twilwa commented 1 month ago

After confirming with aplay -L that the device is the one associated with my current output audio:

2024-10-09T05:16:59.334042Z ERROR screenpipe_server::core: Error in record_and_transcribe for device iec958:CARD=Device,DEV=0 (output) (iteration 1): Audio device not found, stopping thread

The device does appear in the --list-audio-devices output.

twilwa commented 1 month ago

let config = cpal_audio_device.default_input_config()

I tested rebuilding the new UI and the binary with this modification, as requested here: https://github.com/mediar-ai/screenpipe/issues/450#issuecomment-2401147071. I also played around with my system audio packages for a while, making sure everything was installed and updated, and poked around a few audio forums. I will mention that audio works very differently on Linux than it does on Mac or Windows. My setup (and the Ubuntu default, now) runs pipewire-pulse, which, as far as I understand, is a pipewire server that emulates pulseaudio. Then there's JACK, which isn't a default but is used in a lot of audio routing programs, and which needs to play nice with ALSA (the same goes for pulse/pipewire, if I'm not mistaken).

From looking into it a bit, it seems the default audio configuration for Ubuntu is pipewire-pulse with wireplumber as the config manager. Supporting this by default is likely the best option; JACK and ALSA could remain lower priority.

In running main.rs through o1-mini, something interesting popped up:

let config = cpal_audio_device.default_input_config();

(requested change in the comment -- did you mean to say output? I reverted after testing the build, in any case.)

Issue: This line retrieves the default input configuration regardless of whether the device type is Input or Output. If DeviceType::Output is selected, it should retrieve the default output configuration.
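
To make that suggestion concrete, here is a minimal sketch of choosing the config by direction. cpal's `DeviceTrait` does provide both `default_input_config` and `default_output_config`; everything else here (the `DefaultConfigs` trait, `FakeDevice`, `default_config_for`) is a hypothetical stand-in so the snippet compiles without the cpal crate, and it is not screenpipe's actual code.

```rust
/// Mirrors the Input/Output distinction referred to as `DeviceType` above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceType {
    Input,
    Output,
}

/// Hypothetical stand-in for the two cpal `DeviceTrait` queries.
pub trait DefaultConfigs {
    type Config;
    type Error;
    fn default_input_config(&self) -> Result<Self::Config, Self::Error>;
    fn default_output_config(&self) -> Result<Self::Config, Self::Error>;
}

/// Pick the default config that matches the requested direction.
pub fn default_config_for<D: DefaultConfigs>(
    device: &D,
    device_type: DeviceType,
) -> Result<D::Config, D::Error> {
    match device_type {
        DeviceType::Input => device.default_input_config(),
        DeviceType::Output => device.default_output_config(),
    }
}

/// Toy device used only to exercise the selection logic.
pub struct FakeDevice {
    pub input_rate: Option<u32>,
    pub output_rate: Option<u32>,
}

impl DefaultConfigs for FakeDevice {
    type Config = u32; // a sample rate in Hz stands in for a full stream config
    type Error = String;

    fn default_input_config(&self) -> Result<u32, String> {
        self.input_rate.ok_or_else(|| "no input config".to_string())
    }

    fn default_output_config(&self) -> Result<u32, String> {
        self.output_rate.ok_or_else(|| "no output config".to_string())
    }
}
```

Whether an output-side config is the right thing to feed into a capture stream depends on the backend, so this only illustrates the selection step, not a confirmed fix.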

b. Stream Building in record_and_transcribe

The record_and_transcribe function builds an input stream regardless of the device type:


let stream = match config.sample_format() {
    cpal::SampleFormat::I8 => cpal_audio_device.build_input_stream(
        // ...
    ),
    // Other sample formats...
};

Issue: When DeviceType::Output is selected, the program should build an output stream instead of an input stream. Additionally, capturing output audio typically requires selecting a monitor device (e.g., "Monitor of ") on Linux.
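
On the monitor-device point: under PulseAudio naming (which pipewire-pulse follows, as far as I know), each sink exposes a capture source named after the sink with a `.monitor` suffix. A rough sketch of looking one up, assuming the list of source names has already been obtained some other way (for example from `pactl list short sources`); the helper names are hypothetical and the sink name is the default sink from the pw-metadata output earlier in this thread.

```rust
/// PulseAudio-style name of the monitor source that mirrors a sink's output.
pub fn monitor_source_for(sink_name: &str) -> String {
    format!("{}.monitor", sink_name)
}

/// Return the monitor source for `sink_name` if it is among `sources`.
pub fn find_monitor<'a>(sources: &[&'a str], sink_name: &str) -> Option<&'a str> {
    let wanted = monitor_source_for(sink_name);
    sources.iter().copied().find(|s| *s == wanted)
}

/// Example data only: the default source from this thread plus the
/// monitor name that the naming convention would predict for the default sink.
pub fn example_sources() -> Vec<&'static str> {
    vec![
        "alsa_input.usb-0c76_USB_PnP_Audio_Device-00.mono-fallback",
        "alsa_output.usb-0c76_USB_PnP_Audio_Device-00.analog-stereo.monitor",
    ]
}
```

If a source like this exists, recording from it as an ordinary input should capture what the sink is playing, which would sidestep the input-vs-output stream question on Linux.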

If we're building the wrong stream and not checking for monitor devices, this could explain why we're only able to capture microphone input and manually selecting hardware devices doesn't find anything -- although it seems odd to me if it's working normally on Mac or Windows (especially windows).

Happy to continue tinkering, doubly so if you're up to put up a bounty if I can manage to get it up and running on my main PC + my Ubuntu alt? Never done rust before, but been meaning to learn for ages.

twilwa commented 10 hours ago

Was tinkering with it a bit again today; curious if anyone else on Linux has found a workaround yet?