livekit / livekit-cli

Command line interface to LiveKit
https://docs.livekit.io

Feature Request: Publishing Audio from Multicast #378

Open s-hamdananwar opened 1 month ago

s-hamdananwar commented 1 month ago

Hi LiveKit CLI dev team!

What do you guys think about adding a new audio publishing source that listens for incoming multicast audio packets and publishes them to a room? I think we could support G711, PCM and Opus codec packets and let the CLI transcode them to Opus if needed. Since listening for multicast packets requires a socket connection running asynchronously, we might also need to give users options to create, delete and list the multicast listeners that are operated by the CLI.
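As a rough illustration of the listening side, here is a minimal sketch that uses only the Go standard library. The group address, the `parseGroup` helper and the logging placeholder are assumptions made for the example, not part of the CLI; a real publisher would hand each packet to a LiveKit track instead of logging it.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
)

// parseGroup (hypothetical helper) resolves addr ("ip:port") and rejects
// anything that is not an IPv4 multicast address.
func parseGroup(addr string) (*net.UDPAddr, error) {
	udpAddr, err := net.ResolveUDPAddr("udp4", addr)
	if err != nil {
		return nil, fmt.Errorf("resolve %q: %w", addr, err)
	}
	if !udpAddr.IP.IsMulticast() {
		return nil, errors.New("not a multicast address: " + addr)
	}
	return udpAddr, nil
}

func main() {
	group, err := parseGroup("225.8.11.101:9001") // example address, change accordingly
	if err != nil {
		log.Fatal(err)
	}

	// Passing a nil interface lets the OS pick the default multicast interface.
	conn, err := net.ListenMulticastUDP("udp4", nil, group)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 1500) // one Ethernet MTU is enough for a single audio packet
	for {
		n, src, err := conn.ReadFromUDP(buf)
		if err != nil {
			log.Fatal(err)
		}
		// Placeholder: this is where the payload would be written to a track.
		log.Printf("received %d bytes from %s", n, src)
	}
}
```

Validating the address up front means a typo in the endpoint fails immediately rather than producing a listener that silently never receives anything.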

Use Case: Multicast is actively used in the radio and VoIP sectors. This feature would allow developers to send audio to a LiveKit room by running a CLI command from any of their existing servers, instead of creating a new bridge server or incorporating the LiveKit server SDK into their existing servers.

I would love to hear your thoughts and suggestions. Depending on the discussion, I would be happy to submit a PR for this feature!

rektdeckard commented 1 month ago

Hi @s-hamdananwar! We're absolutely in support of adding more ways to publish media, and we'd be glad to have your help in implementing this. It looks like a reasonable set of codecs for this use-case, but I'd defer to @dennwc.

You can find some prior art related to the lk room join command, which allows the CLI publication of media sources from Unix and TCP sockets, and this is probably the appropriate place to add more publication features.

s-hamdananwar commented 1 month ago

Hi @rektdeckard, thanks for your response! I am super excited for this contribution.

I was thinking of adding it as part of the lk room join command as well, but I have some questions where I could use your insights and ideas before starting work on this:

  1. Since multicast requires opening UDP sockets, which unlike TCP don't require a "connection" (I misworded it in my original message, I have now edited it), what action or event should close the socket (and consequently remove the bot participant) that is publishing the incoming packets? Since the socket has no end-of-stream or end-of-connection event, and setting a timeout is not the right approach either, I think the socket should listen for incoming packets asynchronously (in a goroutine, I believe) until the user manually removes the bot participant or the room is deleted. Please let me know if you know of a better way around this.
  2. If removing the participant is the only way to stop the socket connection, shouldn't there be an option for the user to list all existing bot participants in a project that are linked to a socket listener for publishing audio, along with the room and socket endpoint details? In that case, should that be stored in the config and be accessible to the user through a command?

rektdeckard commented 1 month ago

To question 1: in my mind we wouldn't be spawning a daemon or background process to handle the restreaming. The CLI process would simply block as it already does with lk room join --publish <files...> (I think that is expected in this case). Then, to clean up, we would simply intercept SIGINT, SIGTERM, SIGQUIT to handle closing the socket(s) and removing the bot publishing the stream(s). We could also include an optional timeout flag as a convenience.
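The blocking-plus-signals approach could look roughly like the sketch below. It is an illustration under assumptions, not the CLI's actual code: `closerFunc` and `closeWhenDone` are hypothetical helpers, the 30-second timeout stands in for an optional flag, and a plain UDP socket stands in for the multicast listener.

```go
package main

import (
	"context"
	"io"
	"log"
	"net"
	"os/signal"
	"syscall"
	"time"
)

// closerFunc adapts a plain function to io.Closer so several cleanup steps
// can be bundled into one value.
type closerFunc func() error

func (f closerFunc) Close() error { return f() }

// closeWhenDone closes c once done fires and reports the result of Close.
// Closing a socket makes any read blocked on it return with an error.
func closeWhenDone(done <-chan struct{}, c io.Closer) <-chan error {
	res := make(chan error, 1)
	go func() {
		<-done
		res <- c.Close()
	}()
	return res
}

func main() {
	// Cancel ctx when any of the three signals arrives.
	ctx, stop := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
	defer stop()

	// Optional convenience timeout (hypothetical value).
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	conn, err := net.ListenPacket("udp4", ":9001") // stand-in for the multicast socket
	if err != nil {
		log.Fatal(err)
	}
	closed := closeWhenDone(ctx.Done(), closerFunc(func() error {
		// Placeholder: a real implementation would also leave the room here,
		// which removes the publishing participant.
		return conn.Close()
	}))

	buf := make([]byte, 1500)
	for {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			break // socket closed by the cleanup goroutine, or a real I/O error
		}
		log.Printf("would publish %d bytes", n)
	}

	cancel() // make sure cleanup runs even if the loop ended on an I/O error
	if err := <-closed; err != nil {
		log.Printf("cleanup: %v", err)
	}
}
```

The main goroutine blocks in the read loop, so no daemon is needed; the only background work is the small goroutine that waits for cancellation and closes the socket.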

To question 2: if you're streaming audio from multiple sources/hosts using this method and want to be able to clean them all up from a single point, it should be as simple as calling lk room participants remove --room <roomname> <participant> on each to kick them. We don't really differentiate between "bots" and other kinds of participants, and I don't think we want to start -- but you could also easily stash some information specific to your use-case in the participant metadata to signify "I am a bot". You could also just identify those participants by what they are publishing, or by their name (in that case we may want to consider adding more detail to the output of lk room participants list <roomname>, to include participants' published tracks and their media types).
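One way to "stash some information" in participant metadata is a small JSON document. The sketch below is only an illustration: the `botMetadata` schema and both helper names are made up for this example, and LiveKit itself treats the metadata as an opaque string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// botMetadata is a hypothetical schema chosen for this example.
type botMetadata struct {
	Bot    bool   `json:"bot"`
	Source string `json:"source,omitempty"` // e.g. the endpoint being relayed
}

// encodeBotMetadata builds the metadata string a relaying participant could carry.
func encodeBotMetadata(source string) (string, error) {
	b, err := json.Marshal(botMetadata{Bot: true, Source: source})
	return string(b), err
}

// isBot reports whether a metadata string marks its participant as a bot.
// Empty or non-JSON metadata is treated as "not a bot".
func isBot(metadata string) bool {
	var m botMetadata
	if err := json.Unmarshal([]byte(metadata), &m); err != nil {
		return false
	}
	return m.Bot
}

func main() {
	md, err := encodeBotMetadata("225.8.11.101:9001")
	if err != nil {
		panic(err)
	}
	fmt.Println(md, isBot(md), isBot(""))
}
```

A cleanup script could then list a room's participants, keep those whose metadata passes `isBot`, and remove each one.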

s-hamdananwar commented 1 month ago

@rektdeckard Thanks so much for your detailed response! Following your recommendation, SIGINT, SIGTERM and SIGQUIT will terminate the blocked publish process.

Before I do a PR, I have one final question. I have implemented and tested multicast audio publishing using G711, PCM and Opus codecs and they work as expected when running locally (I am testing the CLI locally in a non-conventional way by creating go.mod inside the cmd/lk directory). Transcoding G711 or PCM format into Opus requires the use of the CGo Opus package, which requires libopus and libopusfile as dependencies. I know LiveKit's SIP Service uses the same package, and therefore requires libopus and libopusfile as prerequisites for running it locally, but I believe requiring them as prerequisites for the CLI is overkill, especially considering they would only be used for one of the many options in the lk room join --publish-.. commands. I also see in the Dockerfile that CGo is disabled for the build command, which makes me believe the Opus package would not work in the first place. Considering these, would you recommend supporting only the Opus codec and dropping G711 and PCM? Or is there a way we could accommodate the CGo Opus package in the project?
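To make the dependency question concrete: the G.711 half of the transcode is simple table-free arithmetic that works in pure Go, as in the sketch below for mu-law (function names are illustrative). Per the comment above, it is the subsequent Opus encoding step that pulls in the CGo package and libopus.

```go
package main

import "fmt"

// ulawToPCM expands one G.711 mu-law byte to a 16-bit linear PCM sample
// using the standard bias of 0x84.
func ulawToPCM(u byte) int16 {
	u = ^u // mu-law bytes are stored bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F

	sample := (int16(mantissa)<<3 + 0x84) << exponent
	sample -= 0x84
	if sign != 0 {
		return -sample
	}
	return sample
}

// decodeULaw expands a whole mu-law payload into PCM samples.
func decodeULaw(payload []byte) []int16 {
	out := make([]int16, len(payload))
	for i, b := range payload {
		out[i] = ulawToPCM(b)
	}
	return out
}

func main() {
	// 0xFF is mu-law silence; 0x00 and 0x80 are the two extremes.
	fmt.Println(decodeULaw([]byte{0xFF, 0x00, 0x80}))
}
```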

dennwc commented 1 month ago

Considering the need for CGo, maybe this project could live in livekit-examples instead of a CLI?

@rektdeckard What do you think?

rektdeckard commented 4 weeks ago

Agreed, I don't love the idea of adding features with system dependencies here, and maybe a separate project is a good place to trial this. But if pure software support is doable for just Opus, we can definitely add it.

s-hamdananwar commented 3 weeks ago

Thanks @rektdeckard and @dennwc for your suggestions! I have submitted a PR that lets users publish Opus audio without the need for any dependencies.

If you would like to test the feature, you can run a separate simple program that acts as a multicast sender. It should subscribe to the audio of a participant other than the bot created by the CLI and send that audio as multicast packets to the same endpoint. Here is a simple example:

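// Imports assumed for this snippet (adjust versions to match your SDK):
//   "fmt", "net", "golang.org/x/net/ipv4",
//   "github.com/pion/webrtc/v3" (or v4, whichever your SDK version uses),
//   lksdk "github.com/livekit/server-sdk-go/v2"

// onTrackSubscribed forwards the Opus payload of every RTP packet from the
// subscribed track to a multicast group.
func onTrackSubscribed(track *webrtc.TrackRemote, publication *lksdk.RemoteTrackPublication, rp *lksdk.RemoteParticipant) {

    multicastAddr := "225.8.11.101:9001" // Change accordingly

    udpAddr, err := net.ResolveUDPAddr("udp4", multicastAddr)
    if err != nil {
        fmt.Println("Error resolving UDP address:", err)
        return
    }

    // Bind to an ephemeral local port; the socket is only used for sending.
    conn, err := net.ListenPacket("udp4", ":0")
    if err != nil {
        fmt.Println("Error creating UDP connection:", err)
        return
    }
    defer conn.Close()

    p := ipv4.NewPacketConn(conn)

    // Multicast datagrams use the multicast TTL, not the unicast TTL,
    // so SetMulticastTTL is the call that affects these packets.
    if err := p.SetMulticastTTL(2); err != nil {
        fmt.Println("Error setting multicast TTL:", err)
        return
    }

    // Do not deliver our own packets back to this host.
    if err := p.SetMulticastLoopback(false); err != nil {
        fmt.Println("Error setting multicast loopback:", err)
        return
    }

    for {
        pkt, _, err := track.ReadRTP()
        if err != nil {
            break // track ended or was unsubscribed
        }
        data := pkt.Payload // Opus frame without the RTP header
        if !publication.IsMuted() {
            _, err = p.WriteTo(data, nil, udpAddr)
            if err != nil {
                fmt.Println("Error sending message:", err)
                return
            }
            fmt.Println("Multicast packet sent successfully.")
        }
    }
}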

Please let me know what you guys think! I would be happy to clear up any questions or make any fixes based on your concerns and/or suggestions.