serenity-rs / songbird

An async Rust library for the Discord voice API
ISC License

Add support for serverless infrastructure #162

Closed: winstxnhdw closed this issue 9 months ago

winstxnhdw commented 1 year ago

Not too long ago, Discord released their Interactions API, which led to the birth of serverless Discord bots for performing simple tasks. This allowed developers to offload the maintenance cost of running a home server or a VPS to AWS's generous free tier.

I would like to take this a step further and integrate songbird into my AWS Lambda function. However, the library depends on opus, ffmpeg and youtube-dl. Is it possible for songbird to ship a binary with the above dependencies? If not, do you know of a direction I can look at to do it myself?

AWS Lambda functions have a maximum compute time of 15 minutes. That's enough to play an average of 5 songs in a single execution. Implementing this feature would drastically reduce the cost of owning a Discord bot, and for most private servers, the maintenance cost would be nothing.
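As a rough check of that figure: 15 minutes is 900 seconds, so five songs fit only if they average about three minutes each. Below is a minimal sketch of the arithmetic. The track lengths and the startup overhead are hypothetical illustration values, and `tracks_that_fit` is not part of Lambda or songbird.

```rust
use std::time::Duration;

/// Lambda's maximum execution time, as stated above.
const LAMBDA_LIMIT: Duration = Duration::from_secs(15 * 60);

/// Counts how many tracks, played back to back, finish before the limit.
/// `overhead` is a hypothetical allowance for cold start and voice handshake.
fn tracks_that_fit(track_lengths: &[Duration], overhead: Duration) -> usize {
    let mut elapsed = overhead;
    let mut count = 0;
    for len in track_lengths {
        let next = elapsed + *len;
        if next > LAMBDA_LIMIT {
            break;
        }
        elapsed = next;
        count += 1;
    }
    count
}

fn main() {
    // Hypothetical queue: six tracks of three minutes each.
    let queue = vec![Duration::from_secs(180); 6];
    let n = tracks_that_fit(&queue, Duration::from_secs(10));
    println!("{n} of {} tracks fit in one invocation", queue.len());
}
```

With ten seconds of overhead only four three-minute tracks fit. Five fit only when there is no overhead at all.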

GnomedDev commented 1 year ago

Discord's voice infra requires you to send an event over the gateway, then receive two back before you can actually connect to the voice server and start communicating. That would require a persistent gateway connection which doesn't sound serverless.
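To make the handshake concrete, here is a minimal sketch of the bookkeeping it implies: the two events that come back (VOICE_STATE_UPDATE and VOICE_SERVER_UPDATE) can arrive in either order, and a voice connection needs data from both. The type names `GatewayEvent`, `VoiceCredentials` and `Handshake` are hypothetical and are not songbird's or serenity's API. The sketch does not open a websocket, which is the part that conflicts with serverless.

```rust
/// The two gateway events that answer a voice state update request.
/// Only the fields a voice connection needs are modelled here.
#[derive(Debug, Clone)]
enum GatewayEvent {
    VoiceStateUpdate { session_id: String },
    VoiceServerUpdate { endpoint: String, token: String },
}

/// Hypothetical bundle of everything collected from both events.
#[derive(Debug, Clone, PartialEq)]
struct VoiceCredentials {
    session_id: String,
    endpoint: String,
    token: String,
}

/// Remembers whichever event arrived first until the other one shows up.
#[derive(Debug, Default)]
struct Handshake {
    session_id: Option<String>,
    server: Option<(String, String)>,
}

impl Handshake {
    /// Records one event; returns credentials once both have been seen.
    fn on_event(&mut self, event: GatewayEvent) -> Option<VoiceCredentials> {
        match event {
            GatewayEvent::VoiceStateUpdate { session_id } => {
                self.session_id = Some(session_id);
            }
            GatewayEvent::VoiceServerUpdate { endpoint, token } => {
                self.server = Some((endpoint, token));
            }
        }
        let session_id = self.session_id.clone()?;
        let (endpoint, token) = self.server.clone()?;
        Some(VoiceCredentials { session_id, endpoint, token })
    }
}

fn main() {
    let mut handshake = Handshake::default();
    // Placeholder values; real ones only arrive over a live gateway websocket.
    let events = vec![
        GatewayEvent::VoiceStateUpdate { session_id: "session-placeholder".into() },
        GatewayEvent::VoiceServerUpdate {
            endpoint: "voice.example.invalid".into(),
            token: "token-placeholder".into(),
        },
    ];
    for event in events {
        match handshake.on_event(event) {
            Some(creds) => println!("ready to connect: {creds:?}"),
            None => println!("still waiting for the other event"),
        }
    }
}
```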

FelixMcFelix commented 1 year ago

Agreed, you need a persistent gateway connection to connect to a voice server in the first place. This appears to be incompatible with webhook-style Interactions, and I don't think you can access your voice session ID and token via REST API. That being said, you could probably shoehorn a time-limited voice connection into a Lambda.

The next branch makes the 'AWS Lambda' part a little more technically feasible by removing the use of the ffmpeg binary, and opus is never called as a binary, so that's a non-issue. yt-dlp or youtube-dl is still essential for turning most user-given links into a usable audio URL. I've never run AWS Lambdas, but you might be able to move yt-dlp in via a container image. The main network limitation is that it disallows inbound network connections, which shouldn't break songbird voice connections. If you write a program which just creates a Driver (using all fields in ConnectionInfo) and takes a list of URLs, that should work. You will likely need a VPS-style bot to gather those ConnectionInfo fields, though -- if you prod around the REST API and can access voice-state and voice-server somehow, then feel free to update us.
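For the "takes a list of URLs" part, here is a minimal sketch of turning user-given links into direct audio URLs by shelling out to yt-dlp. It assumes a `yt-dlp` binary is on the PATH (for example, baked into the container image). The helpers `ytdlp_args`, `first_url` and `resolve_audio_url` are hypothetical names, not songbird API. `-f bestaudio` and `--get-url` are yt-dlp options. Handing the resulting URL to a songbird Driver is left out.

```rust
use std::env;
use std::io;
use std::process::Command;

/// Builds the yt-dlp arguments that print a direct audio URL for `link`.
/// The `--` stops a link that starts with '-' from being read as an option.
fn ytdlp_args(link: &str) -> Vec<String> {
    vec![
        "-f".into(),
        "bestaudio".into(),
        "--get-url".into(),
        "--".into(),
        link.into(),
    ]
}

/// Picks the first non-empty line of yt-dlp's standard output.
fn first_url(stdout: &str) -> Option<String> {
    stdout
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(str::to_owned)
}

/// Runs yt-dlp and returns the resolved audio URL, or an error describing why not.
fn resolve_audio_url(link: &str) -> io::Result<String> {
    let output = Command::new("yt-dlp").args(ytdlp_args(link)).output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    let stdout = String::from_utf8_lossy(&output.stdout).into_owned();
    let url = first_url(&stdout);
    url.ok_or_else(|| io::Error::new(io::ErrorKind::Other, "yt-dlp printed no URL"))
}

fn main() {
    // Every command-line argument is treated as one link to resolve.
    for link in env::args().skip(1) {
        match resolve_audio_url(&link) {
            Ok(url) => println!("{link} -> {url}"),
            Err(err) => eprintln!("{link}: could not resolve ({err})"),
        }
    }
}
```

Keeping the argument list and the output parsing in separate functions means both can be checked without the binary installed.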

winstxnhdw commented 1 year ago

I am thinking of a VPS-style bot that wakes up through Discord interactions and lives for 15 minutes at most.

> Discord's voice infra requires you to send an event over the gateway, then receive two back before you can actually connect to the voice server and start communicating. That would require a persistent gateway connection which doesn't sound serverless.

If this is over REST, Lambda can definitely block the main thread and wait for the response before connecting to the voice server.

> I've never run AWS Lambdas, but you might be able to move yt-dlp in via a container image. The main network limitation is that it disallows inbound network connections, which shouldn't break songbird voice connections.

I will look into this but I suspect that this may be difficult because a Rust function has to reside in their special Amazon Linux 2 containers.

> The main network limitation is that it disallows inbound network connections, which shouldn't break songbird voice connections.

This should not be a problem. There is an option to block all concurrent requests to the Lambda function during its execution.

FelixMcFelix commented 1 year ago

> If this is over REST

The main problem is that, at least going by the docs, this is not over REST and requires a full websocket, which will stop webhook Interactions from being fired as far as I can tell.

winstxnhdw commented 1 year ago

> The main problem is that, at least going by the docs, this is not over REST and requires a full websocket, which will stop webhook Interactions from being fired as far as I can tell.

I could use two Lambda functions: one to handle the webhook interactions and the other to maintain the websocket.

GnomedDev commented 1 year ago

If I recall correctly, webhook interactions aren't sent if there is an active websocket connection, but that might have changed.

winstxnhdw commented 1 year ago

> If I recall correctly, webhook interactions aren't sent if there is an active websocket connection, but that might have changed.

If that's the case, maybe setting up a second 'helper' bot that runs on a separate Lambda instance might help.

FelixMcFelix commented 9 months ago

I think this is/was an interesting thought experiment, but is probably beyond what we want to work on.