This is a template for making reddit bots in python, and deploying them to AWS Lambda.
This bot template has some nice features:
- It comes with example logic: the bot watches r/bottest and responds to any self post that mentions the word potato. You can just go look at that specific potato logic, see how it works, and change it into what you want your bot to do. That's easier than writing everything yourself.
- The bot checks up on comments after it makes them. If your bot's comment gets downvoted to negative infinity, you'd want to know. This system sends you an email when your comment gets downvoted into negative territory.
- The bot also checks on the post you commented on, after you comment. Perhaps the submitter read your comment and changed their post. You may want to update your comment to avoid looking silly.
- It checks up on comments very frequently when they're new, and less frequently as they age. This is to provide the perfect compromise between low latency updates and not using the reddit API too much. (You'll get throttled if you try to check up on all your bot's posts every minute.)
Running your own server can be a lot of hassle, and brings downtime and hardware costs. I initially ran my bot on a beaglebone server, but a lightning strike in my street broke the server. For most bots, you can run using serverless Lambda functions in Amazon's cloud, and you won't have to pay. (They have a free usage tier.)

If you don't know what Amazon's "Lambda functions" are: you give them code, and they run it. You don't need to worry about an operating system (`sudo apt-get install blah`). You don't need to worry about scaling. If your reddit bot wants to comment on a thousand posts per second today, and nothing tomorrow, Amazon will happily handle that. (Reddit will definitely throttle that, but my point is that you don't need to worry about scaling, and you only pay for the seconds of computation that you use.)
You could instead run a simple script as a cron job on your own server. But as mentioned earlier, a physical server has downsides, and a virtual machine is more expensive than lambda functions.
This tooling is very customisable, and can be extended to use any AWS resource. You want to send an SMS whenever your bot spots a certain type of post? That's easy, just modify the cloudformation template. You want to use an image recognition API? Same again, it's far easier than with other tools.
This tooling parallelises all aspects of lambda deployment: the per-lambda build and zip steps run in parallel across all your lambdas.
Let's start by just installing the template potatobot. It responds to self posts in r/bottest which contain the word potato.

1. Install the cookiecutter library. (Typically with `pip install cookiecutter` or `pip install cookiecutter --user`.)
2. Run `cookiecutter https://github.com/mlda065/paragraphiser_bot_aws.git`. (I haven't tested this on a windows computer. It might work, I dunno. Windows is pretty useless.) You'll be asked to fill out some information:
- `bot_name`: The name of your bot. It must contain only letters, numbers and dashes, and must start with a letter.
- `aws_region`: Amazon's cloud has many regions. The cheapest is generally `us-east-1`. Here is the full list of possibilities.
- `directory_name`: The name of the folder you want to download the template into. The folder should not yet exist.
- `email_for_alerts`: When there's an error with your bot, or a comment by your bot is downvoted into negative territory, an email will be sent here.
- `reddit_client_id`: Get this from the previous step.
- `reddit_client_secret`: Get this from the previous step.
- `reddit_password`: The password for your bot's reddit account (which you just created). This will be saved in plain text, so make sure you don't use the same password for anything else.
- `reddit_username`: The username of the reddit account you just created.
- `user_agent`: An arbitrary string used to identify your bot and its version, e.g. "My Potatobot v0.1".

The template bot scans self posts in r/bottest, and comments on that post if it contains the word 'potato'. If that post is later edited, it updates the comment.
Next, set up your AWS credentials by running `aws configure`.
Let's install the python packages we need for the tooling. Run `./makeForDeploy.sh` in a terminal. (Alternatively, just install the `boto3` library into your OS python.) This creates a virtual environment in `./env`. Activate it with `. ./env/bin/activate`.
This template comes with fully fledged AWS deployment tooling. It's more general than just reddit bots or just lambda functions. Read 'How it works' to understand the details.
You can deploy to different stages (e.g. `dev` vs `prod`). If you don't know what that means, just use `dev`.
To deploy the bot, run `python deploy.py -s dev`. The `-s dev` flag tells the system to deploy to the dev stage. You can replace `dev` with `prod` or any other string. You should see all green when the command finishes.
Submit a post in r/bottest and wait for the bot to respond. It's currently polling for new posts every 10 minutes, so you may have to wait 10 minutes.
If you don't want to wait 10 minutes, you can trigger the `checkForNew` lambda manually, as in the sketch below.
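Here's a minimal sketch of doing that with `boto3`, assuming the lambda follows the `<botname>-stack-` naming pattern used elsewhere in this README (check the Lambda console for the real function name):

```python
import boto3

client = boto3.client('lambda')  # uses the credentials from `aws configure`
client.invoke(
    FunctionName='potatobot-stack-checkForNew',  # hypothetical name, check your console
    InvocationType='Event',  # asynchronous, fire-and-forget
    Payload=b'{}',
)
```

The 'Test' button on the function's page in the AWS console does the same job.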
To change the criteria the bot uses to decide whether to reply, go into `data/util/common.py`.
`generate_reply` takes in a praw submission object. (praw is the library used to talk to the reddit API.) Some useful attributes:

- `submission.id` is the unique identifier of each post. This is a short string which appears in the url in your browser when you look at the post.
- `submission.is_self` is `True` or `False` depending on whether the post is a self post (text) or non-self post (link).
- `submission.selftext` only exists for self posts. This is the body of the post.

If the bot should reply, `generate_reply` returns `{'original_reply':msg}`, where `msg` is the markdown formatted comment your bot will make. (You don't need to make it reply. The function calling `generate_reply` will do that.) You may include other key/value pairs within this dict. That exact dict will be passed later to `update_reply` as the `data` argument.

To change when the bot comes back to check on its comments, see `schedule_checks` in `data/lambda/checkForNew/main.py`.
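For orientation, here's a rough sketch of the shape `generate_reply` could take. The potato check mirrors the template's described behaviour, but the body is illustrative: the `saw_potato` key and the `None` return for "no reply" are my assumptions, not the template's actual code.

```python
def generate_reply(submission):
    # Link posts have no selftext, so there's nothing to scan
    if not submission.is_self:
        return None  # assumption: None signals "don't reply"

    if 'potato' in submission.selftext.lower():
        msg = 'I see you mentioned potatoes. Carry on.'
        # 'original_reply' is the comment the caller will post for us.
        # Extra keys (like saw_potato) come back later as update_reply's
        # `data` argument.
        return {'original_reply': msg, 'saw_potato': True}

    return None
```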
When the bot checks back on a post, `update_reply` is called with that `data` dict. If the comment should change, it returns `{'updated_reply':msg}` (among other keys). Your comment will be updated to say the contents of `msg`, and the returned dict will be handed back the next time `update_reply` is called.

You can add your own tests to the `unit_tests` function in `data/util/common.py`.

The example replies are built from mako templates; look at `generate_reply` and `update_reply` to see how this works. The template files are `data/util/replyTemplateNew.mako` and `data/util/replyTemplateUpdate.mako`.
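Here's a matching sketch for `update_reply`, assuming it receives the praw submission and the `data` dict described above (the real signature is in `data/util/common.py`; treat this as illustrative):

```python
def update_reply(submission, data):
    # The submitter may have edited the post since we commented
    has_potato = 'potato' in submission.selftext.lower()

    if has_potato == data.get('saw_potato'):
        # assumption: returning data unchanged means "leave the comment alone"
        return data

    if has_potato:
        msg = 'I see you mentioned potatoes. Carry on.'
    else:
        msg = 'Edit: the potato mention has gone, so never mind.'
    # 'updated_reply' is the new comment body; the dict comes back to us
    # on the next scheduled check
    return {'updated_reply': msg, 'saw_potato': has_potato}
```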
The full deployment can take about 5 minutes, which is a frustratingly long deployment cycle. You can skip steps. There are 4 steps:

1. Run the `makescript.sh` file within each lambda function's folder in `data/lambda/`. You only need to run this if you have modified the `makescript.sh` file (to add new python libraries), or modified anything in `data/util` or `data/credentials` (which get copied into the `data/lambda/x/include` folders). If you have not done either of those things since your last deployment, skip this step with the `-b` flag. e.g. `python deploy.py -s prod -b`
2. Zip each lambda into a `.zip` file. If you have modified the python code in any `data/lambda/x/main.py` file, then you need to do this step. If you needed to do the previous step, then you need to do this step too. Otherwise you can skip it with the `-z` flag. e.g. `python deploy.py -s prod -bz`
3. Upload each lambda's `.zip` file to S3. If you've changed nothing except the cloudformation template (`data/cloudformation/stack.yaml`), then you can skip this step with the `-u` flag. e.g. `python deploy.py -s prod -bzu`
4. Apply the cloudformation template. (Either stack creation or a change set.)
If you're finding that the upload step is taking too long, consider doing your development on an EC2 virtual machine. Uploads to S3 are faster that way.
After thorough testing, you can change the subreddit the bot browses: go into `data/cloudformation/stack.yaml` and search for `subreddits`. Replace `"bottest"` with your new subreddit name(s). If you want the bot to be active in multiple subreddits, separate them with a comma. Leave off the `r/`.
There are unit tests which are conducted whenever a lambda is deployed. If the tests fail, you will be told, and will see the output of one of the tests.
In the `main.py` file for each lambda (`data/lambda/x/main.py`) there is a function called `lambda_handler()`. The general format is:

```python
def lambda_handler(event, context):
    if ('unitTest' in event) and event['unitTest']:
        print('Running unit tests')
        # add your unit tests here
    else:
        print('Running main (non-test) handler')
        main()  # this handles the actual workload
```
The deployment tooling invokes the lambda function in the lambda environment, with all the same permissions as normal.
When the tests fail, you will see the output of one of the failed tests printed by the deployment tooling.
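If you want to re-run just one lambda's tests without a full deploy, you can mimic what the tooling does by invoking the function with the `unitTest` flag from the handler above set. The function name here is again hypothetical:

```python
import json
import boto3

client = boto3.client('lambda')
response = client.invoke(
    FunctionName='potatobot-stack-checkForNew',  # hypothetical name
    Payload=json.dumps({'unitTest': True}).encode(),
)
# print() output from the handler lands in Cloudwatch; anything the
# handler returns comes back in the response payload
print(json.loads(response['Payload'].read() or b'null'))
```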
If you want to see more detailed logs of the tests, or of the production invocations, open up AWS in your browser and go to Cloudwatch. Select Logs, then filter by `/aws/lambda/<botname>-stack-`.
There are 5 lambda functions, defined in `data/cloudformation/stack.yaml`. Code shared between them lives in `util/common.py`.

- `checkForNew` scans for new posts, and calls `generate_reply()` to determine whether the bot should reply to each one. If so:
    - the reply is recorded in the `botname-stack-postHistory` dynamodb table
    - entries are added to the `botname-stack-schedule` table, one for each time that the bot should come back and check the comment and the post. One feature of this system is that these checks are very frequent for new comments, and less frequent for old comments. The large integer in the index of this table is a unix timestamp.
- `poll` reads `botname-stack-schedule`. If there are any timestamps in the past or within the next 60 seconds, it checks the posts corresponding to those timestamps. (Duplicates merged.) It doesn't check them directly: for each post, `poll` invokes `checkOldOne` with a payload that contains one post id. (See the sketch after this list.)
- `checkOldOne` is normally invoked by `poll`, but can be manually invoked with a payload of `{"post_id":"xxxxxxx"}` if you want to check on a particular post. (Where `xxxxxxx` is the post id in the url of that submission.)
- When your bot hits an error, an email is sent to your alerts address. To avoid spamming you, these emails stop if `poll` keeps encountering the same error. Once you deploy, the emails will start again.
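To make the schedule mechanics concrete, here's a rough sketch of a `poll` pass. The table schema, attribute names and function names are my assumptions (the real logic presumably lives in `data/lambda/poll/main.py`); only the `{"post_id": ...}` payload shape comes from the description above.

```python
import json
import time

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('potatobot-stack-schedule')  # hypothetical name

# anything due already, or within the next 60 seconds
cutoff = int(time.time()) + 60
response = table.scan(FilterExpression=Attr('time').lte(cutoff))  # attribute name is a guess

# several schedule entries can point at the same post: merge duplicates
post_ids = {item['post_id'] for item in response['Items']}

lambda_client = boto3.client('lambda')
for post_id in post_ids:
    # hand each post off to checkOldOne rather than checking it here
    lambda_client.invoke(
        FunctionName='potatobot-stack-checkOldOne',  # hypothetical name
        InvocationType='Event',
        Payload=json.dumps({'post_id': post_id}).encode(),
    )
```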
As mentioned earlier, the deployment tooling is a fully customisable, generalisable AWS deployment kit. I've looked into alternatives (e.g. Zappa, serverless.com) and I couldn't find anything that did quite what I want.

Here's how it works:
- Each lambda has its own folder in `data/lambda`.
- `data/lambda/$LAMBDA/makescript.sh` is a bash script to create a virtual environment. This is useful if you want to create a different virtual environment for each lambda in your project. (e.g. if Lambda A requires large library X, and Lambda B requires large library Y, and X+Y is too large for lambda.) This is invoked by the deployment tooling if you don't use the `-b` flag. The makescripts are also how files from `data/util` and `data/credentials` get copied into each lambda. Those folders are for any code which is used by multiple lambdas. The deployment tooling executes the makescripts for all lambdas in parallel.
- `data/lambda/$LAMBDA/main.py` has the main logic, and any unit tests.
- Each lambda is zipped into a `lambda.zip` file. This contains the virtual environment (created by the `makescript.sh`), `main.py` and anything in `data/lambda/$LAMBDA/include`. The deployment tooling zips all lambdas in parallel.
- Then the cloudformation template (`data/cloudformation/stack.yaml`) is applied. (Either stack creation or a change set.) Normally if you upload a new version of the lambda zip to S3 and apply a cloudformation template, it will not update the lambda to use what's in S3. This tooling keeps track of the version of the lambda you just uploaded, and passes that as a parameter to the cloudformation yaml template. Everything for each project is kept contained within one cloudformation template. (Except the S3 bucket.)

I tried to handle the secrets for reddit properly. I really did. But it's bloody hard to pass secrets into Amazon's lambda functions with enterprise-grade security. Amazon's key handler is really confusing to use.
So I gave up and just saved the credentials for reddit in `credentials/praw.ini`, which is copied into `data/lambda/checkForNew/include/praw.ini` and `data/lambda/checkOldOne/include/praw.ini` by the deployment tooling. Then it's added to the zip file for each lambda function.
This is kind of bad practice, but hey, this is just a reddit bot. That's why you should make a new reddit user just for the bot, and not use the same password anywhere else.
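For reference, praw picks those credentials up automatically when a `praw.ini` sits alongside the code. A minimal sketch, assuming the ini has a site section named `bot1` (the template's actual section name may differ):

```python
import praw

# praw.Reddit('bot1') reads client_id, client_secret, username, password
# and user_agent from the [bot1] section of praw.ini
reddit = praw.Reddit('bot1')  # section name is an assumption
print(reddit.user.me())  # sanity check: should print your bot's username
```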
The AWS IAM permissions of the resources are very loose. I was too lazy to tighten them. (If you feel up for it, go change `ExecutionRole` in `data/cloudformation/stack.yaml`.)
But why bother? If you have nothing in this AWS account except for your reddit bot, what's the worst case scenario? Someone takes control of your reddit bot.
If that happens, log in to reddit and change your bot's password. Tada!
If you do have other stuff in the same AWS account, well ... don't! It's good practice to have a separate AWS account for each project. It provides excellent security through iron-clad partitioning. It also means that if you hit a limit in one project (e.g. max writes to dynamodb), your other projects won't come grinding to a halt.
That's it! My tooling keeps everything together.
If my documentation isn't clear enough, or you have a particular request, just create an 'issue' in this repository.