mdavis-xyz / paragraphiser_bot_aws

a reimplementation of my reddit bot with AWS Lambda functions

Matt's Reddit Bot Template

This is a template for making reddit bots in python, and deploying them to AWS Lambda.

Why Use This Template

Bot Logic

This bot template has some nice features:

This bot also checks up on comments after it makes them. If your bot's comment gets downvoted into oblivion, you'd want to know. This system sends you an email when a comment's score goes negative.
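The alert condition can be sketched as a small pure function. This is illustrative only — the function name is hypothetical, and the template wires the real alert through AWS email notifications:

```python
def should_alert(previous_score: int, current_score: int) -> bool:
    """Fire an email alert the first time a bot comment's score goes negative.

    Comparing against the previous score avoids re-alerting on every
    subsequent check of an already-negative comment.
    """
    return current_score < 0 and previous_score >= 0
```
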

This bot also checks on the post you commented on, after you comment. Perhaps the submitter read your comment and changed their post. You may want to update your comment to avoid looking silly.

It checks up on new comments frequently, and on old comments less frequently. This strikes a balance between low-latency updates and not overusing the reddit API. (You'll get throttled if you try to check up on every one of your bot's posts every minute.)
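The "check often when young, rarely when old" schedule looks something like the following. The exact thresholds here are made up for illustration; the template's real schedule lives in its lambda code:

```python
from datetime import timedelta

def next_check_delay(comment_age: timedelta) -> timedelta:
    """How long to wait before re-checking a comment of the given age.

    Young comments are re-checked often (edits and downvotes mostly
    happen early); old comments are re-checked rarely, which keeps
    total reddit API usage bounded as the bot accumulates comments.
    """
    if comment_age < timedelta(hours=1):
        return timedelta(minutes=10)
    elif comment_age < timedelta(days=1):
        return timedelta(hours=1)
    else:
        return timedelta(days=1)
```
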

Tooling and Infrastructure

Running your own server can be a lot of hassle, and brings downtime and hardware costs. I initially ran my bot on a BeagleBone server, but a lightning strike in my street destroyed it. For most bots, you can run on serverless Lambda functions in Amazon's cloud, and you won't have to pay. (They have a free usage tier.)

If you don't know what Amazon's "Lambda functions" are: You give them code, and they run it. You don't need to worry about an operating system (sudo apt-get install blah). You don't need to worry about scaling. If your reddit bot wants to comment on a thousand posts per second today, and nothing tomorrow, Amazon will happily handle that. (Reddit will definitely throttle that, but my point is that you don't need to worry about scaling, and you only pay for the seconds of computation that you use.)

You could run a simple script as a cron job on a basic server. But as mentioned earlier, a physical server has downsides, and a virtual machine is more expensive than lambda functions.

This tooling is very customisable, and can be extended to use any AWS resource. You want to send an SMS whenever your bot spots a certain type of post? That's easy, just modify the cloudformation template. You want to use an image recognition API? Same again, it's far easier than with other tools.

This tooling parallelises all aspects of lambda deployment.

How To Use It

Let's start by just installing the template bot, potatobot. It responds to self posts in r/bottest that contain the word 'potato'.

  1. This repo is a cookiecutter template. Install the cookiecutter library. (Typically with pip install cookiecutter or pip install cookiecutter --user)
  2. Create a new reddit account to run your bot as. (It's neater than using your own)
  3. Log in to reddit in your browser with this account. Click 'preferences', then select 'apps', and create a new app.
  4. Fill out the information. Once you have created the app, take note of the client ID and secret you see. You'll need them for the next step.
  5. In a terminal on a Linux computer, run: cookiecutter https://github.com/mlda065/paragraphiser_bot_aws.git. (I haven't tested this on a Windows computer. It might work, I dunno. Windows is pretty useless.) You'll be asked to fill out some information:
    • bot_name: The name of your bot. It must contain only letters, numbers and dashes, and start with a letter
    • aws_region: Amazon's cloud has many regions. The cheapest is generally us-east-1. Here is the full list of possibilities
    • directory_name: The name of the folder you want to download the template into. The folder should not yet exist
    • email_for_alerts: When there's an error with your bot, or a comment by your bot is downvoted into negative territory, an email will be sent here.
    • reddit_client_id: Get this from the previous step
    • reddit_client_secret: Get this from the previous step
    • reddit_password: the password for your bot's reddit account (which you just created). This will be saved in plain text, so make sure you don't use the same password for anything else
    • reddit_username: the username of the reddit account you just created
    • user_agent: An arbitrary string used to identify your bot, and which version. e.g. "My Potatobot v0.1"

The template bot scans self posts in r/bottest and comments on any post that contains the word 'potato'. If that post is later edited, the bot updates its comment.

Let's install the python packages we need for the tooling.

This template comes with fully fledged AWS deployment tooling. It's more general than just reddit bots or just lambda functions. Read 'How it works' to understand the details.

You can deploy to different stages (e.g. dev vs prod). If you don't know what that means, just use dev.

To deploy the bot:

You should see all green when the command finishes.

Submit a post in r/bottest and wait for the bot to respond. It's currently polling for new posts every 10 minutes, so you may have to wait 10 minutes.

If you don't want to wait for 10 minutes:

Customise the logic

To change the criteria, edit data/util/common.py.
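The customisation point can be thought of as two small functions: one deciding whether a post matches, one producing the reply. The names below are hypothetical — check data/util/common.py for the template's actual function names:

```python
def matches_criteria(selftext: str) -> bool:
    """Decide whether the bot should reply to this self post.

    The template's criterion: the post body contains 'potato'
    (case-insensitive here for robustness).
    """
    return 'potato' in selftext.lower()

def generate_reply(selftext: str) -> str:
    """Build the comment body the bot will post."""
    return "I spotted a potato!"  # hypothetical reply text
```

Swapping in your own logic means changing these two decisions: what to match, and what to say.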

Development

The full deployment can take about 5 minutes, which makes for a frustratingly long development cycle. Fortunately, you can skip steps you don't need. There are 4 steps:

  1. The virtual environment of each lambda function is built by executing the makescript.sh file within each lambda function's folder in data/lambda/. You only need to run this step if you have modified a makescript.sh file (to add new python libraries), or modified anything in data/util or data/credentials (which gets copied into the data/lambda/x/include folders). If you have done neither of those things since your last deployment, skip this step with the -b flag. e.g. python deploy.py -s prod -b
  2. Each lambda gets zipped up into a .zip file. If you have modified the python code in any data/lambda/x/main.py file, then you need to do this step. If you needed to do the previous step, then you need to do this step. Otherwise you can skip this step with the -z flag. e.g. python deploy.py -s prod -bz.
  3. The zip of each lambda gets uploaded to S3. If you needed to do either of the previous steps, you need to do this step. If you only changed the cloudformation template (data/cloudformation/stack.yaml), then you can skip this step with the -u flag. e.g. python deploy.py -s prod -bzu
  4. The cloudformation script is applied to the stack. The deployment script pushes whatever is the latest version of each zip from S3. (This is something plain cloudformation does not do — a benefit of using this tooling.) This step can't be skipped, but fortunately it is very quick.
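The skip flags described above compose: each flag skips one stage, and needing an earlier stage implies needing the later ones. A minimal sketch of how deploy.py might parse them (illustrative — the real script's argument handling may differ):

```python
import argparse

# Flag semantics as documented above: each store_true flag skips one stage.
parser = argparse.ArgumentParser(description='Deploy the bot to AWS')
parser.add_argument('-s', '--stage', required=True,
                    help='deployment stage, e.g. dev or prod')
parser.add_argument('-b', action='store_true',
                    help='skip rebuilding lambda virtual environments')
parser.add_argument('-z', action='store_true',
                    help='skip re-zipping the lambda functions')
parser.add_argument('-u', action='store_true',
                    help='skip uploading the zips to S3')

# Equivalent to running: python deploy.py -s prod -bz
args = parser.parse_args(['-s', 'prod', '-b', '-z'])
```

So `python deploy.py -s prod -bzu` runs only the cloudformation step, the fastest possible redeploy when you've changed nothing but stack.yaml.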

If you're finding that the upload step is taking too long, consider doing your development on an EC2 virtual machine. Uploads to S3 are faster that way.

After thorough testing, you can change which subreddit the bot browses by editing data/cloudformation/stack.yaml. Search for subreddits, and replace "bottest" with your new subreddit name(s). If you want the bot to be active in multiple subreddits, separate them with a comma. Leave off the "r/".
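For reference, praw accepts multiple subreddits as a single 'a+b' string, so the comma-separated config value ultimately needs joining with '+'. The helper name below is hypothetical (the template's actual parsing lives in its lambda code):

```python
def praw_subreddit_spec(config_value: str) -> str:
    """Turn a comma-separated subreddit list (e.g. 'bottest, learnpython')
    into the 'bottest+learnpython' form that praw's reddit.subreddit()
    accepts for browsing several subreddits at once."""
    names = [name.strip() for name in config_value.split(',')]
    return '+'.join(names)
```
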

Debugging and Logs

Unit tests are run whenever a lambda is deployed. If the tests fail, you will be told, and will see the output of one of the failed tests. In the main.py file for each lambda (data/lambda/x/main.py) there is a function called lambda_handler(). The general format is:

def lambda_handler(event, context):
    if ('unitTest' in event) and event['unitTest']:
        print('Running unit tests')
        # add your unit tests here
    else:
        print('Running main (non-test) handler')
        main()  # this handles the actual workload

The deployment tooling invokes the lambda function in the lambda environment, with all the same permissions as normal.
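Concretely, triggering the test branch just means invoking the function with a {'unitTest': true} payload. A sketch of how the tooling might do it (the real tooling's code may differ; the boto3 call is shown as a comment so this snippet stays runnable offline):

```python
import json

def unit_test_payload() -> str:
    """JSON payload that makes lambda_handler run its unit-test branch
    instead of main()."""
    return json.dumps({'unitTest': True})

# In the deployment tooling, this payload would be sent to the deployed
# function, roughly like:
#   import boto3
#   boto3.client('lambda').invoke(FunctionName=function_name,
#                                 Payload=unit_test_payload())
# The response payload then carries the test output shown on failure.
```
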

When the tests fail, you will see the output of one of the failed tests printed by the deployment tooling. If you want more detailed logs of the tests, or of the production invocations, open up AWS in your browser and go to CloudWatch. Select Logs, then filter by /aws/lambda/<botname>-stack-.

How it works

Bot

There are 5 lambda functions.

Tooling

As mentioned earlier, the deployment tooling is a fully customisable, generalisable AWS deployment kit. I've looked into alternatives (e.g. Zappa, serverless.com) and I couldn't find anything that did quite what I wanted.

Here's how it works:

Security

I tried to handle the secrets for reddit properly. I really did. But it's bloody hard to pass secrets into Amazon's lambda function with enterprise-grade security. Amazon's key handler is really confusing to use. So I gave up and just saved the credentials for reddit in credentials/praw.ini, which is copied into data/lambda/checkForOne/include/praw.ini and data/lambda/checkOldOne/include/praw.ini by the deployment tooling. Then it's added to the zip file for the lambda function. This is kind of bad practice, but hey, this is just a reddit bot. That's why you should make a new reddit user just for the bot, and not use the same password anywhere else.

The AWS IAM permissions of the resources are very loose. I was too lazy to tighten them. (If you feel up for it, go change ExecutionRole in data/cloudformation/stack.yaml). But why bother? If you have nothing in this AWS account except for your reddit bot, what's the worst case scenario? Someone takes control of your reddit bot. If that happens, log in to reddit and change your bot's password. Tada!

If you do have other stuff in the same AWS account, well ... don't! It's good practice to have a separate AWS account for each project. It provides excellent security through iron-clad partitioning. It also means that if you hit a limit in one project (e.g. max writes to dynamodb), your other projects won't come grinding to a halt.

Deleting the bot

  1. Log into AWS in a browser.
  2. Go to "Cloudformation"
  3. Delete the stack with the relevant name
  4. Go delete the S3 bucket with the code

That's it! My tooling keeps everything together.

Help

If my documentation isn't clear enough, or you have a particular request, just create an 'issue' in this repository.

TODO