Reliably serve unique and plausible text

erikedlund commented 3 years ago

A number of solutions have been proposed for generating text that is plausible enough to get past the forms and that site admins can't easily filter out.

Roughly they fall into three categories:

1) Generate the text locally, either with a lightweight but unconvincing random text generator like Faker or with a convincing but heavyweight tool like GPT-2.

2) Serve the text from a remote machine that generates it, which can be set up to run GPT-2 without bogging down a user's machine or making installation difficult. Downsides: exposure to DDoS attacks, hosting fees, and the hassle of setting up a server.

3) Serve pre-generated text from a remote source like an S3 bucket or a collection of S3 buckets. Downsides: susceptible to scraping by site admins (can be mitigated by updating the data frequently) and a small hosting fee.

A pre-existing implementation of 2) is running, but it requires a user to create a free account at DeepAI and set an environment variable, which is not suitable for most users. However, it does provide an example of how to set up prompts for GPT-2 and turn them into unique responses.

andria-dev commented 3 years ago

We should be able to have users create a .env file or something similar so they don't have to set an environment variable every time they open a new terminal. The Python script could then load the .env file.
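As a rough illustration of the .env idea, here is a minimal sketch that uses only the standard library. The function name `load_env_file` and the variable name `DEEPAI_API_KEY` are assumptions made for this example, not names taken from the project. It only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables already set in the shell are left untouched.
    Returns a dict of the pairs found in the file.
    """
    env_path = Path(path)
    loaded = {}
    if not env_path.is_file():
        return loaded  # No file: rely on whatever the shell provides.
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # Skip blank lines, comments, and malformed lines.
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # Drop surrounding quotes.
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


# Hypothetical usage: DEEPAI_API_KEY is an assumed variable name.
load_env_file()
api_key = os.environ.get("DEEPAI_API_KEY")
```

If adding a dependency is acceptable, the python-dotenv package's `load_dotenv()` covers the same ground with fuller parsing.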

For users who don't want to create a free DeepAI account, the Python script could use Faker instead.
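One possible shape for that choice, as a sketch: pick the remote generator when an API key is present and use Faker otherwise. `choose_generator`, `faker_text`, and the `DEEPAI_API_KEY` variable name are hypothetical names for this example. `remote_generator` stands in for whatever function the script uses to call the remote API.

```python
import os


def faker_text():
    """Local option: cheap, locally generated filler text."""
    from faker import Faker  # Third-party; imported only when needed.
    return Faker().paragraph(nb_sentences=5)


def choose_generator(remote_generator, local_generator=faker_text,
                     key_var="DEEPAI_API_KEY"):
    """Return the remote generator if an API key is set, else the local one.

    key_var is an assumed environment variable name.
    """
    if os.environ.get(key_var):
        return remote_generator
    return local_generator
```

Importing Faker inside the function means users who do have an API key never need Faker installed.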

erikedlund commented 3 years ago

The commit above sidesteps the issue for now by using a default API key for a dummy account. It should be usable by anyone in the near term, though the key may be disabled in the future if it gets a ton of use.

The --generate CLI option turns this feature on. Users will see the generated text in the log.

dcbark01 commented 3 years ago

For visibility, here is a link to the repo that implements an API for the option you outlined in (1).

andria-dev commented 3 years ago

I like the PR that was just merged that added the -g flag. I think we need better error handling around the actual requests made with the API key, so that if a request gets rate limited or the returned JSON is invalid, we can log it appropriately and the person running the script isn't confused.

LakesideMiners commented 3 years ago

This should be simple enough to implement with a try/except block. We would just need to check which status code was returned.
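A minimal sketch of that check, assuming the API signals rate limiting with HTTP 429 and returns the generated text in a JSON field called `output`. Both the field name and the function name `parse_generation_response` are assumptions for this example. The function takes the status code and body directly, so it does not depend on any particular HTTP library.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_generation_response(status_code, body, output_field="output"):
    """Return the generated text, or None after logging what went wrong."""
    if status_code == 429:
        logger.warning("Text API rate limit hit (HTTP 429); try again later.")
        return None
    if status_code != 200:
        logger.error("Text API returned unexpected HTTP status %s.", status_code)
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        logger.error("Text API response was not valid JSON: %s", exc)
        return None
    # output_field is an assumed field name; adjust to the real response.
    if not isinstance(payload, dict) or output_field not in payload:
        logger.error("Text API response is missing the %r field.", output_field)
        return None
    return payload[output_field]
```

With the requests library, this could be called as `parse_generation_response(response.status_code, response.text)`, and the caller can skip the submission or switch to another generator when it gets `None` back.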

ramblingjordan / AbBOT-python

Reliably serve unique and plausible text #3