An intelligent web browsing agent controlled by natural language.
Language is the most natural interface through which humans give and receive instructions. Instead of writing bespoke automation or scraping code that is brittle to changes, creating and adding agents should be as simple as writing plain English.
```shell
pip install browserpilot
```
The form factor is fairly simple (see below).
```python
from browserpilot.agents.gpt_selenium_agent import GPTSeleniumAgent

instructions = """Go to Google.com
Find all textareas.
Find the first visible textarea.
Click on the first visible textarea.
Type in "buffalo buffalo buffalo buffalo buffalo" and press enter.
Wait 2 seconds.
Find all anchor elements that link to Wikipedia.
Click on the first one.
Wait for 10 seconds."""

agent = GPTSeleniumAgent(instructions, "/path/to/chromedriver")
agent.run()
```
The harder (but more fun) part is writing the natural language prompts.
It helps if you are familiar with how Selenium works and with programming in general, because this project uses GPT-3 to translate natural language into code, so you should be as precise as you can. In this way, it is more like writing code with Copilot than talking to a friend; for instance, it helps to refer to things as `input`s or `textarea`s (vs. "text box" or "search box"), or "button which says 'Log in'" rather than "the login button". Sometimes it will also not pick up on specific words that are important, so it helps to break them out into separate lines: instead of "find all the visible textareas", write "find all the textareas" and then "find the first visible textarea".
You can look at some examples in `prompts/examples` to get started.
Create "functions" by enclosing instructions in `BEGIN_FUNCTION func_name` and `END_FUNCTION`, and then call them by starting a line with `RUN_FUNCTION` or `INJECT_FUNCTION`. Below is an example:
```
BEGIN_FUNCTION search_buffalo
Go to Google.com
Find all textareas.
Find the first visible textarea.
Click on the first visible textarea.
Type in "buffalo buffalo buffalo buffalo buffalo" and press enter.
Wait 2 seconds.
Get all anchors on the page that contain the word "buffalo".
Click on the first link.
END_FUNCTION

RUN_FUNCTION search_buffalo
Wait for 10 seconds.
```
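If it helps to build a mental model, function expansion of this kind can be sketched in a few lines of Python. This is an illustrative toy, not browserpilot's actual parser, and the function name is hypothetical:

```python
def expand_functions(script):
    """Toy sketch: collect BEGIN_FUNCTION/END_FUNCTION bodies, then splice
    them in wherever RUN_FUNCTION or INJECT_FUNCTION names them."""
    functions = {}   # function name -> list of instruction lines
    output = []      # expanded instruction lines
    current = None   # name of the function currently being defined
    for line in script.splitlines():
        stripped = line.strip()
        if stripped.startswith("BEGIN_FUNCTION "):
            current = stripped.split(" ", 1)[1]
            functions[current] = []
        elif stripped == "END_FUNCTION":
            current = None
        elif current is not None:
            functions[current].append(stripped)
        elif stripped.startswith(("RUN_FUNCTION ", "INJECT_FUNCTION ")):
            output.extend(functions[stripped.split(" ", 1)[1]])
        elif stripped:
            output.append(stripped)
    return output
```

Because definitions are collected in the same pass, a function has to be defined earlier in the script than any line that calls it, as in the example above.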
You may also choose to create a yaml or json file with a list of instructions. In general, it needs to have an `instructions` field, and optionally a `compiled` field which has the processed code.
See buffalo wikipedia example.
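For illustration, such a file might look like the sketch below. The only fields described above are `instructions` and the optional `compiled`; the exact shape of their contents is an assumption here, so check the linked example for the real schema:

```yaml
# Hypothetical sketch of an instructions file.
instructions: |
  Go to Google.com
  Find all textareas.
  Find the first visible textarea.
# Optional: code previously compiled by GPT-3 for these instructions,
# so the agent can skip the API call on later runs.
compiled: |
  # ...cached generated code would go here...
```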
You may pass an `instruction_output_file` to the constructor of `GPTSeleniumAgent`, which will output a yaml file with the compiled instructions from GPT-3, to avoid having to pay API costs on repeated runs.
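The cost-saving idea here is ordinary compile-once caching. A minimal, generic sketch of the pattern follows, using JSON and a stand-in `compile_fn`; none of these names come from browserpilot itself:

```python
import json
import os


def compile_or_load(instructions, output_file, compile_fn):
    """Compile instructions once and cache the result on disk.

    `compile_fn` is a stand-in for the paid GPT-3 call; on later runs
    the cached file is loaded instead, so the API is not billed again.
    """
    if os.path.exists(output_file):
        with open(output_file) as f:
            return json.load(f)["compiled"]
    compiled = compile_fn(instructions)
    with open(output_file, "w") as f:
        json.dump({"instructions": instructions, "compiled": compiled}, f)
    return compiled
```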
There are two ways I envision folks contributing:

1. Contributing prompts to `prompts/`! At some point, I will figure out a protocol for folder naming conventions and the evaluation of submitted code (for security, accuracy, etc.). This would be a particularly attractive option for those who aren't as familiar with coding.
2. Adding new capabilities: (a) add the relevant prompt to `InstructionCompiler` and (b) write the corresponding method in `GPTSeleniumAgent`.

This repo was inspired by the work of Yihui He, Adept.ai, and Nat Friedman. In particular, the basic abstractions and prompts used were built off of Yihui's hackathon code. The idea to preprocess HTML and use GPT-3 to intelligently pick elements out is from Nat.
The available actions are defined in `GPTSeleniumAgent`. Those actions, to date, include:

- `env.driver`, the Selenium webdriver.
- `env.find_elements(by='id', value=None)` finds and returns a list of elements.
- `env.find_element(by='id', value=None)` is similar to `env.find_elements()` except it only returns the first element.
- `env.find_nearest(e, xpath)` can be used to locate an element near another one.
- `env.send_keys(element, text)` sends `text` to `element`.
- `env.get(url)` goes to `url`.
- `env.click(element)` clicks the element.
- `env.wait(seconds)` waits for `seconds` seconds.
- `env.scroll(direction, iframe=None)` scrolls the page. Will switch to `iframe` if given. `direction` can be "up", "down", "left", or "right".
- `env.get_llm_response(text)` asks the AI about a string `text`.
- `env.retrieve_information(prompt)` returns a string of information from a page given a prompt. Use `prompt="Summarize:"` for summaries. Invoked with commands like "retrieve", "find in the page", or similar.
- `env.ask_llm_to_find_element(description)` asks the AI to find an element that matches the description.
- `env.query_memory(prompt)` asks the AI with a prompt to query its memory (an embeddings index) of the web pages it has browsed. Invoked with "Query memory".
- `env.save(text, filename)` saves the string `text` to a file `filename`.
- `env.get_text_from_page()` returns the free text from the page.

`InstructionCompiler` is used to parse user input into semantically cogent blocks of actions. The agent also has a `Memory`
which enables it to synthesize what it sees.

- 0.2.51
- 0.2.42 - 0.2.44: Changes to `examples.py` and dependencies.
- 0.2.38 - 0.2.41: Renamed `enable_memory` to `memory_file` to enable more control over what the memory is called. Allow users to load memory as well. Made `get_text_from_page` simpler.
- 0.2.26 - 0.2.37
- 0.2.14 - 0.2.25: Made `from browserpilot.agents import <agent>` possible. Made `find_element` and `find_elements` search only for displayed elements.
- 0.2.10 - 0.2.13
- 0.2.7 - 0.2.9
- 0.2.4 - 0.2.6
- 0.2.3: Moved `chrome_options` to somewhere more sensible. Just keep the yaml clean, you know?
- 0.2.2
- 0.2.1
- 0.2.0: Studio CLI which helps iteratively test prompts!
- <0.2.0
This package runs code output from the OpenAI API in Python using `exec`. 🚨 This is not considered a safe convention 🚨. Accordingly, you should be extra careful when using this package. The standard disclaimer follows.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.