saulpw / aipl

Array-Inspired Pipeline Language
MIT License

AIPL (Array-Inspired Pipeline Language)

A tiny DSL to make it easier to explore and experiment with AI pipelines.

Features

summarize.aipl

Here's a prime example, a multi-level summarizer in the "map-reduce" style of langchain:

#!/usr/bin/env bin/aipl

# fetch url, split webpage into chunks, summarize each chunk, then summarize the summaries.

# the inputs are urls
!read

# extract text from html
!extract-text

# split into chunks of lines that can fit in the context window
!split maxsize=8000 sep=\n

# have GPT summarize each chunk
!format

Please read the following section of a webpage (500-1000 words) and provide a
concise and precise summary in a few sentences, optimized for keywords and main
content topics. Write only the summary, and do not include phrases like "the
article" or "this webpage" or "this section" or "the author". Ensure the tone
is precise and concise, and provide an overview of the entire section:

"""
{_}
"""

!llm model=gpt-3.5-turbo

# join the section summaries together
!join sep=\n-

# have GPT summarize the combined summaries

!format

Based on the summaries of each section provided, create a one-paragraph summary
of approximately 100 words. Begin with a topic sentence that introduces the
overall content topic, followed by several sentences describing the most
relevant subsections. Provide an overview of all section summaries and include
a conclusion or recommendations only if they are present in the original
webpage. Maintain a precise and concise tone, and make the overview coherent
and readable, while preserving important keywords and main content topics.
Remove all unnecessary text like "The document" and "the author".

"""
{_}
"""

!llm model=gpt-3.5-turbo

!print
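
As a rough illustration of what this script does, the same map-reduce shape can be sketched in plain Python. This is not how AIPL is implemented: `split_lines`, `summarize`, and the `llm` callable are hypothetical names, and the prompts are abbreviated stand-ins for the ones above.

```python
from typing import Callable, List


def split_lines(text: str, maxsize: int) -> List[str]:
    """Group whole lines into chunks of at most maxsize characters.

    A single line longer than maxsize is kept as its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for line in text.split("\n"):
        candidate = line if not current else current + "\n" + line
        if current and len(candidate) > maxsize:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize(text: str, llm: Callable[[str], str], maxsize: int = 8000) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    chunks = split_lines(text, maxsize)
    partials = [llm("Summarize this section:\n" + chunk) for chunk in chunks]
    combined = "\n-".join(partials)  # mirrors `!join sep=\n-`
    return llm("Summarize these section summaries:\n" + combined)
```

Any function that maps a prompt string to a completion string can be passed as `llm`, which makes the flow easy to try with a stub before spending tokens on a real model.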

Usage

usage: aipl [-h] [--debug] [--test] [--interactive] [--step STEP] [--step-breakpoint] [--step-rich] [--step-vd] [--dry-run] [--cache-db CACHEDBFN] [--no-cache]
            [--output-db OUTDBFN] [--split SEPARATOR]
            [script_or_global ...]

AIPL interpreter

positional arguments:
  script_or_global      scripts to run, or k=v global parameters

options:
  -h, --help            show this help message and exit
  --debug, -d           abort on exception
  --test, -t            enable test mode
  --interactive, -i     interactive REPL
  --step STEP           call aipl.step_<func>(cmd, input) before each step
  --step-breakpoint, -x
                        breakpoint() before each step
  --step-rich, -v       output rich table before each step
  --step-vd, --vd       open VisiData with input before each step
  --dry-run, -n         do not execute @expensive operations
  --cache-db CACHEDBFN, -c CACHEDBFN
                        sqlite database for caching operators
  --no-cache            sqlite database for caching operators
  --output-db OUTDBFN, -o OUTDBFN
                        sqlite database accessible to !db operators
  --split SEPARATOR, --separator SEPARATOR, -s SEPARATOR
                        separator to split input on

Command Syntax

This is the basic syntax:

Commands can take positional and/or keyword arguments, separated by whitespace.

Keyword arguments have an = between the key and the value; positional arguments are those with no = in them.
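
To make the positional/keyword rule concrete, here is a minimal sketch of how a command line such as `!split maxsize=8000 sep=\n` could be broken apart. `parse_command` is a hypothetical helper written for this README, not AIPL's actual parser, and it assumes (from the example script above) that commands start with `!`.

```python
from typing import Dict, List, Tuple


def parse_command(line: str) -> Tuple[str, List[str], Dict[str, str]]:
    """Split '!op arg key=value' into (opname, positional args, keyword args)."""
    if not line.startswith("!"):
        raise ValueError("expected a command starting with '!'")
    tokens = line[1:].split()  # arguments are separated by whitespace
    if not tokens:
        raise ValueError("missing operator name")
    opname, rest = tokens[0], tokens[1:]
    args: List[str] = []
    kwargs: Dict[str, str] = {}
    for token in rest:
        if "=" in token:
            # keyword argument: everything before the first '=' is the key
            key, value = token.split("=", 1)
            kwargs[key] = value
        else:
            args.append(token)
    return opname, args, kwargs
```

Values are kept as raw strings here; any escape handling (such as turning `\n` into a newline) or type conversion is left out of this sketch.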

The AIPL syntax will continue to evolve and be clarified over time as it's used and developed.

Notes:

List of operators

Defining a new operator

It's pretty easy to define a new operator that can be used right away. For instance, here's how the !join operator might be defined:

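from typing import List

from aipl import defop, AIPL  # assumed import path for the decorator and interpreter class

@defop('join', rankin=1, rankout=0)
def op_join(aipl:AIPL, v:List[str], sep=' ') -> str:
    'Concatenate text values with *sep* into a single string.'
    return sep.join(v)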

The join operator is declared with rankin=1 and rankout=0, which means that it takes a list of strings and outputs a single string.
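
For readers curious what a decorator like this does, here is a toy stand-in that can be run without AIPL installed. The `OPERATORS` registry and this version of `defop` are hypothetical and much simpler than the real thing; they only record the function and its declared ranks.

```python
from typing import Callable, Dict, List

OPERATORS: Dict[str, Callable] = {}  # hypothetical registry, keyed by operator name


def defop(opname: str, rankin: float = 0, rankout: float = 0):
    """Toy stand-in for AIPL's decorator: record the function and its ranks."""
    def decorator(func: Callable) -> Callable:
        func.rankin = rankin
        func.rankout = rankout
        OPERATORS[opname] = func
        return func
    return decorator


@defop("join", rankin=1, rankout=0)
def op_join(aipl, v: List[str], sep: str = " ") -> str:
    "Concatenate text values with *sep* into a single string."
    return sep.join(v)


# Look the operator up by name and call it on a list of strings.
# The interpreter argument is unused by this operator, so None is passed.
summary = OPERATORS["join"](None, ["first", "second"], sep="\n-")
```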

Architecture

The fundamental data structure is a Table: an array of hashmaps ("rows"), with named Columns that key into each Row to get its value.

A value can be a string or a number or another Table.

The value of a row is the value in the rightmost column of its table. The rightmost column of a table is a vector of values representing the whole table.

A simple vector has only strings or numbers. A simple table has a simple rightmost value vector and is Rank 0. Each nesting of tables in the rightmost value vector increases its Rank by 1.
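
The data model above can be approximated with plain Python lists and dicts. This is only a sketch under simplifying assumptions: each row is an insertion-ordered dict whose last key plays the role of the rightmost column, and `row_value`, `value_vector`, and `nesting_depth` are hypothetical helpers, not part of AIPL.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # insertion-ordered, so the last key acts as the rightmost column
Table = List[Row]


def row_value(row: Row) -> Any:
    """The value of a row is the value in its rightmost column."""
    return row[list(row)[-1]]


def value_vector(table: Table) -> List[Any]:
    """The rightmost column, as a vector of values representing the table."""
    return [row_value(row) for row in table]


def nesting_depth(table: Table) -> int:
    """Count how many levels of tables are nested in the rightmost column."""
    depth = 0
    for value in value_vector(table):
        if isinstance(value, list):
            depth = max(depth, 1 + nesting_depth(value))
    return depth


# A simple table: its value vector holds only strings.
chunks = [{"chunk": "first part"}, {"chunk": "second part"}]

# A table whose rightmost column holds another table.
pages = [{"url": "https://example.com", "chunks": chunks}]
```

Here `nesting_depth(chunks)` is 0 and `nesting_depth(pages)` is 1, matching the rule that each level of nesting raises the rank by one.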

operators

Each operator consumes 0, 1, or 2 operands (its arity), and produces one result, which becomes the operand for the next operator.

Each operator has an "in rank" and an "out rank": the rank of the operand it consumes and the rank of the result it produces.

By default, each operator is applied across the deepest nested table. The result of each operator is then placed in the deepest nested table (or its parent).
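
The idea of applying an operator "at a rank" can be sketched with nested Python lists standing in for nested tables. This is an illustration rather than AIPL's implementation: `depth` and `apply_at_rank` are hypothetical helpers, only the integer ranks 0 and 1 are covered, and the nesting is assumed to be uniform.

```python
from typing import Any, Callable


def depth(data: Any) -> int:
    """Nesting depth of a list: 0 for a scalar, 1 for a flat list, and so on."""
    if not isinstance(data, list):
        return 0
    return 1 + max((depth(item) for item in data), default=0)


def apply_at_rank(func: Callable, data: Any, rankin: int) -> Any:
    """Descend until the sub-structure has depth rankin, then apply func to it."""
    if depth(data) <= rankin:
        return func(data)
    return [apply_at_rank(func, item, rankin) for item in data]


nested = [["a", "b"], ["c", "d", "e"]]

# rankin=0: the function sees one scalar at a time; the shape is preserved.
upper = apply_at_rank(str.upper, nested, 0)

# rankin=1: the function sees each innermost vector and returns one scalar,
# so the result has one level of nesting less.
joined = apply_at_rank(" ".join, nested, 1)
```

With these inputs, `upper` is `[["A", "B"], ["C", "D", "E"]]` and `joined` is `["a b", "c d e"]`.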

rankin=0: one scalar at a time

With rankin=0 and rankout of:

rankin=0.5: consume whole row

With rankin=0.5, and rankout of:

rankin=1: consume the rightmost column

With rankin=1, and rankout of:

rankin=1.5: consume whole table

With rankin=1.5, and rankout of:

arguments and formatting

In addition to operands, operators also take parameters, both positional and named (args and kwargs in Python). These cannot have spaces, but they can have Python format strings like {input}.

The identifiers available to Python format strings come from a chain of contexts:
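
One way to picture a chain of contexts is Python's `collections.ChainMap`, which looks a name up in each mapping in turn. The sketch below is an assumption about the general mechanism, not AIPL's code: the context names, their contents, and the lookup order are all hypothetical.

```python
from collections import ChainMap

# Hypothetical contexts, searched from left to right.
row_context = {"input": "chunk text", "url": "https://example.com"}  # current row
global_params = {"model": "gpt-3.5-turbo", "input": "unused"}        # k=v globals

contexts = ChainMap(row_context, global_params)


def format_param(template: str, contexts: ChainMap) -> str:
    """Fill {name} placeholders from the first context that defines each name."""
    return template.format_map(contexts)


filled = format_param("{model}:{url}", contexts)
shadowed = format_param("{input}", contexts)
```

Here `filled` is `"gpt-3.5-turbo:https://example.com"`, and `shadowed` is `"chunk text"` because the row context is consulted before the globals. A name that no context defines raises `KeyError`.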

More information

Come chat with us on Discord (bluebird.sh/chat) or Mastodon (@saulpw@fosstodon.org).

If you want to get updates about what I'm playing with, you can sign up for my AI mailing list.

License

Licensed under MIT.