zyedidia / sregx

A tool and library for using structural regular expressions.
MIT License

Structural Regular Expressions


sregx is a package and tool for using structural regular expressions as described by Rob Pike. It provides a small Go package for building structural regular expression commands, plus a library for parsing and compiling sregx commands from the text format used in Pike's description. A CLI tool is also provided in ./cmd/sregx, allowing you to perform advanced text manipulation from the command line.

In a structural regular expression, regular expressions are composed using commands to perform tasks like advanced search and replace. A command has an input string and produces an output string. The following commands are supported:

The commands n[...], l[...], and u are additions to the original description of structural regular expressions.

The sregx tool also provides another augmentation to the original sregx description from Pike: command pipelines. A command may be given as <cmd> | <cmd> | ... where the input of each command is the output of the previous one.
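
The input-to-output model and the pipeline operator can be illustrated with a small standard-library sketch. The names Cmd, X, C, D, and Pipe below are hypothetical and chosen only for this illustration; they are not the sregx package API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Cmd is a hypothetical stand-in for a command: input bytes in, output bytes out.
type Cmd func(in []byte) []byte

// X mimics x/<p>/<cmd>: every match of p is replaced by cmd applied to that match.
func X(p string, cmd Cmd) Cmd {
	re := regexp.MustCompile(p)
	return func(in []byte) []byte { return re.ReplaceAllFunc(in, cmd) }
}

// C mimics c/<s>/: the output is always s.
func C(s string) Cmd {
	return func([]byte) []byte { return []byte(s) }
}

// D mimics d: the output is always empty.
func D([]byte) []byte { return nil }

// Pipe mimics <cmd> | <cmd> | ...: each command receives the previous output.
func Pipe(cmds ...Cmd) Cmd {
	return func(in []byte) []byte {
		for _, cmd := range cmds {
			in = cmd(in)
		}
		return in
	}
}

// deleteAndSquash corresponds roughly to: x/string/d | x/ +/ c/ /
func deleteAndSquash() Cmd {
	return Pipe(X(`string`, D), X(` +`, C(" ")))
}

func main() {
	fmt.Println(string(deleteAndSquash()([]byte("a string b"))))
}
```

Modeling a command as a plain function makes the pipeline a simple left-to-right fold, which matches the description of each command consuming the previous command's output.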

Examples

Most of these examples come from Pike's description, which explains them in more detail. Because p is the only command that prints, a search-and-replace command technically needs | p appended, or nothing is printed. Since this is easy to forget, the sregx tool prints the result of the final command before terminating if p was not used anywhere in the command. With the CLI tool you can therefore omit the | p in the following commands and still see the result.

Print all lines that contain "string":

x/.*\n/ g/string/p

Delete all occurrences of "string" and print the result:

x/string/d | p

Replace all occurrences of "foo" with "bar" in the range of lines 5-10 (zero-indexed):

l[5:10]s/foo/bar/ | p

Print all lines containing "rob" but not "robot":

x/.*\n/ g/rob/ v/robot/p

Capitalize all occurrences of the word "i":

x/[A-Za-z]+/ g/i/ v/../ c/I/ | p

or (more simply)

x/[A-Za-z]+/ g/^i$/ c/I/ | p

Print the last line of every paragraph that begins with "foo", where a paragraph is defined as text with no empty lines:

x/(.+\n)+/ g/^foo/ l[-2:-1]p
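
The paragraph example can be approximated with the standard library alone. The sketch below rests on an assumption inferred from the examples in this README, not on the sregx source: the line range is half-open, and a negative index idx is resolved as idx + count + 1, so -1 refers to the end of the input. The helper names resolve, lineRange, and lastLinesOfFooParagraphs are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// resolve maps a possibly negative index onto [0, count].
// Assumption: idx < 0 becomes idx + count + 1, so -1 means "end of input".
func resolve(idx, count int) int {
	if idx < 0 {
		idx += count + 1
	}
	if idx < 0 {
		return 0
	}
	if idx > count {
		return count
	}
	return idx
}

// lineRange returns lines [n, m) of in, keeping their trailing newlines.
func lineRange(in []byte, n, m int) []byte {
	lines := bytes.SplitAfter(in, []byte("\n"))
	if len(lines) > 0 && len(lines[len(lines)-1]) == 0 {
		lines = lines[:len(lines)-1] // drop the empty piece after a final newline
	}
	lo, hi := resolve(n, len(lines)), resolve(m, len(lines))
	if lo >= hi {
		return nil
	}
	return bytes.Join(lines[lo:hi], nil)
}

// lastLinesOfFooParagraphs mirrors x/(.+\n)+/ g/^foo/ l[-2:-1]p.
func lastLinesOfFooParagraphs(in []byte) []byte {
	para := regexp.MustCompile(`(.+\n)+`)
	startsWithFoo := regexp.MustCompile(`^foo`)
	var out []byte
	for _, p := range para.FindAll(in, -1) {
		if startsWithFoo.Match(p) {
			out = append(out, lineRange(p, -2, -1)...)
		}
	}
	return out
}

func main() {
	text := "foo one\nfoo two\n\nbar one\nbar two\n\nfoo last\n"
	fmt.Print(string(lastLinesOfFooParagraphs([]byte(text))))
}
```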

Change all occurrences of the complete word "foo" to "bar" except those occurring in double or single quoted strings:

y/".*"/ y/'.*'/ x/[a-zA-Z]+/ g/^foo$/ c/bar/ | p
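
To see how nesting y commands protects quoted text, here is a standard-library sketch of the same transformation. Cmd, C, G, X, and Y are hypothetical helpers written for this illustration; Y applies the inner command only to the stretches that the pattern does not match, which is how this README's examples use y.

```go
package main

import (
	"fmt"
	"regexp"
)

// Cmd is a hypothetical stand-in for a command: input bytes in, output bytes out.
type Cmd func(in []byte) []byte

// C mimics c/<s>/: the output is always s.
func C(s string) Cmd { return func([]byte) []byte { return []byte(s) } }

// G mimics g/<p>/<cmd>: run cmd only if p matches, else pass the input through.
func G(p string, cmd Cmd) Cmd {
	re := regexp.MustCompile(p)
	return func(in []byte) []byte {
		if re.Match(in) {
			return cmd(in)
		}
		return in
	}
}

// X mimics x/<p>/<cmd>: every match of p is replaced by cmd applied to it.
func X(p string, cmd Cmd) Cmd {
	re := regexp.MustCompile(p)
	return func(in []byte) []byte { return re.ReplaceAllFunc(in, cmd) }
}

// Y mimics y/<p>/<cmd>: cmd is applied to the text between matches of p,
// while the matches themselves are copied through unchanged.
func Y(p string, cmd Cmd) Cmd {
	re := regexp.MustCompile(p)
	return func(in []byte) []byte {
		var out []byte
		prev := 0
		for _, loc := range re.FindAllIndex(in, -1) {
			out = append(out, cmd(in[prev:loc[0]])...)
			out = append(out, in[loc[0]:loc[1]]...)
			prev = loc[1]
		}
		return append(out, cmd(in[prev:])...)
	}
}

// replaceFooOutsideQuotes mirrors y/".*"/ y/'.*'/ x/[a-zA-Z]+/ g/^foo$/ c/bar/.
func replaceFooOutsideQuotes() Cmd {
	word := X(`[a-zA-Z]+`, G(`^foo$`, C("bar")))
	return Y(`".*"`, Y(`'.*'`, word))
}

func main() {
	in := []byte(`foo "foo" 'foo' food`)
	fmt.Println(string(replaceFooOutsideQuotes()(in)))
}
```

Note that ".*" is greedy, so on a line with several double-quoted strings the protected region runs from the first quote to the last one.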

Replace the complete word "TODAY" with the current date:

x/[A-Z]+/ g/^TODAY$/ u/date/ | p

Capitalize all words:

x/[a-zA-Z]+/ x/^./ u/tr a-z A-Z/ | p

Note: it is highly recommended when using the CLI tool that you enclose expressions in single or double quotes to prevent your shell from interpreting special characters.

Installation

There are three ways to install sregx.

  1. Download the prebuilt binary from the releases page (comes with man file).

  2. Install from source:

git clone https://github.com/zyedidia/sregx
cd sregx
make build # or make install to install to $GOBIN

  3. Install with go get (version info will be missing):

go get github.com/zyedidia/sregx/cmd/sregx

Usage

To use the CLI tool, first pass the expression and then the input file. If no file is given, stdin will be used. Here is an example to capitalize all occurrences of the word 'i' in file.txt:

sregx 'x/[A-Za-z]+/ g/^i$/ c/I/' file.txt

The tool tries to provide high-quality error messages when you make a mistake in the expression syntax.

Base library

The base library is very simple and small (roughly 100 lines of code). In fact, it is surprisingly simple and elegant for something that can provide such powerful text manipulation, and I recommend reading the code if you are interested. Each type of command may be manually created directly in tree form. See the Go documentation for details.
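
As a rough idea of what building a command "in tree form" can look like, the sketch below uses only the standard library. The interface and struct names (Command, C, G, X, Evaluate) are assumptions made for this illustration and should not be read as the package's actual types; consult the Go documentation for the real API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Command is a hypothetical interface: a node evaluates input to output.
type Command interface {
	Evaluate(in []byte) []byte
}

// C is a leaf node that always produces Text.
type C struct{ Text string }

func (c C) Evaluate([]byte) []byte { return []byte(c.Text) }

// G runs Cmd only when Patt matches the input; otherwise the input passes through.
type G struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (g G) Evaluate(in []byte) []byte {
	if g.Patt.Match(in) {
		return g.Cmd.Evaluate(in)
	}
	return in
}

// X replaces every match of Patt with the result of Cmd on that match.
type X struct {
	Patt *regexp.Regexp
	Cmd  Command
}

func (x X) Evaluate(in []byte) []byte {
	return x.Patt.ReplaceAllFunc(in, x.Cmd.Evaluate)
}

// buildTree assembles the tree for x/[A-Za-z]+/ g/^i$/ c/I/ by hand.
func buildTree() Command {
	return X{
		Patt: regexp.MustCompile(`[A-Za-z]+`),
		Cmd: G{
			Patt: regexp.MustCompile(`^i$`),
			Cmd:  C{Text: "I"},
		},
	}
}

func main() {
	fmt.Println(string(buildTree().Evaluate([]byte("i said i think"))))
}
```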

Syntax library

The syntax library supports parsing and compiling a string into a structural regular expression command. The syntax follows certain rules, such as using "/" as a delimiter. The backslash (\) may be used to escape / or \, or to create special characters such as \n, \r, or \t. The syntax also supports specifying arbitrary bytes using octal, for example \14. Regular expressions use the syntax of Go's regexp package.
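
The escape rules above can be sketched as a small standalone function. This is not the syntax library's code: unescape is a hypothetical helper, and two details are assumptions for the sake of the example, namely that an octal escape reads at most three digits and that any other backslash sequence (such as \d) is passed through untouched so the regular expression engine can interpret it.

```go
package main

import (
	"fmt"
	"strings"
)

// unescape resolves \/, \\, \n, \r, \t and octal byte escapes in s.
// Assumptions: octal escapes use up to three digits; unknown escapes are kept verbatim.
func unescape(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' || i+1 == len(s) {
			b.WriteByte(s[i])
			continue
		}
		i++ // look at the character after the backslash
		switch c := s[i]; {
		case c == '/' || c == '\\':
			b.WriteByte(c)
		case c == 'n':
			b.WriteByte('\n')
		case c == 'r':
			b.WriteByte('\r')
		case c == 't':
			b.WriteByte('\t')
		case c >= '0' && c <= '7':
			v, j := 0, i
			for ; j < len(s) && j < i+3 && s[j] >= '0' && s[j] <= '7'; j++ {
				v = v*8 + int(s[j]-'0')
			}
			b.WriteByte(byte(v))
			i = j - 1 // the loop's i++ moves past the last octal digit
		default:
			b.WriteByte('\\')
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", unescape(`a\/b\tc\14`))
}
```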

Future Work

Here are some ideas for features that could be implemented in the future.