It is necessary to reference LINQ to Regex assembly using #r
directive.
#r "path\Pihrtsoft.Text.RegularExpressions.Linq.dll"
using Pihrtsoft.Text.RegularExpressions.Linq;
using static Pihrtsoft.Text.RegularExpressions.Linq.Patterns;
The library contains two namespaces:
Pihrtsoft.Text.RegularExpressions.Linq;
Pihrtsoft.Text.RegularExpressions.Linq.Extensions;
First namespace is a library root namespace. Second namespace contains static classes with extensions methods that extends existing .NET types.
Pattern
is the very base library type. It represent an immutable regular expression pattern. Instances of the Pattern
class and its descendants can be obtained through Patterns
static class. Pattern
class also offers instance methods that enables to combine patterns.
Following pattern will match a digit character.
Pattern pattern = Patterns.Digit();
Regex syntax: \d
It is recommended to reference Patterns
class with using static
(C# 6.0 or higher)
using static Pihrtsoft.Text.RegularExpressions.Linq.Patterns;
or Imports
(Visual Basic)
Imports Pihrtsoft.Text.RegularExpressions.Linq.Patterns
This will allow you to create patterns without repeatedly referencing Patterns
class.
Pattern pattern = Digit();
CharGrouping
type represents a content of a character group. Instances of the CharGrouping
class and its descendants can be obtained through Chars
static class. CharGrouping
class also offers instance methods that enables to combine elements.
Pattern
object can be converted to a regular expression text using a ToString
method.
public override string ToString()
public string ToString(PatternOptions options)
public string ToString(PatternSettings settings)
LINQ to Regex always interprets each character in a string
parameter as a literal, never as a metacharacter. Following pattern will match a combination of a backslash and a dot.
Pattern pattern = @"\.";
Regex syntax: \\\.
A parameter that implements at least non-generic IEnumerable
interface is interpreted in a way that any one element of the collection has to be matched.
There is one exception from this rule and that is Patterns.Concat
static method.
A lot of methods that returns instance of the Pattern
class accepts parameter of type object
which is usually named content
. This methods can handle following types (typed as object
):
Pattern
CharGrouping
string
char
object[]
IEnumerable
Last two items in the list, Object[]
and IEnumerable
can contains zero or more elements (patterns) any one of which has to be matched.
Methods that allows to pass a content typed as object
usually allows to pass an array of object with params
(ParamArray
in Visual Basic) keyword. This overload simply convert the array of objects to the object
and calls overload that accept object
as an argument.
Maybe
method returns a pattern that matches previous element zero or one time.
var pattern = Digit().Maybe();
or
var pattern = Maybe(Digit());
Regex syntax: \d?
MaybeMany
method returns a pattern that matches previous element zero or more times.
var pattern = Digit().MaybeMany();
or
var pattern = MaybeMany(Digit());
Regex syntax: \d*
OneMany
method returns a pattern that matches previous element one or more times.
var pattern = Digit().OneMany();
or
var pattern = OneMany(Digit());
Regex syntax: \d+
Quantifiers are "greedy" by default which means that previous element is matched as many times as possible. If you want to match previous element as few times as possible, use Lazy
method.
Following pattern will match any character zero or more times but as few times as possible.
var pattern = Any().MaybeMany().Lazy();
Regex syntax: [\s\S]*?
Previous pattern is quite common so it is wrapped into a Crawl
method.
var pattern = Patterns.Crawl();
In regular expressions syntax you can apply quantifier only after the element that should be quantified. In LINQ to Regex you can define a quantifier group and put a quantified content into it.
The +
operator concatenates the operands into a new pattern. Following pattern matches an empty line.
var pattern = Patterns.BeginLine().Assert(Patterns.NewLine());
Regex syntax: (?m:^)(?=(?:\r?\n))
Same goal can be achieved using +
operator.
var pattern = Patterns.BeginLine() + Patterns.Assert(Patterns.NewLine());
With "using static" statement the expression is more concise.
var pattern = BeginLine() + Assert(NewLine());
-
operator can be used to create character subtraction. This operator is defined for CharGroup
, CharGrouping
and CharPattern
types. Except
method is used to create character subtraction. Following pattern matches a white-space character except a carriage return and a linefeed.
var pattern = Patterns.WhiteSpace().Except(Chars.CarriageReturn().Linefeed());
Regex syntax: [\s-[\r\n]]
Same goal can be achieved using -
operator.
var pattern = Patterns.WhiteSpace() - Chars.CarriageReturn().Linefeed();
Previous pattern is quite common so it is wrapped into a WhiteSpaceExceptNewLine
method.
var pattern = Patterns.WhiteSpaceExceptNewLine();
Any
method represents a group in which any one of the specified patterns has to be matched. Following pattern matches a word that begin with a or b.
var pattern = Any(
WordBoundary() + "a" + WordChars(),
WordBoundary() + "b" + WordChars());
Regex syntax: (?:\ba\w+|\bb\w+)
Same goal can be achieved using |
operator
var pattern = (WordBoundary() + "a" + WordChars()) | (WordBoundary() + "b" + WordChars());
!
operator is used to create pattern that has opposite meaning than operand. Following pattern represents a linefeed that is not preceded with a carriage return and can be used to normalize line endings to Windows mode.
var pattern = Patterns.NotAssertBack(CarriageReturn()).Linefeed();
Regex syntax: (?:(?<!\r)\n)
Same goal can be achieved using ! operator.
var pattern = !Patterns.AssertBack(CarriageReturn()) + Patterns.Linefeed();
With "using static" statement the expression is more concise.
var pattern = !AssertBack(CarriageReturn()) + Linefeed();
"While" is an alias for a *
quantifier. Methods whose name begins with "While" returns pattern that matches a specified character zero or more times.
var pattern = WhileChar('a');
Regex syntax: a*
var pattern = WhileDigit();
Regex syntax: \d*
var pattern = WhileWhiteSpace();
Regex syntax: \s*
var pattern = WhileWhiteSpaceExceptNewLine();
Regex syntax: [\s-[\r\n]]*
var pattern = WhileWordChar();
Regex syntax: \w*
Methods whose name begins with "WhileNot" returns pattern that matches a character that is not a specified character zero or more times.
var pattern = WhileNotChar('a');
Regex syntax: [^a]*
var pattern = WhileNotNewLineChar();
Regex syntax: [^\r\n]*
Methods whose name begins with "Until" returns pattern that matches a character that is not a specified character zero or more times terminated with the specified character.
var pattern = UntilChar('a');
Regex syntax: (?:[^a]*a)
var pattern = UntilNewLine();
Regex syntax: (?:[^\n]*\n)
Methods whose name ends with "Native" returns pattern that behaves differently depending on the provided RegexOptions
.
In the follwoing two patterns, a dot can match any character except linefeed or any character if RegexOptions.Singleline
option is applied.
var pattern = AnyNative();
Regex syntax: .
var pattern = CrawlNative();
Regex syntax: .*?
Static method Patterns.Concat
concatenates elements of the specified collection.
var pattern = Concat("a", "b", "c", "d");
Regex syntax: abcd
Static method Patterns.Join
concatenates the elements of the specified collection using the specified separator between each element. It is very similar to a string.Join
method.
var pattern = Join(WhiteSpaces(), "a", "b", "c", "d");
Regex syntax: a\s+b\s+c\s+d
In following examples, an output is obtained using following syntax:
Console.WriteLine(pattern.ToString(PatternOptions.FormatAndComment));
var pattern = BeginLine().WhiteSpaceExceptNewLine().OneMany());
Regex syntax:
(?m: # group options
^ # beginning of input or line
) # group end
[\s-[\r\n]]+ # character group one or more times
var pattern =
Group(Word())
.NotWordChars()
.GroupReference(1)
.WordBoundary();
Regex syntax:
( # numbered group
(?: # noncapturing group
\b # word boundary
\w+ # word character one or more times
\b # word boundary
) # group end
) # group end
\W+ # non-word character one or more times
\1 # group reference
\b # word boundary
string q = "\"";
var pattern = "@" + q + WhileNotChar(q) + MaybeMany(q + q + WhileNotChar(q)) + q;
Regex syntax:
@" # text
[^"]* # negative character group zero or more times
(?: # noncapturing group
"" # text
[^"]* # negative character group zero or more times
)* # group zero or more times
" # quote mark
var pattern =
WordBoundary()
.CountFrom(3,
Any(values.Select(f => Group(Patterns.Text(f))))
.WordBoundary()
.NotWordChar().MaybeMany().Lazy())
.GroupReference(1)
.GroupReference(2)
.GroupReference(3);
Regex syntax:
\b # word boundary
(?: # noncapturing group
(?: # noncapturing group
( # numbered group
one # text
) # group end
| # or
( # numbered group
two # text
) # group end
| # or
( # numbered group
three # text
) # group end
) # group end
\b # word boundary
\W*? # non-word character zero or more times but as few times as possible
){3,} # group at least n times
\1 # group reference
\2 # group reference
\3 # group reference
var pattern =
"<![CDATA["
+ WhileNotChar(']')
+ MaybeMany(
']'
+ NotAssert("]>")
+ WhileNotChar(']'))
+ "]]>";
Regex syntax:
<! # text
\[ # left square bracket
CDATA # text
\[ # left square bracket
[^\]]* # negative character group zero or more times
(?: # noncapturing group
] # right square bracket
(?! # negative lookahead assertion
]> # text
) # group end
[^\]]* # negative character group zero or more times
)* # group zero or more times
]]> # text
If you are looking for a desktop IDE for .NET regular expressions, try Regexator.