llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

FileCheck regex should support common regex escape sequences #78066

Open cjdb opened 10 months ago

cjdb commented 10 months ago

clang\d is a pattern that's recognised by many regex engines to mean clang[0-9], but FileCheck doesn't seem to recognise it. It would be good to have FileCheck recognise the following patterns:

The above are good for matching ASCII characters, but they don't scale to anything outside ASCII. If we're to add this feature, I think it would be good to produce a design that incorporates Unicode code points as well.

asl commented 10 months ago

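The regex implementation available in lib/Support seems to support only POSIX-style regexes. So one could use the [:digit:] class (written [[:digit:]] in a pattern, since POSIX classes are only valid inside a bracket expression) instead of \d.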

Here is the list of supported classes: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/regcomp.c#L58
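
For illustration, here is a minimal sketch of how a POSIX class is spelled in a pattern. It uses std::regex with the POSIX extended grammar as a stand-in; that is an assumption made to keep the example self-contained, and it is not the engine in lib/Support. The helper name searchPosixExtended is hypothetical.

```cpp
#include <regex>
#include <string>

// Stand-in for a POSIX-style engine: std::regex with the POSIX extended
// grammar. This is NOT llvm::Regex; it only illustrates the pattern syntax.
// searchPosixExtended is a hypothetical helper name.
bool searchPosixExtended(const std::string &pattern, const std::string &text) {
  // The class name [:digit:] must sit inside a bracket expression,
  // hence the doubled brackets in a pattern such as "clang[[:digit:]]+".
  const std::regex re(pattern, std::regex::extended);
  return std::regex_search(text, re);
}
```

With this helper, the pattern clang[[:digit:]]+ should find a match in "clang17" and none in "clang++", the same as clang[0-9]+.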

cjdb commented 9 months ago

[:digit:] is longer than [0-9], which itself already hurts readability compared with \d. Further, these kinds of escapes are accepted by a large variety of regex engines, and it was surprising to learn that FileCheck doesn't support them (I spent a couple of hours debugging before swapping out \d for [0-9]).

If POSIX regex doesn't support this, then we should consider expanding to a style that supports both [:digit:] and \d.
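
One way to picture supporting both styles is a small preprocessing step that rewrites the escapes into POSIX classes before the pattern reaches the POSIX engine. The sketch below is only an illustration under assumptions: expandPerlEscapes and posixClassFor are hypothetical names, nothing like them is known to exist in FileCheck, the mapping covers only \d, \s and \w, and it is ASCII-only, so it does not address the Unicode question raised above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mapping from a Perl-style escape letter to the body of a POSIX
// bracket expression. Returns nullptr for letters this sketch does not handle.
static const char *posixClassFor(char escape) {
  switch (escape) {
  case 'd': return "[:digit:]";
  case 's': return "[:space:]";
  case 'w': return "_[:alnum:]";
  default:  return nullptr;
  }
}

// Hypothetical helper, not part of LLVM: rewrites \d, \s and \w so that a
// POSIX-only engine can accept the pattern. Corner cases such as collating
// symbols ([. .]) and equivalence classes ([= =]) are ignored.
std::string expandPerlEscapes(const std::string &pattern) {
  std::string out;
  bool inBracket = false;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const char c = pattern[i];
    const bool hasNext = i + 1 < pattern.size();
    if (c == '\\' && hasNext) {
      if (const char *cls = posixClassFor(pattern[i + 1])) {
        // Inside [...] splice the class in; outside, wrap it in brackets.
        if (inBracket) {
          out += cls;
        } else {
          out += '[';
          out += cls;
          out += ']';
        }
        ++i;
        continue;
      }
      if (!inBracket) {
        // Keep other escaped pairs (for example \[ or \\) intact.
        out += c;
        out += pattern[++i];
        continue;
      }
      // In a POSIX bracket expression a backslash is literal: copy it below.
    } else if (!inBracket && c == '[') {
      inBracket = true;
      out += c;
      // A leading '^' and a ']' right after the opening do not close the set.
      if (i + 1 < pattern.size() && pattern[i + 1] == '^') out += pattern[++i];
      if (i + 1 < pattern.size() && pattern[i + 1] == ']') out += pattern[++i];
      continue;
    } else if (inBracket && c == '[' && hasNext && pattern[i + 1] == ':') {
      // Copy an existing class such as [:alpha:] through unchanged.
      const std::size_t end = pattern.find(":]", i + 2);
      if (end != std::string::npos) {
        out.append(pattern, i, end + 2 - i);
        i = end + 1;
        continue;
      }
    } else if (inBracket && c == ']') {
      inBracket = false;
    }
    out += c;
  }
  return out;
}
```

Under these assumptions, clang\d would be rewritten to clang[[:digit:]] and [a-f\d]+ to [a-f[:digit:]]+, while patterns that already use POSIX classes pass through unchanged.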