Open isaacs opened 2 months ago
+ repro-exists
:
└─$ node
Welcome to Node.js v22.3.0.
Type ".help" for more information.
> util.stripVTControlCharacters('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello')
ttp://example.comThis is a link;; hello
The current RegEx used is:
/[\u001B\u009B][[\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\d\/#&.:=?%@~_]+)*|[a-zA-Z\d]+(?:;[-a-zA-Z\d\/#&.:=?%@~_]*)*)?\u0007)|(?:(?:\d{1,4}(?:;\d{0,4})*)?[\dA-PR-TZcf-ntqry=><~]))/g
I'm currently building and testing @isaacs suggested regex.
Oh, hey, looks like you can put other parameters between the ;
in OCS 8 codes, so this regexp would probably be a bit more complete:
/\u001b\]8;[^;]*?;(.*?)(?:\u001b\\|\u0007)(.*?)\u001b]8;[^;]*?;(?:\u001b\\|\u0007)/g
Note the addition of [^;]*?
between the ;;
chars.
(I don't know of any terminal emulators or programs that do put parameters there, as it's just "reserved for future use", but that's what the spec says it can be.)
I was wrong, the id
parameter is used in some cases to control link hover behavior, so yes, would need to strip that.
I'm seeing a number of failures with your RegEx, but I do agree the current one may have a bug. (I would CC a util team but there isn't one :/)
The regex at https://github.com/chalk/ansi-regex/blob/main/index.js supports links. You could maybe find some inspiration there.
Hi @sindresorhus,
IIRC, the regex used is from ansi-regex, but an older version.
Later today, I'll test if a newer version of the regex fixes the issue.
@sindresorhus
// From https://github.com/chalk/strip-ansi/blob/main/index.js
function ansiRegex({onlyFirst = false} = {}) {
const pattern = [
'[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]+)*|[a-zA-Z\\d]+(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)',
'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-nq-uy=><~]))'
].join('|');
return new RegExp(pattern, onlyFirst ? undefined : 'g');
}
console.log('\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\ hello'.replace(ansiRegex(), ''));
Outputs:
"ttp://example.com\u001b\\This is a link;;\u001b\\ hello"
https://runkit.com/6689b91eebd0a700080cd8b3/668c3336e85de300088f3f39
I've opened an issue in https://github.com/chalk/ansi-regex/issues/56, which is the source for the RegEx used by Node.js. Once resolved, I'll update the RegEx.
Version
22.4.0
Platform
Subsystem
util
What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior? Why is that the expected behavior?
Should output:
(With "This is a link" not hyperlinked.)
What do you see instead?
Additional information
OCS codes are a bit tricky to capture and strip, but this is what I'm using in ansi-to-pre:
Adding
str.replaceAll(/\u001b\]8;;(.*?)(?:\u001b\\|\u0007)(.*?)\u001b]8;;(?:\u001b\\|\u0007)/g, '$2')
should do the right thing, but not sure if there's some better way to do it in context.