microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Regular Expression finds #58287

Open RyanCavanaugh opened 7 months ago

RyanCavanaugh commented 7 months ago

Acknowledgement

Comment

Note: I eventually gave up on capturing "Not available unless target is ESXXXX" errors since they're not really interesting to look at

Via #58275

This character cannot be escaped in a regular expression.

const image_path_escape = image_path.replace(/\o/g, '/o') //escape string "\o" in "\output"

Named capturing groups are only available when targeting 'ES2018' or later

/^((?<negative>-)|\+)?P((?<years>\d*)Y)?((?<months>\d*)M)?((?<weeks>\d*)W)?((?<days>\d*)D)?((?<time>T)((?<hours>\d*[.,]?\d{1,9})H)?((?<minutes>\d*[.,]?\d{1,9})M)?((?<seconds>\d*[.,]?\d{1,9})S)?)?$/

Named capturing groups are only available when targeting 'ES2018' or later.

const IMPORT_REGEX = /(?<key>import|export)\s+(?:(?<alias>[\w,{}\s*]+)\s+from)?\s*(?:(?<quote>["'])?(?<ref>[@\w\s\\/.-]+)\3?)\s*(?<term>[;\n])/g

Named capturing groups are only available when targeting 'ES2018' or later

const match = text.match(/^(?<description>(.|\n)*)```(?<language>[^\n]+)\n(?<code>(.|\n)+)\n```$/m);

This regular expression flag is only available when targeting 'es2018' or later

return fileContent.replace(/<!--.*?-->/gs, '');

This character cannot be escaped in a regular expression

const fixedId = listItem.id.replace(/\_/g, "/").replace(/\-/g, "+");

Named capturing groups are only available when targeting 'ES2018' or later

const INPUT_EXTENSION_IMPORT_REGEX = /\.(svelte|(lite(\.tsx|\.jsx)?))(?<quote>['"])/g;

Octal escape sequences are not allowed. Use the syntax '\x04'

const propsRegex = /props\s*\.\s*([a-zA-Z0-9_\4]+)\(/;

Named capturing groups are only available when targeting 'ES2018' or later

private static SSH_PATH_RE = new RegExp(
    [
        /^\s*/,
        /(?:(?<proto>[a-z]+):\/\/)?/,
        /(?:(?<user>[a-z_][a-z0-9_-]+)@)?/,
        /(?<domain>[^\s\/\?#:]+)/,
        /(?::(?<port>[0-9]{1,5}))?/,
        /(?:[\/:](?<owner>[^\s\/\?#:]+))?/,
        /(?:[\/:](?<repo>(?:[^\s\?#:.]|\.(?!git\/?\s*$))+))/,
        /(?:.git)?\/?\s*$/,
    ]

Named capturing groups are only available when targeting 'ES2018' or later

const regexp = /\[(?<link>http:\/\/[^\]]+)\]/g

A character class range must not be bounded by another character class

this.relocDataSymNameRe = /^(?<symname>[^\d-+][\w.]*)?\s*(?<addend_or_value>.*)$/;

filepath.replace(/^C:\/Users\/[\w\d-.]*\/AppData\/Local\/Temp\/compiler-explorer-compiler[\w\d-.]*\//, '/app/')

const ATFILELINE_RE = /\s*at ([\w-/.]+):(\d+)/;

const selectedPassRe = /[0-9]*(i|t|r)\.([\w-_]*)/;

Octal escape sequences are not allowed. Use the syntax '\x02'.

const shellChars = /[\002-\011\013-\032\\#?`(){}[\]^*<=>~|; "!$&'\202-\377]/;

This character cannot be escaped in a regular expression.

private readonly nameWithOwner = /(?<owner>-?[a-z0-9][a-z0-9\-\_]*)\/(?<name>(?:\w|\.|\-)+)/;

const isURlCustomFormat = /\.[a-z]+\z/.test(anchor.href);

Octal escape sequences are not allowed. Use the syntax '\x02'

const regexp = /([^\s'"]+(['"])([^\2]*?)\2)|[^\s'"]+|(['"])([^\4]*?)\4/gi;

A character class range must not be bounded by another character class

// Source: https://stackoverflow.com/a/8234912/2013580
const urlRegExp = new RegExp(
  /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%/.\w-_]*)?\??(?:[-+=&;%@.\w_]*)#?(?:[\w]*))?)/,
);

This regular expression flag is only available when targeting 'es2022' or later

// this regex is different from HASHTAG_REGEX in that it does not look for a
// #+character. It uses a negative look-ahead for `# `
const HASH_REGEX =
  /(?<=^|\s)#(?![ \t#])([0-9]*[\p{L}\p{Emoji_Presentation}\p{N}/_-]*)/dgu;

This regular expression flag is only available when targeting 'es2018' or later

return message.replace(/([{}](?:.*[{}])?)/su, `'$1'`)

This regular expression flag is only available when targeting 'es6' or later

return message.replace(/([{}](?:.*[{}])?)/su, `'$1'`)

Octal escape sequences are not allowed. Use the syntax '\x00'

// Since negative lookbehind isn't supported in all browsers, this leaves out the negative lookbehind condition `(?<!\.lock)` to ensure the branch name doesn't end with `.lock`
const validBranchOrTagRegex = /^[^/](?!.*\/\.)(?!.*\.\.)(?!.*\/\/)(?!.*@\{)[^\000-\037\177 ~^:?*[\\]+[^./]$/;

// Since negative lookbehind isn't supported in all browsers, leave out the negative lookbehind condition `(?<!\.lock)` to ensure the branch name doesn't end with `.lock`
const refRegexShared = /\b((?!.*\/\.)(?!.*\.\.)(?!.*\/\/)(?!.*@\{)[^\000-\037\177 ,~^:?*[\\]+[^ ./])\b/gi;

This regular expression flag is only available when targeting 'es2018' or later

if (!/(\{\{.+?\}\})|(\{#.+?#\})|(\{%.+?%\})/s.test(str)) {

A character class range must not be bounded by another character class

const rUrl = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%/.\w-_]*)?\??(?:[-+=&;%@.\w_]*)#?(?:[.!/\\w]*))?)/;

This character cannot be escaped in a regular expression

    expect(data['message']).toMatch(
      /Malformed FormData request. \_*Response.formData: Could not parse content as FormData./
    )

This regular expression flag is only available when targeting 'es6' or later

const validBundleID = /^([a-zA-Z]([a-zA-Z0-9_])*\.)+[a-zA-Z]([a-zA-Z0-9_])*$/u

There is nothing available for repetition

const regExp: RegExp = /const foo *= *{0x1: *'bar'};/;

'}' expected

/tag`foo *\${0x1 *\+ *0x1} *bar`;/

A character class range must not be bounded by another character class

if (!/^([\w-.]*)$/.test(name)) {

A character class range must not be bounded by another character class.

return str.replace(/^(\w)|[\s-_:]+(\w)/g, function (match, p1, p2) {

A character class range must not be bounded by another character class

const urlGithubRE = /^(?:https:\/\/(?:github\.com|api\.github\.com\/repos)|(?:\/)?(?:\/)?repos)([\w-.?!=&%*+:@\/]*)/g;

This character cannot be escaped in a regular expression.

const H_REGEX = /(?<tag>[\w\-]+)?(?:#(?<id>[\w\-]+))?(?<class>(?:\.(?:[\w\-]+))*)(?:@(?<name>(?:[\w\_])+))?/;

This regular expression flag is only available when targeting 'es2022' or later.

const markRegex = /\bMARK:\s*(.*)$/d;

Octal escape sequences are not allowed. Use the syntax '\x09'.

function cssEscape(str: string): string {
    return str.replace(/[\11\12\14\15\40]/g, '/'); // HTML class names can not contain certain whitespace characters, use / instead, which doesn't exist in file names.
}

A character class range must not be bounded by another character class.

const fileRegex = /(file:\/\/)?([a-zA-Z]:(\\\\|\\|\/)|(\\\\|\\|\/))?([\w-\._]+(\\\\|\\|\/))+[\w-\._]*/g;

A character class range must not be bounded by another character class.

/^\w([\w-.]*\w)?$/.test(x.preferredUsername)

Named capturing groups are only available when targeting 'ES2018' or later

const deprecation = (propDescriptor.description || '').match(/@deprecated(\s+(?<info>.*))?/);

A character class range must not be bounded by another character class.

let isText = /^[\w-\s.,\t\n]+$/.test(detail)

This character cannot be escaped in a regular expression

return tag.match(/^(?![\.\-])([a-zA-Z0-9\_\.\-])+$/g);

A character class range must not be bounded by another character class

const urlRegex = () =>
  /((?:https?(?::\/\/))(?:www\.)?(?:[a-zA-Z\d-_.]+(?:(?:\.|@)[a-zA-Z\d]{2,})|localhost)(?:(?:[-a-zA-Z\d:%_+.~#!?&//=@]*)(?:[,](?![\s]))*)*)/g;

A character class range must not be bounded by another character class

export function expandDefaultServerVariables(url: string, variables: object = {}) {
  return url.replace(
    /(?:{)([\w-.]+)(?:})/g,
    (match, name) => (variables[name] && variables[name].default) || match,
  );
}

dozens of these in this file, see https://github.com/microsoft/TypeScript/issues/58275#issuecomment-2068174097

A decimal escape must refer to an existent capturing group. There are only 1 capturing groups in this regular expression

/([^a-zA-Z0-9\s{(\[<])(?:(?!\2)[^\\]|\\[\s\S])*\2(?:(?!\2)[^\\]|\\[\s\S])*\2/

A character class range must not be bounded by another character class

    // eslint-disable-next-line @typescript-eslint/prefer-regexp-exec
    const githubMatch = location.match(/https:\/\/github.com\/([\w-_]+\/[\w-_]+)/i);

Unicode property value expressions are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set

    slug: ['', unicodePatternValidator(/^[\p{Letter}0-9._-]+$/)],

A character class range must not be bounded by another character class

export const wordPattern = /(#?-?\d*\.\d\w*%?)|([$@#!.:]?[\w-?]+%?)|[$@#!.]/g;

return stream.advanceIfRegExp(/^[_:\w][_:\w-.\d]*/).toLowerCase();

rbuckton commented 7 months ago

I expect all of the errors not related to --target are a result of regular expressions that are allowed per Annex B.

rbuckton commented 7 months ago

IMO, all of the "Octal escape sequences are not allowed" and "A decimal escape must refer to an existent capturing group" are probably indications of actual errors in user code. They're allowed in Annex B, but the user likely intended to use them as a backreference to a capture group and that's not how Annex B would treat them.
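As a rough illustration of that point (a sketch, not code from the thread), the snippet below shows how a non-Unicode-mode engine reads these escapes under Annex B. The patterns are built with the `RegExp` constructor from strings, which TypeScript does not syntax-check, so the snippet still compiles on 5.5; all variable names are hypothetical.

```typescript
// Intent: "an opening quote, then anything that is NOT that same quote".
// Under Annex B, `\1` inside a character class is a legacy octal escape
// (U+0001), not a backreference, so the class excludes only U+0001.
const looksLikeBackref = new RegExp("^(['\"])[^\\1]*$");

// The second double quote is accepted, because [^\1] does not exclude it.
const acceptsRepeatedQuote = looksLikeBackref.test('"ab"cd'); // true
// Only the control character U+0001 is rejected.
const rejectsControlChar = !looksLikeBackref.test('"ab\u0001'); // true

// Outside a class, `\2` with only one capturing group also falls back to
// an octal escape (U+0002) instead of being a syntax error.
const oneGroup = new RegExp("^(a)\\2$");
const matchesControlChar = oneGroup.test("a\u0002"); // true
const matchesRepeatedA = oneGroup.test("aa"); // false
```

In both cases the pattern is legal at runtime but almost certainly does not do what its author meant, which is the argument for keeping these diagnostics.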

All of the "A character class range must not be bounded by another character class" errors are probably fine and shouldn't be reported. Annex B allows them and most users wrote something like [\w-.] or the like thinking it meant "word characters, -, and .", which is how Annex B treats them.
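A minimal sketch of the behaviour described above, assuming a standard engine with Annex B support; it is not taken from the thread and the variable names are hypothetical. The patterns are passed as strings to the `RegExp` constructor so that the TypeScript literal checker is not involved.

```typescript
// Annex B (non-Unicode mode): `\w-.` is read as \w, a literal '-', and '.'.
const lenient = new RegExp("^[\\w-.]+$");
const lenientMatches = lenient.test("foo-bar.baz"); // true

// Unicode mode has no such fallback, so the same pattern fails to compile.
let strictThrows = false;
try {
  new RegExp("^[\\w-.]+$", "u");
} catch (e) {
  strictThrows = e instanceof SyntaxError;
}

// Unambiguous spelling: put '-' last (or escape it) so it cannot form a range.
const portable = new RegExp("^[\\w.-]+$", "u");
const portableMatches = portable.test("foo-bar.baz"); // true
```

Moving the hyphen to the end of the class keeps the intended meaning and is accepted with or without the `u` flag.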

graphemecluster commented 7 months ago

Ah, I did once mention this on my PR and thought it was fine since Ryan reacted to my comment. https://github.com/microsoft/TypeScript/pull/55600#issuecomment-1735102411 I am fine with weakening the grammar; however, keep in mind that we can’t guarantee everything runs on engines with Annex B support, though I understand that this is mostly the case. IMHO another compiler option is the only realistic way to solve this, unfortunately.

graphemecluster commented 7 months ago

> IMO, all of the "Octal escape sequences are not allowed" and "A decimal escape must refer to an existent capturing group" are probably indications of actual errors in user code.

Yes, I actually thought that there was a consensus on not allowing any octal escapes anywhere, per #53198 😅

nostalic commented 6 months ago

Great work on adding validation for regexp!

We came across another regression in 5.5 for character class escapes with script extensions that I did not see listed above:

const regexpNonLatin = /\P{Script_Extensions=Latin}+/gu;

The issue seems specific to Script_Extensions and scx - Script is working fine. Same behavior is observed for \p and \P.

"٢".match(/\p{Script=Thaana}/u); // OK on 5.5
"٢".match(/\p{Script_Extensions=Thaana}/u); // error on 5.5

// @ts-ignore can be used to work around the error, as hinted on https://github.com/microsoft/TypeScript/pull/58295

Those regexps are among the samples on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape ; we use something similar in our codebase and ran into this when pre-testing our TypeScript upgrade.

Would it be possible to support script extension values in 5.5?

graphemecluster commented 6 months ago

@nostalic OMG, that’s totally my fault, I am very bad. I made it empty because the Script_Extensions section in PropertyValueAliases.txt shows nothing, without thinking much. However, I don’t think the Team will have time to review PRs related to regular expressions in the immediate future; they haven’t even reviewed my short follow-up PRs yet 😅

nostalic commented 6 months ago

@graphemecluster This is a great improvement, and the regex validation helps to catch some issues we had, so thanks for implementing it!

The issue can be worked around and as such this is not a blocker for us, though it would be great to have it fixed in 5.5 🙂

jakebailey commented 6 months ago

> @nostalic OMG, that’s totally my fault, I am very bad. I made it empty because the Script_Extensions section in PropertyValueAliases.txt shows nothing, without thinking much. However, I don’t think the Team will have time to review PRs related to regular expressions in the immediate future; they haven’t even reviewed my short follow-up PRs yet 😅

Please do send things if you have them; I do think we want to get things looked at before 5.5 is branched off.

jonnytest1 commented 5 months ago

Thoughts on this: since we already do regex group checking (as per the release notes), shouldn't the resulting match groups be typed?

(tried on playground with 5.5-beta)

jakebailey commented 5 months ago

No, the type system does not special case regexes like this. (yet?)
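Until something like that exists, group names have to be declared by hand. The following is a minimal sketch of one way to do so; `execNamed` and `Groups` are hypothetical names, not TypeScript or library APIs, and nothing is inferred from the pattern itself. The pattern is the `@deprecated` example from the list above, passed as a string so the snippet does not depend on the configured `target`.

```typescript
// Hypothetical helper type: the caller lists the group names it expects.
type Groups<K extends string> = { [P in K]?: string };

// Hypothetical helper: runs the regex and exposes `groups` under the
// caller-declared names. A typo in K is NOT caught by the compiler.
function execNamed<K extends string>(re: RegExp, text: string): Groups<K> | null {
  const m = re.exec(text);
  if (m === null) return null;
  // The cast keeps the sketch independent of the configured `lib` version.
  const raw: unknown = (m as unknown as { groups?: unknown }).groups;
  return (raw ?? {}) as Groups<K>;
}

const deprecation = execNamed<"info">(
  new RegExp("@deprecated(\\s+(?<info>.*))?"),
  "@deprecated use bar() instead",
);
const info = deprecation ? deprecation.info : undefined; // "use bar() instead"
const noMatch = execNamed<"info">(new RegExp("@deprecated"), "no tag here"); // null
```

This only moves the assertion into one place; keeping the declared names in sync with the pattern remains the caller's responsibility.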

graphemecluster commented 5 months ago

Enabling further implementation of regex type checking is the most vital reason why I implemented regex syntax checking, and it’s gonna be the most exciting part 😆