microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Suggestion: Regex-validated string type #6579

Closed tenry92 closed 3 years ago

tenry92 commented 8 years ago

There are cases where a property cannot be just any string (or one of a set of strings), but needs to match a pattern.

let fontStyle: 'normal' | 'italic' = 'normal'; // already available in master
let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i = '#000'; // my suggestion

It's common practice in JavaScript to store color values in CSS notation, such as in the CSS style reflection of DOM nodes or in various third-party libraries.

What do you think?

DanielRosenwasser commented 8 years ago

Yeah, I've seen this while combing through DefinitelyTyped. Even we could use something like this with ScriptElementKind in the services layer, where we'd ideally be able to describe these as a comma-separated list of specific strings.

The main problems are:

zackarychapple commented 8 years ago

Huge +1 on this, ZipCode, SSN, ONet, many other use cases for this.

radziksh commented 8 years ago

I faced the same problem, and I see that it is not implemented yet, maybe this workaround will be helpful: http://stackoverflow.com/questions/37144672/guid-uuid-type-in-typescript
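
A workaround in this spirit can be sketched in current TypeScript with a branded string type and a user-defined type guard. This is only an illustration, not the code from the linked answer; the names `CssColor`, `isCssColor`, `parseCssColor` and the `__brand` property are hypothetical.

```typescript
// Hypothetical branded type: a string that has passed the color check.
// The __brand property exists only at the type level, never at run time.
type CssColor = string & { readonly __brand: "CssColor" };

// Same pattern as in the original suggestion.
const cssColorPattern = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

// User-defined type guard: narrows string to CssColor when the pattern matches.
function isCssColor(value: string): value is CssColor {
    return cssColorPattern.test(value);
}

// Returns the branded value, or undefined when the input does not match.
function parseCssColor(value: string): CssColor | undefined {
    return isCssColor(value) ? value : undefined;
}

const fontColor: CssColor | undefined = parseCssColor("#000");
```

Unlike the proposed feature, the compiler does not check string literals here; every value has to go through the guard at run time.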

rylphs commented 8 years ago

As @mhegazy suggested, I will put my suggestion (#8665) here. What about allowing simple validation functions in type declarations? Something like this:

type Integer(n:number) => String(n).match(/^[0-9]+$/)
let x:Integer = 3 //OK
let y:Integer = 3.6 //wrong

type ColorLevel(n:number) => n>=0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

type Hex(n:string) => n.match(/^([0-9]|[A-F])+$/)
let hexValue:Hex = "F6A5" //OK
let wrongHexValue:Hex = "F6AZ5" //wrong

The value that the type can accept would be determined by the function parameter type and by the function evaluation itself. That would solve #7982 also.
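
The closest approximation in current TypeScript is probably a branded type paired with a type guard that runs the validation at run time. The sketch below assumes a hypothetical `Brand` helper and guard names (`isInteger`, `isColorLevel`, `isHex`, `makeRGB`); it also assumes 0 is a valid color level, since the example above treats `green:0` as OK.

```typescript
// Hypothetical helper: tags a base type with a compile-time-only name.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type Integer = Brand<number, "Integer">;
type ColorLevel = Brand<number, "ColorLevel">;
type Hex = Brand<string, "Hex">;

// Each guard runs the validation function and narrows on success.
function isInteger(n: number): n is Integer {
    return /^[0-9]+$/.test(String(n));
}

function isColorLevel(n: number): n is ColorLevel {
    return n >= 0 && n <= 255; // assumption: 0 is allowed
}

function isHex(s: string): s is Hex {
    return /^([0-9]|[A-F])+$/.test(s);
}

interface RGB {
    red: ColorLevel;
    green: ColorLevel;
    blue: ColorLevel;
}

// Builds an RGB value only when all three channels pass the guard.
function makeRGB(red: number, green: number, blue: number): RGB | undefined {
    if (isColorLevel(red) && isColorLevel(green) && isColorLevel(blue)) {
        return { red, green, blue };
    }
    return undefined;
}
```

The difference from the suggestion is that nothing is evaluated at compile time: `makeRGB(255, 900, 0)` type checks and simply returns `undefined` when it runs.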

ozyman42 commented 8 years ago

@rylphs +1 this would make TypeScript extremely powerful

maiermic commented 7 years ago

How does subtyping work with regex-validated string types?

let a: RegExType_1
let b: RegExType_2

a = b // Is this allowed? Is RegExType_2 subtype of RegExType_1?
b = a // Is this allowed? Is RegExType_1 subtype of RegExType_2?

where RegExType_1 and RegExType_2 are regex-validated string types.

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

basarat commented 7 years ago

Would also help with TypeStyle : https://github.com/typestyle/typestyle/issues/5 :rose:

DanielRosenwasser commented 7 years ago

In JSX, @RyanCavanaugh and I have seen people add aria- (and potentially data-) attributes. Someone actually added a string index signature in DefinitelyTyped as a catch-all. A new index signature for this would have been helpful.

interface IntrinsicElements {
    // ....
    [attributeName: /aria-\w+/]: number | string | boolean;
}

Igmat commented 7 years ago

Design Proposal

There are a lot of cases where developers need a more specific value than just a string, but can't enumerate the options as a union of simple string literals, e.g. CSS colors, emails, phone numbers, ZIP codes, Swagger extensions, etc. Even the JSON Schema specification, which is commonly used for describing the schema of a JSON object, has pattern and patternProperties, which in terms of the TS type system could be called a regex-validated string type and a regex-validated string type of index.

Goals

Provide developers with a type system that is one step closer to JSON Schema, which they commonly use, and prevent them from forgetting about string validation checks when needed.

Syntactic overview

Implementation of this feature consists of 4 parts:

Regex validated type

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;

Regex-validated variable type

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

and the same, but more readable

let fontColor: CssColor;

Regex-validated variable type of index

interface UsersCollection {
    [email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i]: User;
}

and the same, but more readable

interface UsersCollection {
    [email: Email]: User;
}

Type guard for variable type

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color)) {
        fontColor = color;// correct
    }
}

and same

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color))) return;
    fontColor = color;// correct
}

and using defined type for better readability

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (CssColor.test(color)) {
        fontColor = color;// correct
    }
}

same as

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(CssColor.test(color))) return;
    fontColor = color;// correct
}

Type guard for index type

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email))) return;
    collection[email];// type is User
}

and using defined type for better readability

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(Email.test(email))) return;
    collection[email];// type is User
}

Semantic overview

Assignments

let email: Email;
let gmail: Gmail;
email = 'test@example.com';// correct
email = 'test@gmail.com';// correct
gmail = 'test@example.com';// compile time error
gmail = 'test@gmail.com';// correct
gmail = email;// obviously compile time error
email = gmail;// unfortunately compile time error too

Unfortunately, we can't check whether one regex is a subtype of another without a severe performance impact, according to this article. So it should be restricted. But there are the following workarounds:

// explicit cast
gmail = <Gmail>email;// correct
// type guard
if (Gmail.test(email)) {
    gmail = email;// correct
}
// another regex subtype declaration
type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
gmail = email;// correct

Unfortunately, assigning a string variable to a regex-validated variable should also be restricted, because there is no guarantee at compile time that it will match the regex.

let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
email = someEmail;// compile time error
gmail = someGmail;// compile time error

But we are able to use an explicit cast or type guards as shown here. The second is recommended.
Luckily, this is not the case for string literals, because when using them we ARE able to check that the value matches the regex.

let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
email = someEmail;// correct
gmail = someGmail;// correct

Type narrowing for indexes

For simple cases of a regex-validated index type, see Type guard for index type. But there could be more complicated cases:

type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
interface UsersCollection {
    [email: Email]: User;
    [gmail: Gmail]: GmailUser;
}
let collection: UsersCollection;
let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
collection['test@example.com'];// type is User
collection['test@gmail.com'];// type is User & GmailUser
collection[someEmail];// unfortunately type is any
collection[someGmail];// unfortunately type is any
// explicit cast is still an unsafe workaround
collection[<Email> someEmail];// type is User
collection[<Gmail> someGmail];// type is GmailUser
collection[<Email & Gmail> someGmail];// type is User & GmailUser

Literals don't have this problem:

let collection: UsersCollection;
let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
collection[someEmail];// type is User
collection[someGmail];// type is User & GmailUser

But for variables the best option is to use type guards, as in the following more realistic examples:

getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is GmailUser
    }
}

But if we use a better definition for the Gmail type, the narrowing would be different:

type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is User & GmailUser
    }
}

Unions and intersections

Common types and regex-validated types are really different, so we need rules for how to correctly handle their unions and intersections.

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;// correct
type test_2 = Regex_1 & Regex_2;// correct
type test_3 = Regex_1 | NonRegex;// correct
type test_4 = Regex_1 & NonRegex;// compile time error
if (test_1.test(something)) {
    something;// type is test_1
    // something matches Regex_1 OR Regex_2
}
if (test_2.test(something)) {
    something;// type is test_2
    // something matches Regex_1 AND Regex_2
}
if (test_3.test(something)) {
    something;// type is Regex_1
} else {
    something;// type is NonRegex
}

Generics

There are no special cases for generics, so a regex-validated type could be used with generics in the same way as usual types. For generics with constraints like the one below, a regex-validated type behaves like string:

class Something<T extends string> { ... }
let something = new Something<Email>();// correct

Emit overview

Unlike usual types, regex-validated types have some impact on emit:

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;
type test_2 = Regex_1 & Regex_2;
type test_3 = Regex_1 | NonRegex;
type test_4 = Regex_1 & NonRegex;
if (test_1.test(something)) {
    /* ... */
}
if (test_2.test(something)) {
    /* ... */
}
if (test_3.test(something)) {
    /* ... */
} else {
    /* ... */
}

will compile to:

var Regex_1 = / ... /;
var Regex_2 = / ... /;
if (Regex_1.test(something) || Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something) && Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something)) {
    /* ... */
} else {
    /* ... */
}

Compatibility overview

This feature has no issues with compatibility, because there is only one case that could break, and it is related to the fact that a regex-validated type has emit impact, unlike a usual type. So this is valid TS code:

type someType = { ... };
var someType = { ... };

while the code below is not:

type someRegex = / ... /;
var someRegex = { ... };

But the second WAS already invalid, for another reason (the type declaration was wrong). So now we have to forbid declaring a variable with the same name as a type when that type is regex-validated.

P.S.

Feel free to point out things that I have probably missed. If you like this proposal, I could try to create tests that cover it and submit them as a PR.

Igmat commented 7 years ago

I forgot to point out some cases for intersections and unions of regex-validated types, but I've described them in the latest test case. Should I update the design proposal to reflect that minor change?

alexanderbird commented 7 years ago

@Igmat, question about your design proposal: Could you elaborate on the emit overview? Why would regex-validated types need to be emitted? As far as I can tell, other types don't support runtime checks... am I missing something?

Igmat commented 7 years ago

@alexanderbird, yes, no other type has any impact on emit. At first, I thought that regex-validated types would behave the same way, so I started creating the proposal and playing with the proposed syntax. The first approach was like this:

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
fontColor = "#000";

and this:

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let fontColor: CssColor;
fontColor = "#000";

That's OK and needs no emit changes, because "#000" can be checked at compile time. But we also have to handle narrowing from string to regex-validated type in order to make it useful. So I thought about this for both previous setups:

let someString: string;
if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(someString)) {
    fontColor = someString; // Ok
}
fontColor = someString; // compile time error

So it also has no impact on emit and looks OK, except that the regex isn't very readable and has to be copied to every place it is used, so the user could easily make a mistake. But in this particular case it still seems better than changing how types work. But then I realized that this stuff:

let someString: string;
let email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(someString)) {
    email = someString; // Ok
}
email = someString; // compile time error

is a nightmare. And that's even without intersections and unions. So to avoid stuff like this, we have to slightly change type emit as shown in the proposal.

Igmat commented 7 years ago

@DanielRosenwasser, could you please provide some feedback on this proposal? And also on the tests referenced here, if possible? I really want to help with implementing this feature, but it requires a lot of time (tsc is a really complicated project and I still have to work on understanding how it works inside), and I don't know whether this proposal is ready to implement or whether you would reject this feature implemented in this way because of a different language design vision or any other reason.

DanielRosenwasser commented 7 years ago

Hey @Igmat, I think there are a few things I should have initially asked about

To start, I still don't understand why you need any sort of change to emit, and I don't think any sort of emit based on types would be acceptable. Check out our non-goals here.

Another issue I should have brought up is the problem of regular expressions that use backreferences. My understanding (and experience) is that backreferences in a regular expression can force a test to run in time exponential to its input. Is this a corner case? Perhaps, but it's something I'd prefer to avoid in general. This is especially important given that in editor scenarios, a type-check at a location should take a minimal amount of time.

Another issue is that we'd need to either rely on the engine that the TypeScript compiler runs on, or build a custom regular expression engine to execute these things. For instance, TC39 is moving to include a new s flag so that . can match newlines. There would be a discrepancy between ESXXXX and older runtimes that support this.

alexanderbird commented 7 years ago

@igmat - there is no question in my mind that having regexes emitted at runtime would be useful. However, I don't think they're necessary for this feature to be useful (and from the sounds of what @DanielRosenwasser has said, it probably wouldn't get approved anyway). You said

But we also have to handle narrowing from string to regex-validated type in order to make it useful

I think this is only the case if we are to narrow from a dynamic string to a regex-validated type. This gets very complicated. Even in this simple case:

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + bar;
}

We can't be sure that the types will match - what if the number is negative? And as the regexes get more complicated, it just gets messier and messier. If we really wanted this, maybe we allow "type interpolation": type Baz = /prefix:{number}/... but I don't know if it's worth going there.

Instead, we could get partway to the goal if we only allowed string literals to be assigned to regex-validated types.

Consider the following:

type Color = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let foo: Color = '#000000';
let bar: Color = '#0000'; // Error - string literal '#0000' is not assignable to type 'Color'; '#0000' does not match /^#([0-9a-f]{3}|[0-9a-f]{6})$/i
let baz: Color = '#' + config.userColorChoice; // Error - type 'string' is not assignable to type 'regex-validated-string'

Do you think that's a workable alternative?

Igmat commented 7 years ago

@DanielRosenwasser, I've read the Design Goals carefully and, if I understand you correctly, the problem is a violation of Non-goals#5. But it doesn't seem like a violation to me; it seems like a syntax improvement. For example, previously we had:

const emailRegex = /.../;
/**
 * assign it only with values tested to emailRegex 
 */
let email: string;
let userInput: string;
// somehow get user input
if (emailRegex.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

With this proposal implemented it would look like:

type Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

As you see, the code is almost the same - it's a common, simple usage of regex. But the second case is much more expressive and will protect the user from accidental mistakes, like forgetting to check a string before assigning it to a variable that is meant to be regex-validated. The second thing is that without such type narrowing we won't be able to use regex-validated types in indexes in a normal way, because in most cases such index fields work with some variable that can't be checked at compile time the way literals can.

Igmat commented 7 years ago

@alexanderbird, I'm not suggesting making this code valid or adding hidden checks at either run time or compile time.

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + bar;
}

This code has to produce an error under my proposal. But this:

function foo(bar: number) {
    let baz: /prefix:\d+/ = ('prefix:' + bar) as /prefix:\d+/;
}

or this:

function foo(bar: number) {
    let baz: /prefix:\d+/;
    let possibleBaz: string = 'prefix:' + bar;
    if (/prefix:\d+/.test(possibleBaz)) {
        baz = possibleBaz;
    }
}

would be correct, and would even have no impact on the emitted code.

And as I showed in the previous comment, literals would definitely not be enough even for common use cases, because we often have to work with strings from user input or other sources. Without this emit impact, users would have to work with this type in the following way:

export type Email = /.../;
export const Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = <Email>userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

or for intersections:

export type Email = /email-regex/;
export const Email = /email-regex/;
export type Gmail = Email & /gmail-regex/;
export const Gmail = {
    test: (input: string) => Email.test(input) && /gmail-regex/.test(input)
};
let gmail: Gmail;
let userInput: string;
// somehow get user input
if (Gmail.test(userInput)) {
    gmail = <Gmail>userInput;
} else {
    console.log('User provided invalid gmail. Showing validation error');
    // Some code for validation error
}

I don't think that forcing users to duplicate code and to use explicit casts, when this could easily be handled by the compiler, is a good way to go. The emit impact is really very small and predictable. I'm sure that it won't surprise users or lead to misunderstanding of the feature or hard-to-locate bugs, while implementing this feature without emit changes definitely WILL.
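
For comparison, in current TypeScript the explicit casts in the two examples above can be avoided with user-defined type guards over branded types, although the pattern still has to live in a separate value. This is a minimal sketch under stated assumptions: the brand property names, `isEmail`, `isGmail` and `classify` are hypothetical, and the email pattern is a deliberately simplified placeholder, not the regex from the proposal.

```typescript
// Brands exist only at the type level; Gmail is declared as a subtype of Email.
type Email = string & { readonly __email: true };
type Gmail = Email & { readonly __gmail: true };

const emailPattern = /^[^@\s]+@[^@\s]+\.[^@\s]+$/; // simplified placeholder
const gmailPattern = /@gmail\.com$/i;

function isEmail(input: string): input is Email {
    return emailPattern.test(input);
}

// The intersection is checked by composing the two tests.
function isGmail(input: string): input is Gmail {
    return isEmail(input) && gmailPattern.test(input);
}

function classify(userInput: string): string {
    if (isGmail(userInput)) {
        const email: Email = userInput; // a Gmail is assignable to Email, no cast
        return `gmail:${email}`;
    }
    if (isEmail(userInput)) {
        return `email:${userInput}`;
    }
    return "invalid";
}
```

The subtype relationship between `Gmail` and `Email` is asserted by the type declarations; the compiler does not verify that the two patterns are actually related.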

In conclusion I want to say that in simple terms regex-validated type is both a scoped variable and a compiler type.

Igmat commented 7 years ago

@DanielRosenwasser and @alexanderbird ok, I have one more idea for that. What about syntax like this:

const type Email = /email-regex/;

In this case the user has to explicitly declare that they want this as both a type and a const, so the actual type system has no emit changes unless it is used with such a modifier. But if it is used with it, we are still able to avoid a lot of mistakes, casts and code duplication by adding the same emit as for:

const Email = /email-regex/;

This seems to be even bigger than just improvement for this proposal, because this probably could allow something like this (example is from project with Redux):

export type SOME_ACTION = 'SOME_ACTION';
export const SOME_ACTION = 'SOME_ACTION' as SOME_ACTION;

being converted to

export const type SOME_ACTION = 'SOME_ACTION';

I've tried to find a similar suggestion but wasn't successful. If it could be a workaround and if you like the idea, I can prepare a design proposal and tests for it.
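
For the string-literal case, existing TypeScript already allows a shorter spelling than the type-plus-cast pair, because a `const` initialized with a string literal is inferred with the literal type and `typeof` can reuse it. The sketch below shows that pattern; `SomeAction` and `makeSomeAction` are hypothetical names added for illustration, and this does not help with the regex case.

```typescript
// The const gets the literal type "SOME_ACTION" by inference.
const SOME_ACTION = "SOME_ACTION";
// A type with the same name, derived from the value, so nothing is duplicated.
type SOME_ACTION = typeof SOME_ACTION;

// Hypothetical action shape using the type side of the name.
interface SomeAction {
    type: SOME_ACTION;
    payload: number;
}

// Uses the value side of the name; no cast is needed.
function makeSomeAction(payload: number): SomeAction {
    return { type: SOME_ACTION, payload };
}
```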

Igmat commented 7 years ago

@DanielRosenwasser, about your second issue - I don't think that it would ever happen, because in my suggestion the compiler runs the regex only for literals, and it doesn't seem likely that someone will do something like this:

let something: /some-regex-with-backreferences/ = `
long enough string to make regex.test significantly affect performance
`

Anyway, we could test how long a literal has to be to affect real-time performance and create a heuristic that warns the user if we are unable to check it in some editor scenario, while still checking it when the project is compiled. Or there could be some other workarounds.

About the third question, I'm not sure that I understand everything correctly, but it seems that the regex engine should be selected depending on the target from tsconfig if they have different implementations. This needs some more investigation.

Igmat commented 7 years ago

@DanielRosenwasser, any thoughts? 😄 About the initial proposal and about the last one. Maybe I should write a more detailed overview of the second one?

zspitz commented 7 years ago

@Igmat Your proposal limits the validation to only be useful with string types. What are your thoughts on @rylphs proposal? This would allow a more generic validation for all primitive types:

type ColorLevel = (n:number) => n>=0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

I suspect however that extending this mechanism beyond primitives to non-primitive types would be too much. One point, the issue that @DanielRosenwasser raised -- about varying regex engine implementations -- would be magnified: depending on the Javascript engine under which the Typescript compiler is running, the validation function might work differently.

Igmat commented 7 years ago

@zspitz it looks promising, but in my opinion it could affect compiler performance too much, because function isn't limited by any rules and it would force TS to evaluate expressions that are too complicated or that even depend on resources that aren't available at compile time.

zspitz commented 7 years ago

@Igmat

because function isn't limited by any rules

Do you have some specific examples in mind? Perhaps it's possible to limit the validation syntax to a "safe"/compile-time-known subset of Typescript.

disjukr commented 7 years ago

How about making user-defined type guards define a new type?

// type guard that introduces new nominal type int
function isInt(value: number): value is type int { return /^\d+$/.test(value.toString()); }
// -------------------------------------^^^^ add type keyword here
function printNum(value: number) { console.log(value); }
function printInt(value: int) { console.log(value); }
const num = 123;
printNum(num); // ok
printInt(num); // error
if (isInt(num)) {
    printNum(num); // ok
    printInt(num); // ok
}

Igmat commented 7 years ago

@disjukr looks nice, but what about type extending?

alexanderbird commented 7 years ago

Do they absolutely need to be extendable? Is there a TypeScript design principle which demands it? If not, I would rather have the nominal types @disjukr suggests than nothing, even though they're not extendable.

We'd need something pretty creative to get extendability IMHO - we can't determine whether one arbitrary type guard (function) is a subset of another arbitrary type guard.

We could get rudimentary "extendability" using a type assertion mentality (I'm not saying this is something "pretty creative" - I'm saying here's a stop-gap until someone comes up with something pretty creative):

function isInt(value: number): value is type int { return /^\d+$/.test(value.toString()); }
// assert that biggerInt extends int. No compiler or runtime check that it actually does extend.
function isBiggerInt(value: number): value is type biggerInt extends int { return /^\d{6,}$/.test(value.toString()); }
// -----------------------------------------------------------^^^^ type extension assertion
function printNum(value: number) { console.log(value); }
function printInt(value: int) { console.log(value); }
function printBiggerInt(value: biggerInt) {console.log(value); }

const num = 123;
printNum(num); // ok
printInt(num); // error
printBiggerInt(num); // error
if (isInt(num)) {
    printNum(num); // ok
    printInt(num); // ok
    printBiggerInt(num); // error
}
if (isBiggerInt(num)) {
    printNum(num); // ok
    printInt(num); // ok
    printBiggerInt(num); // ok
}

Which might be useful even though it's not sound. But as I said at the start, do we require that it is extendable, or can we implement it as suggested by @disjukr? (If the latter, I suggest we implement it in @disjukr's non-extendable way.)
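
Something close to this can be sketched with existing syntax by writing the "extension assertion" as an intersection of branded types. As in the idea above, nothing checks that the second guard really accepts a subset of the first. The brand property names and the `show` helper are hypothetical.

```typescript
// Brands are compile-time only; BiggerInt is asserted to extend Int.
type Int = number & { readonly __int: true };
type BiggerInt = Int & { readonly __biggerInt: true };

function isInt(value: number): value is Int {
    return /^\d+$/.test(value.toString());
}

// No compiler or runtime check that this guard implies isInt.
function isBiggerInt(value: number): value is BiggerInt {
    return /^\d{6,}$/.test(value.toString());
}

function printInt(value: Int): string {
    return `int ${value}`;
}

function printBiggerInt(value: BiggerInt): string {
    return `bigger int ${value}`;
}

// Collects the output of every printer the value is allowed to reach.
function show(num: number): string[] {
    const out: string[] = [];
    if (isInt(num)) {
        out.push(printInt(num)); // ok inside the guard
    }
    if (isBiggerInt(num)) {
        out.push(printInt(num)); // ok: BiggerInt is assignable to Int
        out.push(printBiggerInt(num));
    }
    return out;
}
```

Calling `printInt(num)` outside a guard is rejected by the compiler, which matches the `// error` lines in the example above.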

maxlk commented 6 years ago

Kind of off-topic, in reply to @DanielRosenwasser's first comment: for a comma-separated list you have to use the ^ and $ anchors (that's relevant in most cases when you want to validate a string). And anchors help to avoid repetition; for your example the regexp would be /^((dog|cat|fish)(,|$))+$/

streamich commented 6 years ago

Allow string types to be regular expressions /#[0-9]{6}/ and allow nesting types into regular expressions ${TColor}:

type TColor = 'red' | 'blue' | /#[0-9]{6}/;
type TBorderValue = /[0-9]+px (solid|dashed) ${TColor}/

Result:

let border1: TBorderValue = '1px solid red'; // OK
let border2: TBorderValue = '1px solid yellow'; // TSError: .....

Use case: there is a library dedicated to writing "type-safe" CSS styles in TypeScript, typestyle. The functionality proposed above would help greatly: currently the library has to expose methods to be used at runtime, whereas the proposed string regex types would be able to type check code at compile time and give developers great IntelliSense.
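
Part of this idea can be approximated with template literal types, which were added in TypeScript 4.1. They support nesting types into string patterns, but not regular expressions, so the hex part below is loosened to `#${string}`. The sketch assumes hypothetical names `TBorderStyle` and `isBorderValue`; the runtime guard uses the stricter pattern from the comment above and is therefore not equivalent to the type.

```typescript
// '#' followed by any string: template literal types cannot express [0-9]{6}.
type TColor = "red" | "blue" | `#${string}`;
type TBorderStyle = "solid" | "dashed";
type TBorderValue = `${number}px ${TBorderStyle} ${TColor}`;

const border1: TBorderValue = "1px solid red"; // OK
// const border2: TBorderValue = "1px solid yellow"; // rejected by the compiler

// Runtime guard for dynamic strings, stricter than the type about hex colors.
function isBorderValue(value: string): value is TBorderValue {
    return /^[0-9]+px (solid|dashed) (red|blue|#[0-9]{6})$/.test(value);
}
```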

skbergam commented 6 years ago

@DanielRosenwasser @alexanderbird @Igmat: IMO this proposal would be game-changing for TypeScript and web development. What is currently stopping it from being implemented?

I agree that extendability and type emission should not get in the way of the rest of the feature. If there's not a clear path on those aspects, implement them later when there is.

nolazybits commented 6 years ago

I arrived here as I am looking to have a UUID type and not a string, hence having a regex defining the string would be awesome in this case + a way to check validity of the type (Email.test example) would be also helpful.

Igmat commented 6 years ago

@skbergam I'm trying to implement it myself once more. But the TS project is really huge and I also have a job, so there is nearly no progress (I've only managed to create tests for this new feature). If somebody has more experience with extending TS, any help would be greatly appreciated...

RyanCavanaugh commented 6 years ago

Interesting that this effectively creates a nominal type, since we'd be unable to establish any subtype/assignability relationships between any two non-identical regexps

simonbuchan commented 6 years ago

@RyanCavanaugh Earlier @maiermic commented

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

But that might not be good enough? One certainly hopes there are not too many regexp relationships, but you never know.

Regarding type checks, if we don't like duplicating regexps, and typeof a const isn't good enough (e.g. .d.ts files), how does TS feel about valueof e, which emits the literal value of e iff e is a literal, otherwise an error (and emits something like undefined)?

apm963 commented 6 years ago

@maxlk Also off-topic, but I took your regex and improved it to not match trailing commas on otherwise valid input: /^((dog|cat|fish)(,(?=\b)|$))+$/ with test https://regex101.com/r/AuyP3g/1. This uses a positive lookahead for a word boundary after the comma, forcing the prior to revalidate in a DRY way.
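
The difference between the two patterns can be checked with a few inputs. This is a small sketch; the variable names and the sample list are made up for illustration.

```typescript
// Pattern from the earlier comment: accepts a trailing comma.
const allowsTrailingComma = /^((dog|cat|fish)(,|$))+$/;
// Pattern from this comment: a comma must be followed by a word boundary.
const rejectsTrailingComma = /^((dog|cat|fish)(,(?=\b)|$))+$/;

const samples = ["dog", "dog,cat", "dog,cat,", "dog,,cat", "bird"];

// One row per sample: [input, result of first pattern, result of second pattern].
const results = samples.map(
    (s): [string, boolean, boolean] => [
        s,
        allowsTrailingComma.test(s),
        rejectsTrailingComma.test(s),
    ]
);

for (const [input, first, second] of results) {
    console.log(`${JSON.stringify(input)}: ${first} / ${second}`);
}
```

Only "dog,cat," is treated differently: the first pattern accepts it and the second rejects it.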

gtamas commented 6 years ago

Hi! What's the status of this? Will you add this feature in the near future? Can't find anything about this in the roadmap.

zspitz commented 6 years ago

@Igmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

AndrewEastwood commented 6 years ago

Are those awesome improvements available? Maybe in alpha release at least?

jpike88 commented 6 years ago

Regex-validated types would be great for writing tests; validating hardcoded inputs would be great.

digeomel commented 5 years ago

+1. Our use case is very common, we need to have a string date format like 'dd/mm/YYYY'.

zpdDG4gta8XKpMCd commented 5 years ago

although as proposed it would be an extremely cool feature, it lacks the potential:

a better way would be to outsource parsing and emitting to a plugin, as proposed in #21861. this way none of the above is a problem, at the price of a steeper learning curve, but hey! the regexp checking can be implemented atop of that, so the original proposal still stands, built on more advanced machinery

so as i said, a more general way would be custom syntax providers for whatever literals: #21861

examples:

const uri: via URIParserAndEmitter = http://google.com; 
console.log(uri); // --> { protocol: 'http', host: 'google.com', path: undefined, query: undefined, hash: undefined }

const a: via PositiveNumberParser = 10; // --> 10
const b: via PositiveNumberParser = -10; // --> error

const date: via DateParser = 1/1/2019; // --> new Date(2019, 1, 1)

Akxe commented 5 years ago

@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

@zspitz that would make a lot of people unhappy, as they would see that it is possible but forbidden for them, basically for their safety.

Are those awesome improvements available? Maybe in alpha release at least?

As far as I know this still needs a proposal. @gtamas, @AndrewEastwood

Also I think #11152 would be affecting this.

Akxe commented 5 years ago

@Igmat Your proposal limits the validation to only be useful with string types. What are your thoughts on @rylphs proposal? This would allow a more generic validation for all primitive types:

type ColorLevel = (n:number) => n>0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

I suspect however that extending this mechanism beyond primitives to non-primitive types would be too much.
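For comparison, something close to the quoted idea can be approximated in TypeScript as it exists today with a branded type and a type guard, at the cost of checking at runtime instead of at compile time. This is only a sketch: the `__brand` property and the names `isColorLevel` and `makeRGB` are assumptions made for illustration, and the lower bound is taken as inclusive (`n >= 0`) so that the `0` channels in the example are accepted.

```typescript
// Hypothetical brand: a number that has passed the range check.
// The __brand property exists only in the type system, never at runtime.
type ColorLevel = number & { readonly __brand: "ColorLevel" };

// Range predicate written as a type guard (assumed inclusive lower bound).
function isColorLevel(n: number): n is ColorLevel {
  return n >= 0 && n <= 255;
}

interface RGB {
  red: ColorLevel;
  green: ColorLevel;
  blue: ColorLevel;
}

// Returns an RGB value, or undefined if any channel fails the check.
function makeRGB(red: number, green: number, blue: number): RGB | undefined {
  if (isColorLevel(red) && isColorLevel(green) && isColorLevel(blue)) {
    return { red, green, blue };
  }
  return undefined;
}

const redColor = makeRGB(255, 0, 0);     // an RGB object
const wrongColor = makeRGB(255, 900, 0); // undefined, 900 is out of range
```

Unlike the proposal, a literal such as `{red: 255, green: 900, blue: 0}` is not rejected by the compiler here; the check only happens when `makeRGB` runs.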

The main problem I see with this is security: imagine some malicious code that would use buffers to grab the user's memory while checking a type. We would have to implement a lot of sandboxing around this. I would rather see 2 different solutions, one for strings and one for numbers.

RegExp is immune to that to some extent, as the only way you can use it maliciously is to write an expression with heavy backtracking. That being said, some users might do it unintentionally; therefore, there should be some kind of protection. I would think the best way to do it would be a timer.

One point, the issue that @DanielRosenwasser raised -- about varying regex engine implementations -- would be magnified: depending on the Javascript engine under which the Typescript compiler is running, the validation function might work differently.

That is true, and this is bad, but we can solve it by specifying which "modern" parts of RegExp we need for our codebase. It would default to normal (is it ES3?) regexp, which works in every node, with an option to enable new flags and lookbehind assertions.

const unicodeMatcher = /\u{1d306}/u;
let value: typeof unicodeMatcher;
function check(input: string) {
  value = input;  // Invalid
  if (input.match(unicodeMatcher)) {
    value = input;  // OK
  }
}

If a user has disabled the option for advanced flags:

let value: typeof unicodeMatcher = '𝌆';  // Warning, string literal isn't checked, because `value` is of type `/\u{1d306}/u`.

TypeScript would not evaluate advanced RegExp unless told to. But I would suggest that it should give a warning explaining what is happening and how to enable advanced RegExp checking.

If a user has enabled the option for advanced flags and their node supports it:

let value: typeof unicodeMatcher = '𝌆';  // OK

If a user has enabled the option for advanced flags and their node does not support it:

let value: typeof unicodeMatcher = '𝌆';  
// Error, NodeJS does not support advanced RegExp, upgrade NodeJS to version X.Y.Z, or disable advanced RegExp checking.

I think this is a reasonable way to go about it. Teams of programmers usually have the same version of NodeJS or are easily able to upgrade, since their whole codebase already works for someone with a newer version. Solo programmers can adapt easily on the fly.

m93a commented 5 years ago

What's the current status of this issue? It's really a pity to see that TypeScript has such huge potential and dozens of awesome proposals, but they don't get much attention from the developers…

AFAIK the original proposal was good apart from the Emit overview which is a no-go and not really needed, so it shouldn't be blocking the proposal.

The issue it's trying to address could be solved by the introduction of regex literals (which shouldn't be hard, as they're effectively equivalent to string and number literals) and a type operator patternof (similar to typeof and keyof), which would take a regex literal type and return a validated string type. This is how it could be used:

type letterExpression = /[a-zA-Z]/;
let exp: letterExpression;
exp = /[a-zA-Z]/; // works
exp = /[A-Za-z]/; // error, the expressions do not match

type letter = patternof letterExpression;
type letter = patternof /[a-zA-Z]/; // this is equivalent

let a: letter;
a = 'f'; // works
a = '0'; // error

const email = /some-long-email-regex/;
type email = patternof typeof email;

declare let str: string;
if (str.match(email)) {
  str // typeof str === email
} else {
  str // typeof str === string
}

Igmat commented 5 years ago

@m93a I didn't think of such a solution with an additional type operator when I was working on the initial proposal.

I like this approach of removing emit impact caused by types, even though this seems to be more verbose.

And this led me to an idea of how to extend this proposal so that it both avoids adding the new keyword you suggest (IMO we already have a pretty big number of them) and avoids the emit impact from the type system that my proposal had.

It'll take 4 steps:

  1. Add a regexp-validated string literal type:
    type Email = /some-long-email-regex/;
  2. Make the RegExp interface in the core lib generic:
    interface RegExp<T extends string = string> {
        test(stringToTest: string): stringToTest is T;
    }
  3. Change type inference for regex literals in actual code:
    const Email = /some-long-email-regex/; // infers to `RegExp</some-long-email-regex/>`
  4. Add a type helper using the conditional types feature, like InstanceType:
    type ValidatedStringType<T extends RegExp> = T extends RegExp<infer V>
        ? V
        : string;
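The four steps above depend on compiler changes, but their overall shape can be approximated in TypeScript as it exists today. The following is only a sketch under stated assumptions: `TypedRegExp`, `Validated`, the `__pattern` brand and `classify` are hypothetical names, the string tag `"Email"` stands in for the regex literal type of step 1, and the email pattern is a deliberately simplistic placeholder. Literals are not checked at compile time here; only the type guard narrows.

```typescript
// A string that has been checked against the pattern identified by Tag.
// The __pattern property is a compile-time brand only and never exists at runtime.
type Validated<Tag extends string> = string & { readonly __pattern: Tag };

// Hypothetical stand-in for the generic RegExp<T> of step 2.
class TypedRegExp<Tag extends string> {
  private readonly pattern: RegExp;

  constructor(pattern: RegExp) {
    this.pattern = pattern;
  }

  // Type guard corresponding to `test(stringToTest): stringToTest is T`.
  test(stringToTest: string): stringToTest is Validated<Tag> {
    return this.pattern.test(stringToTest);
  }
}

// Counterpart of the conditional-type helper in step 4.
type ValidatedStringType<R> = R extends TypedRegExp<infer Tag> ? Validated<Tag> : string;

// Placeholder pattern, not a real email validator.
const Email = new TypedRegExp<"Email">(/^[^@\s]+@[^@\s]+$/);
type Email = ValidatedStringType<typeof Email>;

function classify(userInput: string): string {
  if (Email.test(userInput)) {
    const email: Email = userInput; // narrowed by the type guard
    return `valid: ${email}`;
  }
  return `invalid: ${userInput}`; // still just `string` here
}

console.log(classify("em@example.com"));
console.log(classify("emexample.com"));
```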

Usage example:

const Email = /some-long-email-regex/;
type Email = ValidatedStringType<typeof Email>;

const email: Email = `em@example.com`; // correct
const email2: Email = `emexample.com`; // compile time error

declare let userInput: string;
if (Email.test(userInput)) {
    // `userInput` here IS of type `Email`
} else {
    // and here it is just `string`
}

m93a commented 5 years ago

@Igmat Cool. Your proposal feels more natural for TypeScript and requires fewer changes to the compiler, which is probably a good thing. The only advantage of my proposal was that regex literals would feel the same as string and number literals; with yours, this could be confusing for some:

let a: 'foo' = 'foo'; // works
let b: 42 = 42; // works
let c: /x/ = /x/; // error

But I think that the simplicity of your proposal outweighs the one disadvantage.

Edit: I don't really like the length of ValidatedStringType<R>. If we decided to call validated strings patterns, we could use PatternOf<R> after all. I'm not saying that your type takes longer to type; most people would just type the first three letters and hit tab. It just has a larger code spaghettification impact.

Akxe commented 5 years ago

@Igmat Your solution is excellent from the development point of view, but as far as readability goes, it would be much better to have the possibility that @m93a proposed. I think it could be internally represented in much the same way, but it should be presented to the user as simply as possible.

m93a commented 5 years ago

@Akxe I don't think that the devs would fancy adding another keyword that only has one very specific use case.

@RyanCavanaugh Could you please tell us your opinion on this? (Specifically the original proposal and the four last comments (excluding this one).) Thank you! :+1:

amir-arad commented 5 years ago

how about having a generic argument for string that defaults to .*?

let a: 'foo' = 'foo'; // works
let b: 42 = 42; // works
let c: /x/ = /x/; // works
let d: string<x.> = 'xa'; // works

string literal 'foo' can be considered a sugar for string<foo>

Igmat commented 5 years ago

I don't really like the length of ValidatedStringType<R>. If we decided to call validated strings patterns, we could use PatternOf<R> after all.

@m93a, IMO, in this case it would be better to call them PatternType<R> to be consistent with already existing InstanceType and ReturnType helpers.

@amir-arad, interesting. What will interface RegExp look like in this case?

@RyanCavanaugh I could rewrite the original proposal using this newly found approach if it'll help. Should I?

m93a commented 5 years ago

@amir-arad Your proposed syntax is in conflict with the rest of TypeScript. Now you can only pass types as a generic argument, not an arbitrary expression. Your proposed syntax would be extremely confusing.

Think of generic types like they're functions that take a type and return a type. The two following pieces of code are very close in both meaning and syntax:

function foo(str: string) {
  return str === 'bar' ? true : false
}

type foo<T extends string> = T extends 'bar' ? true : false;

Your new syntax is like proposing that regex in JavaScript should be written let all = String(.*) which would be an ugly abuse of the function call syntax. Therefore I don't think your proposal makes much sense.
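To make the function analogy concrete, here is a small sketch in which the runtime function is called with a value and the conditional type is "called" with a type. The type is renamed `Foo` so that both declarations can sit side by side, and the variable names are made up for illustration.

```typescript
// Value level: takes a string, returns a boolean at runtime.
function foo(str: string): boolean {
  return str === 'bar';
}

// Type level: takes a string type, returns a boolean literal type at compile time.
type Foo<T extends string> = T extends 'bar' ? true : false;

// The runtime call is evaluated when the program runs...
const viaFunction: boolean = foo('bar');

// ...while the type-level "call" is resolved by the compiler:
// Foo<'bar'> is the literal type `true`, Foo<'baz'> is the literal type `false`.
const viaTypeBar: Foo<'bar'> = true;
const viaTypeBaz: Foo<'baz'> = false;

console.log(viaFunction, viaTypeBar, viaTypeBaz);
```

In both cases the argument is something the language already understands (a value or a type), which is the point being made about `string<x.>`.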