microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0
101.17k stars 12.51k forks source link

`Set.has()` should accept any value, and could be a type predicate #60547

Open jthemphill opened 1 day ago

jthemphill commented 1 day ago

⚙ Compilation target

ES2015

⚙ Library

lib.es2015.collection.d.ts

Missing / Incorrect Definition

https://github.com/microsoft/TypeScript/blob/d85767abfd83880cea17cea70f9913e9c4496dcc/src/lib/es2015.collection.d.ts#L87-L90

I believe

Set<T>.has(value: T): boolean

should become

Set<T>.has(value: unknown): value is T

This applies for other methods as well, such as Map<K, V>.has(key: K): boolean, as well as other ES standards.

Essentially, it's not an error to pass an arbitrary value to Set.has(). It doesn't lead to a crash, exception, or error of any sort, even when we pass weird values like {}, [], or NaN, so the typechecker shouldn't stop us from doing it. We simply get a true if the value is in the Set, or a false if the value is not. And since the Set can only contain values of type T, the boolean returned by Set.has() also becomes proof that value is of type T. This makes it an ideal candidate for a type predicate.

This would immediately fix several regressions in our codebase that we saw when we upgraded TypeScript from 5.3 to 5.5. TypeScript's inferred type predicates are amazing, but they narrowed the types of our Sets and caused several of our Set.has() method calls to become errors. The sample code is a simplified example of what we've found in the wild in our codebase.

Sample Code

declare function defaultAttrs(subtype: string): string[]
declare function getAttrs(): string[]

type MyAttribute = `_${string}`
function isMyAttribute(key: string): key is MyAttribute {
  return key.startsWith('_')
}

export function myAttrs(subtype: string) {
  const defaults = defaultAttrs(subtype)
  return new Set(
    Object.keys(defaults).filter((key) => {
      return isMyAttribute(key)
    }),
  )
}

function attrInSet(subtype: string) {
  const myAttrSet = myAttrs(subtype)
  for (const attr of getAttrs()) {
    // Argument of type 'string' is not assignable to parameter of type '`_${string}`'.ts(2345)
    if (myAttrSet.has(attr)) {
      return true
    }
  }
  return false
}

Documentation Link

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set/has

RyanCavanaugh commented 1 day ago

Basically the same as #26255; we don't agree that making arrayOfDogs.includes(cat) legal or setOfCats.has(dog) is a good idea. You can add an overload to the global type via interface augmentation (note that the reverse is not true; if we changed this to never be an error, no one could opt in to the current behavior) to get the requested behavior.

jcalz commented 1 day ago

Also see #51678 and issues linked from there

jthemphill commented 1 day ago

Thank you for the quick response!

we don't agree that making arrayOfDogs.includes(cat) legal or setOfCats.has(dog) is a good idea.

Well, Cat and Dog are disjoint types. From your discussions in the linked issues, it's clear you've seen that the real problem is when the types are not disjoint. When setOfCats.has(cat) leads to a type error, simply because setOfCats has been refined to Set<MaineCoon | Tabby | ... | ScottishFold>. To fix this error, our options are to unsafely cast setOfCats to a Set<Cat>, or to pass the cat value through a gauntlet of type refinements until it's finally worthy of the .has() method.

Once that was implemented, we absolute would change the signature to something like includes(arg: T | super T)

But isn't unknown a member of super T? This seems like allowing arg: unknown with extra steps. If a generic has no upper bound, the typechecker will always be able to resolve it to the top type.

To get the best of both worlds, I think you need some way of enforcing non-disjointness. After creating the super keyword, we did not use the super keyword in Hack's C\contains() method. Instead, we added <<__NonDisjoint>> constraints to the generics, which enforce that T1 and T2 are not disjoint:

https://github.com/facebook/hhvm/blob/9591b86b9eff17f35bb7d5eb029ca13ee6757bff/hphp/hsl/src/c/introspect.php#L41-L54

function contains<
  <<__NonDisjoint>> T1,
  <<__NonDisjoint>> T2
>(
  readonly Traversable<T1> $traversable,
  readonly T2 $value,
)[]: bool;

TypeScript may have a more elegant way of expressing this, using conditional types.

Sorry, yes. I didn't realize that type guards work both ways.

LukeAbby commented 1 day ago

While not completely ideal, a type that can mostly fit the utility of <<__NonDisjoint>> in regular TypeScript code does exist:

// `[Extract<T, U>, any] extends [U, Extract<T, U>]` simultaneously checks if `Extract<T, U>` extends `U` and
// also that `Extract<T, U>` is not `never`.
// Then `U extends T ? T : U` is a useful fallback to allow stuff like `unknown` as `Extract<T, U>` will still leave `unknown` which will rarely extend `U`.
export type OverlapsWith<T, U> = [Extract<T, U>, any] extends [U, Extract<T, U>] ? T : (
  U extends T ? T : U
);

declare global {
    interface Set<T> {
        has<const V>(value: OverlapsWith<V, T>): boolean;
    }
}

const s = new Set([[1, 2, 3], [4], [5, 6]]);

// Works whether this overload is added or not.
s.has([1, 3]);
s.has(Object.assign({}, [1, 3], { prop: 123 }))

// Always erroring as it should.
s.has(["foo", "bar"]);
s.has([1, "foo"] as const);

// Newly functioning
s.has(Math.random() ? [1, 3] : ["foo", "bar"]);

// This newly functions as its type `(string | number)[]` at runtime _could_ be all numbers.
s.has([1, "foo"]);

const x: unknown = ["foo", "bar"];
s.has(x);

// Note that this example is misleading:
const y: number[] | string[] = ["foo", "bar"];
s.has(y); // Errors but only because if you check the type of `y` is now `string[]` despite the explicit type annotation.

It's a little less implicit but it should have the same power as <<__NonDisjoint>> does ultimately. Or at least I believe it will, I'm admittedly unfamiliar.

I think it becoming a built-in type would be ideal for better errors. Plus I doubt that the TypeScript team would want to add this conditional type to the library files today, the requirement of making it generic is a bit strange at a glance and the errors can be a bit subpar in some cases.

RyanCavanaugh commented 1 day ago

Yeah, super isn't quite the right thing here. Instead of non-disjoint we call it "comparable" which basically covers the same concept. It's the type relation that covers whether x === y is allowed and would ideally be something you could use in type-land

jthemphill commented 1 day ago

Yes! We are on the same page!

For completeness, here's a playground example showing that the typechecker is more forgiving towards === in a for loop than it is to array.includes():

// Good! (Type error in the right place with a really useful error message)
function arrayIncludesLoopDisjoint<T1, T2>(array: T2[], value: T1): boolean {
  for (const x of array) {
    // This comparison appears to be unintentional because the types 'T2' and 'T1' have no overlap.(2367)
    if (x === value) {
      return true;
    }
  }
  return false;
}

// Good! (No type error)
function arrayIncludesLoopNonDisjoint<T1, T2 extends T1>(array: T2[], value: T1): boolean {
  for (const x of array) {
    if (x === value) {
      return true;
    }
  }
  return false;
}

// Makes me sad... (Type error, even though the equivalent code above typechecks. Error is kind of cryptic.)
function arrayIncludesBuiltin<T1, T2 extends T1>(array: T2[], value: T1): boolean {
  // Argument of type 'T1' is not assignable to parameter of type 'T2'.
  //   'T2' could be instantiated with an arbitrary type which could be unrelated to 'T1'.(2345)
  return array.includes(value);
}