tc39 / proposal-policy-map-set

A TC39 proposal for Maps and Sets with cache replacement policies like LRU and LFU.
BSD 3-Clause "New" or "Revised" License
16 stars 2 forks source link

Subclass Map/Set or not #1

Open js-choi opened 2 years ago

js-choi commented 2 years ago
  1. Should the CacheMap and CacheSet classes be subclasses of Map and Set? (I.e., inheritance.)
  2. Or should they be subclasses of Object? (I.e., duck typing.)
  3. Or should there be no separate CacheMap/CacheSet classes, with Map and Set absorbing the new caching functionality? (Also inheritance.)

See also #11 and also tc39/proposal-readonly-collections#6.

brad4d commented 2 years ago

NOTE: I'm going to refer to the new types as CacheMap and CacheSet here, even though the names aren't decided, because I don't want to be constantly saying "these new classes" or something like that.

In favor of subclassing

  1. Subclassing could make it easier to use e.g. CacheMap as a drop-in replacement for a Map.
  2. Subclassing could allow these new object types to "automatically" get new methods that are added to Map or Set

Regarding item 1, I don't think subclassing is necessary for drop-in replacement unless there's an x instanceof Map check somewhere that cannot be easily updated. How likely is that to be a problem?

Regarding item 2 I think that's actually more of a negative than a positive, since it's likely to both limit the things we can decide to add to Map later and force us to add features to CacheMap we maybe don't want there.

When we add new features to Map we will, of course, want to consider whether they should also be added to CacheMap, but I don't think we should be forced to always add them.

Against subclassing

  1. If CacheMap inherits from Map it becomes required that it implement anything that Map does, making changes to Map harder to make.
  2. It sets an example that encourages users to write their own subclasses of Map, Set, CacheMap, and CacheSet, which probably isn't a good idea.

The trend in TC39 is to move away from encouraging subclassing of built-in things. We consider features like @species to have been misfeatures that we would now like to kill.

In other words we seem to have decided that we should not explicitly design our built-in classes for subclassing. I think there are very good reasons for this. The complexity added to the spec and the engines in order to explicitly support all the corner cases that come up with subclassing isn't worth the benefits it brings. Composition is more flexible and easier to reason about than subclassing both for us as language designers and for the people using the language.

As an example, suppose we wanted to add a method to Map that creates a new Map object from some subset of the data from the original Map. If CacheMap must inherit it also we have to answer lots of questions?

js-choi commented 2 years ago

For what it’s worth, I considered mentioning at the plenary that I would like not to subclass Map and Set (and not even support Symbol.species). I moderately agree with the points above.

If none of the other co-champions (@syg, @hemanth) voice an opinion, then I plan to make the specification use independent classes.

(Having said this, deciding not to subclass Map/Set would raise the additional question of whether we should support all or a subset of Map/Set instance methods, which deserves its own issue later.)

NOTE: I'm going to refer to the new types as CacheMap and CacheSet here, even though the names aren't decided, because I don't want to be constantly saying "these new classes" or something like that.

See #8. Barring co-champion pushback, I think we will probably end up renaming the classes to CacheMap and CacheSet, but we’ll see.

js-choi commented 2 years ago

In https://github.com/tc39/proposal-policy-map-set/issues/10#issuecomment-1218613052, @ljharb raised the idea that we could add an options bag to the existing Map and Set classes. This would be functionally similar to subclassing Map and Set: there would be a strong coupling between Map/Set’s APIs and cache maps/sets’ API.

I am not super fond of this option (new Map([], { policy: 'lre', maxSize: 256 }) is clunky when there are no initial values), but it is nevertheless an option to be considered.

ljharb commented 2 years ago

Adding such an option would also allow us to add a class CacheMap extends Map builtin, which avoids the clunkiness, but also allows Map.prototype.get.call and friends to do the Right Thing on a CacheMap instance.

js-choi commented 2 years ago

I agree that, if we make CacheMap a subclass of Map, then it may be a good idea to put CacheMap’s functionality in Map itself, and so Map.prototype.get would simply work on CacheMaps.

Having said that, is there a particular reason why it would be a good idea to couple CacheMap and Map with inheritance?

I find @brad4d’s previous arguments, against coupling Map and CacheMap, compelling. In other words, we still may want to go the WeakMap route and make CacheMap completely separate from Map, so that CacheMap isn’t forced to support everything Map supports.

(In particular, we may wish to make CacheMaps not iterable, with random access only – see #3.)

ljharb commented 2 years ago

Regarding item 1, the https://npmjs.com/is-map predicate wont return true if it’s not a subclass. This isn’t at all a constraint ofc - it just means that in addition to is-map and is-weakmap, I’d have to add an is-cachemap (which I’d create anyways) and then update all places that check is-map (including, https://npmjs.com/which-collection-type) to also check is-cachemap, instead of having it Just Work.

I would expect most userland code to not use something akin to my predicates and to use instanceof Map.

imo the mistake is that WeakMap and Map aren’t part of the same inheritance tree, and I’d prefer we not worsen that mistake by repeating it.

js-choi commented 2 years ago

I’m intrigued by the idea that WeakMap not subclassing Map was a mistake. If WeakMap was a subclass of Map, then it must have supported Map’s iterator methods – but WeakMap is random-access only by design. What would myWeakMap.values() return?

(This question is also relevant to CacheMap, because we might want to make CacheMap also random-access only: #3.)

ljharb commented 2 years ago

I said "same inheritance tree" - iow, i'd expect that all Map collections were instanceof Something, whatever that was, and i'd expect a subclass to be less restrictive than a superclass, not more.

theScottyJam commented 2 years ago

Having CacheMap/CacheSet inherit from Map/Set will break the Liskov Substituion principle. In most cases, a CacheMap/CacheSet is substitutable for a normal Map or Set, which is why it's tempting to make this inheritance hierarchy happen. But, this isn't always the case. These are differences in behavior that could cause some functions expecting a normal Map instance to break if a CacheMap was provided instead.

For this reason, I would also be against a new optional argument to make a normal Map behave as an LRU cache. If I receive a Map, I want to know that it's going to behave as a simple mapping, and I don't want to have to worry about data suddenly disappearing from it due to a caching strategy it might have. To me, this isn't that different from the unfortunate decision that arrays are allowed to have holes - if I receive an array, I just want to deal with a simple, conceptual list of elements without worrying about the possibility of holes. Array holes can be useful, but if we wanted that in the language, we should have put it in a separate data structure, so we could keep arrays simpler.

Going back to inheritance - I know one argument in favor of inheritance that @ljharb previously brought up, is that functions such as Map.prototype.get would then be capable of operating on any built-in map-like structure, not just on a direct instance of Map. The reason this is valuable is for writing polymorphic code that's protected against outside users modifying the Map class on you (I'll refer to this type of code as "protected" code). I do agree that this is a valuable problem to solve, however, I don't believe it outweighs the importance of Liskov's Substitution Principle along with the other arguments @brad4d brought up. Perhaps we can look into alternative ways to solve this problem besides a shared inheritance hierarchy if it's a desired feature, I'm sure there's other ways we could provide polymorphic methods that can operate on multiple classes without requiring us to, IMO, abuse inheritance.


The above arguments focus on what would happen if we made CacheMap inherit from Map. @ljharb did bring up an alternative option of having the two classes share a common subclass instead. This solution, however, doesn't work either - it will at first, but it's not future-compatible, you couldn't add new types of Map classes without running into issues with this solution.

As a concrete example, lets say WeakMap did have this ideal hierarchy that @ljharb wishes it had, where both Map and WeakMap inherit from a common superclass that shares all non-iteration behavior (so, doesn't share .keys(), .forEach(), etc). The Map class implements these additional behaviors while WeakMap does not.

Now, as an example, lets say we also want to implement the readonly collections proposal, and try to put it into the inheritance hierarchy as is being discussed here. How could we make both ReadonlyMap and WeakMap fit into the same hierarchy? We can't.

Both of these constraints can not be satisfied at the same time without using multiple inheritance, which JavaScript doesn't natively have. Inheritance is just the wrong tool for this job. We'll need to find some other way to deal with writing "protected" code that doesn't involve adding inheritance where it really doesn't belong, and forcing us into awkward decisions in the future where we have to use multiple inheritance but can't.

(I focused on WeakMaps above, but it turns out that this CacheMap would also require a similar inheritance hierarchy to a WeakMap, as its iteration methods are either not going to exist, or will need to behave differently, requiring them to be different methods from the ones Map has. So, a CacheMap and the readonly collections are also unable to come together into a single inheritance hierarchy without requiring multiple inheritance).

js-choi commented 2 years ago

Bringing up an earlier comment (https://github.com/tc39/proposal-policy-map-set/issues/1#issuecomment-1219617441) from @ljharb:

I would expect most userland code to not use something akin to my predicates and to use instanceof Map.

…in contrast, @erights asserts in https://github.com/tc39/proposal-readonly-collections/issues/6#issuecomment-555304919 that JavaScript is duck typed – or perhaps at least typically used with duck typing – and that it “it doesn't bother anyone that there's no common superclass” between Map and WeakMap.

It is indeed my own experience as a developer that most APIs (at least the ones with which I work) do rely on duck typing. I personally prefer to leave inheritance-versus-composition as an implementation detail that my duck-typed APIs do not care about. But I am only one data point.

Perhaps it would be worth getting a temperature check from the Committee at plenary, especially since this is a manifestation of a general core-language-design question about new data structures, Symbol.species, and inheritance-versus-composition.

(Note that I also opened #11 to discuss custom cache classes, whether we should support them, and whether they should use inheritance or composition from the base cache-map class. I consider #11 to be orthogonal to this issue #1, because whether or not specific cache-map classes use inheritance or composition, the base cache-map class may itself be a subclass of Object, a subclass of Map, or Map itself.)

ljharb commented 2 years ago

While that's true, that's also because people don't typically pass around WeakMaps :-)

erights commented 2 years ago

While that's true, that's also because people don't typically pass around WeakMaps :-)

Why do you think that? I certainly have. I don't know why others wouldn't.

ljharb commented 2 years ago

@erights I think all use cases for WeakMap are exceedingly niche, whereas cases for Map are quite widespread. Certainly WeakMap users might be likely to pass one around.

erights commented 2 years ago

In any case, it was an example of a broader principle that I do believe is largely true in common practice.

ljharb commented 2 years ago

https://github.com/search?l=JavaScript&q=instanceof+Map&type=code has 21 million results; unfortunately https://github.com/search?l=JavaScript&q=%27get%27+in&type=Code is polluted by .getIn( examples so it's not clear if 'get' in is used commonly.

In general, I do not in practice see people checking the existence of a method before calling it - instead I see someone doing a quick and dirty "is it an X" check, and then just assuming it has all the relevant methods after that.

erights commented 2 years ago

My point is not about a dynamic check. It is about writing code that assumes that an argument has certain methods with certain meanings, invoking those methods on that argument, and then other code providing arguments that satisfy those expectations without any formal tie-in between the expression of the what's expected and the expression of what's provided.

My own Map vs WeakMap code is not code that checks for the existence of a property. It is code that assumes an argument implements at least the WeakMap contract, and then other code that may call with with a WeakMap argument or a Map argument, knowing that both are correct.

zloirock commented 2 years ago

I'm against subclasses in this case. Simplification of a brand check, simplification of internals, existent Map and WeakMap case, etc.

js-choi commented 2 years ago

To add onto @erights’ point, most duck-typed JS code (at least what I’ve written/seen) is code that assumes that data structures meet an implicit interface and presumptively calls that interface’s methods. I agree that duck typing that checks methodName in obj is rare, but that’s not the duck typing that I (and probably @erights) are talking about; we are talking about presumptive duck typing.

With regards to presumptive duck typing, it is good for CacheMaps to have a similar interface to Maps. Within an algorithm or application, Maps that are already being used as ephemeral caches should be easily replaceable by CacheMaps. For example, a Fibonacci function likely uses a Map as an incidental memoization cache, and it would be nice to be able to easily replace the Map with a CacheMap without changing the function’s code: cache.get and cache.set calls remaining unchanged within the function. That drop-in replacement of the Map by a CacheMap within the function is possible because of presumptive duck typing.

But, being ephemeral and incidental, CacheMap data are not likely to be persisted or transferred – similarly to WeakMaps. Instead, CacheMaps probably will be mostly used internally within algorithms and applications.

Most public APIs are unlikely to meaningfully accept both Maps and CacheMaps as interchangeable arguments.

Code that does polymorphically distinguish between Maps and CacheMaps are likely to be generic data-structure utilities like deep cloning and deep equality.

And, for deep cloning and deep equality, it would probably be inappropriate to treat CacheMaps as if they were Maps:

In other words, I don’t yet really see how CacheMaps fulfill the Liskov substitution principle for Maps – like how @theScottyJam mentioned earlier.


Doing a deeper dive into extant use cases for Map–CacheMap polymorphism, consider the first result from the GitHub search for instanceof Map, which is rfdc. The rfdc package is a popular package for deep cloning of data structures. In this case, rfdc is not much affected by the CacheMap ⊂ Map question, because, either way, CacheMaps should probably be cloned into CacheMaps, not Maps.

Next consider packages that depend on is-map. The biggest of these are which-collection and es-get-iterator; we’ll put these aside for now.

The only other directly dependent package that actually seems to use is-map is @require-transpile/core, which uses is-map to test whether an options argument is not a Map before printing a deprecation warning. This type check, arguably, would be better served if CacheMaps were not considered to be instances of Map.

The es-get-iterator package, of course, robustly gets built-in iterators. Users of this package alone would not care whether CacheMap is a subclass of Map or not. They only would care about the iterator interface that both CacheMap and Map support, so this use case doesn’t seem to be much affected by CacheMap ⊂ Map.

The which-collection package is not directly used by many packages, but it is used by the popular deep-equal, is-equal, and which-built-in-type packages, which do have many dependent packages. But, similarly to the many users of the rfdc package, the users of deep-equal, is-equal, and which-built-in-type probably would not be much affected by the relationship between Map and CacheMap. As mentioned earlier, an ordinary Map probably should never be considered equal to a CacheMap with the same entries, regardless of whether CacheMap ⊂ Map.

And, like @brad4d mentioned earlier, it would be nice if we can avoid adding Symbol.species functionality to another data structure.

erights commented 2 years ago

not the duck typing that I (and probably @erights) are talking about; we are talking about presumptive duck typing.

Yes. And I love the term "presumptive duck typing", thanks!