w3ctag / design-principles

A small-but-growing set of design principles collected by the TAG while reviewing specifications
https://w3ctag.github.io/design-principles
162 stars 44 forks source link

Classes (WebIDL interfaces) should only be used when you have both data and behavior #11

Open domenic opened 10 years ago

domenic commented 10 years ago

I see a lot of classes that have only data, or have only behavior. Both of these cases should be represented by plain JS objects.


Edit by TAG: Current TAG thinking here

marcoscaceres commented 10 years ago

I guess you will explain this a bit, but can you give me a quick tl;dr example/why behavior-only interfaces should be represented as JS objects? Not really getting what you mean.

domenic commented 10 years ago

Sure. If there's no state to mutate, then it's just a bunch of functions. (Note: functions, not methods.) And we represent a collection of named functions in JS by e.g. { foo: function () { ... }, bar: function () { ... } }, not by declaring a class with those functions as methods, and then creating an instance of that class, and then calling the instance's methods.

marcoscaceres commented 10 years ago

Thanks for the clarification, @domenic. It would indeed be nice to be able to express this somehow in WebIDL.

There is something that does worry me tho: It's usually pretty clear when you are returning "data" objects, but I think it gets trickier to design objects that are purely behavioral - as it's easy to add accessor properties on them. Having them as classes makes it easier to allow others to extend them (either directly through the prototype or through extends proper).

domenic commented 10 years ago

I see what you're saying. However, in my experience, the behavior-only objects in web specs are usually pretty clear: they're often singletons, and often have only one or two functions. As such, a shared prototype is not useful anyway in those cases, reinforcing the idea that a class is not the right tool for the job.

dbaron commented 6 years ago

Should WebIDL namespaces be used for the behavior-only case?

travisleithead commented 6 years ago

@dbaron that makes sense to me.

LeaVerou commented 2 years ago

@domenic

I see a lot of classes that have only data, or have only behavior.

Could you please point us to these examples? It would be very helpful in writing this up.

torgo commented 2 years ago

So we discussed in today's call and we all agree we should write something. Probably this isn't a webidl specific thing. The principle is about the API shape. @LeaVerou will work on this with @cynthia and @atanassov .

marcoscaceres commented 2 years ago

@LeaVerou, I put together a gist with all the IDL in wpt/interfaces/*.idl... should hopefully help with finding examples there: https://gist.github.com/marcoscaceres/1ef8f3eb3b37b75ecce8aea61c336e98

Looking quickly.... MediaError seems like a bad one (should have been a JSError maybe).

There are situations where using an interface (for data) is unavoidable, like with AudioTrack, VideoTrack, GeolocationCoordinates, because they are used types for attributes, and WebIDL doesn't allow Dictionary on attributes.

Anyway, hopefully the above helps with finding examples!

RByers commented 1 year ago

Any update on this? I like the principle of avoiding unnecessary interfaces, and appreciate the TAG working on some guidelines to help improve consistency across the platform here.

Note that this just came up in a debate around exposing the Reporting API interfaces.

LeaVerou commented 1 year ago

Discussed this again today and decided to a) bump its priority b) insert it as the first section in chapter 6.

@RByers if you had any ideas on good examples (both positive and negative), that would speed up writing this. Thanks!

RByers commented 1 year ago

Great, glad to hear it @LeaVerou! Thank you!

I think @domenic's concern about reporting is a good example. Briefly, the Reporting spec and other specs derived from it define a bunch of interfaces for objects which just contain data (JavaScript API to what is a JSON blob in the HTTP API). In chromium we had implemented these with [NoInterfaceObject]. But that's deprecated and when we went to fix that we found there was at least some compat risk where exposing window.Report would break some websites. In addition to the argument that's already been made about such a use of interfaces not being idiomatic javascript, I think it's better for evolution of the platform if we're only adding things to the globals when there's real value to doing so, as every one is another potential paper-cut where web developers may perceive the browser as treading on their turf. The web's global objects are a limited resource, we should be intentional about every little piece we use. I say this as a blink API owner whose knee-jerk reaction to exposing all these new global properties was "yep, makes sense, please do that". I think I might have also been the one to first put a interface ReportBody into the reporting spec (without thinking about it much), so blame goes to me all around here. 😞

I don't have other examples off the top of my head, but I suspect @domenic does.

LeaVerou commented 1 year ago

While taking a stab at writing this today, some thoughts/questions that emerged:

Looking quickly.... MediaError seems like a bad one (should have been a JSError maybe).

Not sure if MediaError is a good example here, since JSError is also a class. Unless we refine the guidance as to also avoid pointless subclassing.

There are situations where using an interface (for data) is unavoidable, like with AudioTrack, VideoTrack, GeolocationCoordinates, because they are used types for attributes, and WebIDL doesn't allow Dictionary on attributes.

Is there an API design reason here or this just a limitation of WebIDL?

domenic commented 1 year ago
  • A class is also a "guarantee" that an object has a particular schema.

I wouldn't say that's really the case.

If you're directly calling a web API, e.g. const obj = someWebAPI(), then you already have a guarantee that obj has a particular schema: whatever the web API is specced to return. That's true whether it's a dictionary or an interface.

If you're not directly calling a web API, e.g. if you're receiving obj from somewhere else, then you don't have any guarantees on the schema just from instanceof tests. Intermediate code could have modified the object in any number of ways.

  • What if an object only has data now but could have behavior later? Then it makes sense to start it off as class?

If you strongly believe that's the case, then maybe. But it's easy to upgrade to a class later.

  • Similarly, a class allows objects that look like they only have data, but their data is actually behavior (accessors) that can perform validation etc. Isn't it common to have objects that start off as data-only, and later evolve to include accessors of some sort?

I haven't experienced this evolution on any web platform APIs. Usually because we know from the beginning whether an API's properties are going to be behavior-having, or not. If changing a property is going to have some behavior (e.g. changing how something is displayed to the user), then it indeed the object needs to be an instance of a class. And then it can do validation, etc. in its setters.

Is there an API design reason here or this just a limitation of WebIDL?

The API design reason is twofold, one somewhat technical and one more JS-developer-relevant.

The JS-developer-relevant reason is that dictionaries are plain JS objects, which can be modified arbitrarily. But arbitrary modification of a shared object is not generally what you want: that affects what other people see. Let's use as an example a simplified version of window.navigator, which only has two properties: navigator.userAgent and navigator.onLine. That's just data, right? So maybe it could be a dictionary. But then, what does navigator.userAgent = "Firefox" do? What does navigator.onLine = false do? Presumably it just confuses other people who want to consult that info. This should not be allowed. You need a class, which can leave those properties as readonly. (Or hypothetically define setter behavior for them.)

Contrast this with https://wicg.github.io/ua-client-hints/#interface, in particular the UADataValues dictionary. Since it is returned from the method navigator.userAgentData.getHighEntropyValues(), not from a property, it's pretty obvious that doing (await navigator.userAgentData.getHighEntryValues()).mobile = false is not going to try to modify anything. You just got back your own copy from each call, and that copy won't impact what other callers get back.

The technical reason is that dictionaries are designed as "by value" types, which when converted between IDL and JS, create a new JS object each time. This is exactly what you want, for the return-value-of-a-method case like we discussed above. It isn't what you want for the property case, where it would cause instance.propertyThatIsADictionary !== instance.propertyThatIsADictionary (because you generate a new JS object from the IDL dictionary value each time). In theory we could change this and have some stable notion of a dictionary-as-single-JS-object just for properties, but that would be pretty confusing and inconsistent, and because of the above JS developer-facing reasons, is not a good idea anyway since it encourages shared mutation.


So overall, I think maybe an important thing to focus on, which hasn't been touched on much in the above, is the issue of identity. If there's a single object that represents a single concept, then a class instance makes sense. In those cases, I wouldn't call the class's member's "data", but instead I would call them "state". Any modifications to that state are seen by all users of it, and can be centrally guarded by setters or by making their properties readonly. This can hold even if the class has no methods, and only has properties.

Whereas, if you're just getting "data" for a given consumer, dictionaries are simpler and more idiomatic JavaScript.

Mostly this divides pretty cleanly into property-type vs. method-return-value-type, but there are other cases, like the reporting API callbacks, where you need the more general framework. (The reporting API callbacks are consumers that just want to see some data.)

LeaVerou commented 1 year ago

@domenic

Good point wrt identity.

The JS-developer-relevant reason is that dictionaries are plain JS objects, which can be modified arbitrarily.

You made the point earlier that class instances can also be modified arbitrarily. In your example, you justify that navigator should be a class so that it can have readonly properties. However, plain objects can have readonly properties just the same. (to clarify, I’m not arguing that navigator should have been a plain object, just trying to get to the bottom of what our guidance should be!)

If you're not directly calling a web API, e.g. if you're receiving obj from somewhere else, then you don't have any guarantees on the schema just from instanceof tests. Intermediate code could have modified the object in any number of ways.

I’m not very convinced by this argument: That's basically an argument against all instanceof usage, yet developers use it regularly. Yes, it's not 100% robust, but most code isn't.


While looking for examples, I stumbled on this other question. From your older comment:

If there's no state to mutate, then it's just a bunch of functions.

If I’m reading this right, does that mean classes should be used not just when you have both state and functions, but when these functions mutate said state? By that tighter definition, Response shouldn't have been a class: all its instance methods produce new objects, they do not mutate the Response object. However, intuitively it makes perfect sense that Response is a class, which indicates that perhaps the guidance should require that behavior actually mutates state.


Is there any example of a Web Platform API that returns plain objects with data or plain objects with only functions? I've been struggling to find some.

annevk commented 1 year ago

@LeaVerou for Response there's a Body mixin that has methods that mutate state.

encodeInto() in https://encoding.spec.whatwg.org/#interface-textencoder returns a dictionary (but also mutates its second argument). (It is arguably itself unusual as instance method, that's mainly for symmetry with TextDecoder.)

LeaVerou commented 1 year ago

@LeaVerou for Response there's a Body mixin that has methods that mutate state.

All I see in the Body mixin are functions that create new objects, am I missing something? But the general question stands regardless of Response, which was an example.

encodeInto() in encoding.spec.whatwg.org/#interface-textencoder returns a dictionary (but also mutates its second argument). (It is arguably itself unusual as instance method, that's mainly for symmetry with TextDecoder.)

Thanks! For positive examples I was hoping for more common, widely known APIs that most Design Principles readers would be familiar with already, but in the absence of those, any example would do.

annevk commented 1 year ago

@LeaVerou those methods end up consuming the response's body.

marcoscaceres commented 1 year ago

@LeaVerou, a couple of examples:

There are some problems though, like there was a recent case where an APIs wanted to use the dictionary PaymentCurrencyAmount as an attribute value, which wouldn't be possible.

So, it's important to also thing about when/where a data object might be used (and if it will only ever be a return type). Otherwise, you can end up with (somewhat) clunky, but not tooooo baaaaaad, .getWhateverData()-like methods.

annevk commented 1 year ago

I think these days we would have used a dictionary for GeolocationPosition and GeolocationCoordinates though.

marcoscaceres commented 1 year ago

Oh, yes, I should have included GeolocationPosition too. Accidental omission.

RByers commented 1 year ago

Another example where I may have screwed up: InputDeviceCapabilities. It really is just pure data, but we made it an interface and put an attribute on UIEvent for reading it. If it weren't for that attribute, I would propose trying to change that one into a dictionary too.

LeaVerou commented 1 year ago

Discussing with in a breakout with @cynthia, we realized we both think the guidance should be more nuanced:

First, we do agree that classes with only behavior don't make sense, that's just a collection of functions.

We are not so sure about objects that are only data. First, many of these objects may eventually evolve to contain behavior. Once you have state, it's easy to come up with methods to mutate it or make computations on it, and whether you have these from the get go or not should not dictate such a fundamental architectural decision.

We are definitely in agreement that "input-only" objects (e.g. options dictionaries) should always be plain objects (WebIDL dictionaries). For example, WebRTC APIs are terrible offenders here. Also, APIs accepting objects that are only or mainly data, should also accept plain objects, for better developer ergonomics (e.g. the headers parameter in fetch() is a good example).

However, class instances can be useful when objects are returned from an API and thus, can be passed around, as an object being an instance of a class is an implicit contract that it follows a certain schema (yes, that can be manipulated in JS, but everything can be). It's far easier for a developer to do e.g. obj instanceof GeolocationCoordinate than obj.latitude && obj.longtitude && obj.altitude && .... Furthermore, classes do no harm in this case, as they can be used as plain objects anyway (and should include suitable serialization rules to serialize as the equivalent plain object).

Another argument against that came up against using classes for data objects is namespace pollution, but this can be mitigated by these classes being hung on to the relevant APIs, rather than the global scope. E.g. GeolocationCoordinate could have been Geolocation.Coordinate.

In terms of defining or editing principles, some that could come out of this could be:

(Note: We use the term "Data-centric class" to describe a class whose instances primarily contain data, and potentially some auxiliary methods, or no methods at all)

ptomato commented 1 year ago

Re. "Do not define classes that only have functions (e.g. never another Math object).", there is some discussion from a TC39 meeting that may be relevant:

In this discussion, objects such as Math, JSON, Temporal were termed "namespace objects" although it's still pending to write a definition in ECMA-262 of what that term exactly does and doesn't encompass. To my eye they don't seem like classes, at least for my mental model of what a class is: for example, you can't create an instance of Math.

ljharb commented 1 year ago

Additionally, although https://github.com/tc39/proposal-first-class-protocols is still stage 1 and too early to base design principles around, I'd personally find obj implements GeoLocationCoordinate as a protocol far more elegant and reliable than obj instanceof GeoLocationCoordinate with a class that otherwise shouldn't really be a class.

bathos commented 1 year ago

However, class instances can be useful when objects are returned from an API and thus, can be passed around, as an object being an instance of a class is an implicit contract that it follows a certain schema (yes, that can be manipulated in JS, but everything can be).

What can’t be is private host-controlled state associated with the object, so an operation that takes (SomeInterface or SomeInterfaceInitDictionary) is guaranteed to be faster when the input is a real SomeInterface instance — there are no JS-observable conversion steps to perform at all for that Web IDL union type with that input. In ES as opposed to Web IDL, this can end up a bit like the difference between using new Intl.SomethingFormat(...) and something.toLocaleString(...); the former means interpreting the formatting options only once even if you then format many times, while the latter API must interpret the formatting options every time you format.

Unfortunately, it doesn’t seem like (many?) Web IDL APIs take advantage of that? For example new Request({ headers: reusableBonaFideHeadersInstance }) still does all the JS observable slow path stuff using the mutable public API rather than just consuming the private state which is already known to be valid. I would think that it’s desirable to use the (SomeInterface or SomeInterfaceInit) union pattern in those cases to take advantage of those invariants, and that the design principles might even want to say as much, but perhaps there’s a reason this has been avoided to date?

LeaVerou commented 1 year ago

Additionally, although tc39/proposal-first-class-protocols is still stage 1 and too early to base design principles around, I'd personally find obj implements GeoLocationCoordinate as a protocol far more elegant and reliable than obj instanceof GeoLocationCoordinate with a class that otherwise shouldn't really be a class.

Agreed, but how do you define "shouldn't really be a class"? I don't think whether it has methods or not is a good heuristic. Perhaps whether it could conceivably have methods in the future, but that's very handwavy.

LeaVerou commented 1 year ago

In today’s breakout we decided to draft some text for these:

  • Accept plain objects (WebIDL dictionaries) whenever possible (i.e. whether there is a corresponding data-centric class or not)
  • Data-centric classes should include a constructor signature that accepts a single plain object
  • Define JSON serialization rules for data-centric classes that return the equivalent plain object

There doesn’t yet seems to be enough consensus for this:

  • Prefer defining data-centric classes as static properties on the APIs that they are related with (e.g. Geolocation.Coordinate instead of a global GeolocationCoodinate) (to be discussed more)

Whereas this doesn't seem to be much of a problem in existing APIs (as mentioned by @ptomato namespace objects are fine):

  • Do not define classes that only have functions (e.g. never another Math object).
ljharb commented 1 year ago

Math isn't a class, so I'm a bit confused why that's a problem. We also have JSON, Reflect, and Atomics that are all conceptually identical to Math.

LeaVerou commented 1 year ago

Math isn't a class, so I'm a bit confused why that's a problem. We also have JSON, Reflect, and Atomics that are all conceptually identical to Math.

Yeah, that was a poor example, namespace objects are obviously fine. Basically, we don't want a constructible class, which holds no state but only behavior. We couldn't find any examples of this, but some people thought it may be good to document it anyway, but it's pretty low priority.

bathos commented 1 year ago

DOMImplementation might be a good example of a historical interface that shouldn’t have existed. Each instance is 1:1 with a Document instance but it has no state of its own and represents nothing.

domenic commented 1 year ago

DOMParser is another.

LeaVerou commented 1 year ago

Oh, excellent examples, thank you both!!