`window.pointer.{clientX, clientY, isDown, ...}`

(For some reason I get the sense that this will be shot down from the title alone, but please hear me out here! And please steelman this pre-pre-proposal as best you can if there are some glaring problems. It's also a bit of a "TED talk", so I'm sorry about that 😅)

I think accessibility and "friendliness" of JavaScript is really important. Having taught people JavaScript before, the lack of simple access to the current mouse state is an annoying papercut that gets in the way of creating simple and fun web apps/games to get them excited about coding.

It's not fun saying "okay we haven't learned about this yet, but just copy-paste this scary addEventListener stuff so we can get access to the mouse position in our animation loop", and for the people who don't have a tutor to guide them, this may be a point where they get fed up with the complexity while googling around trying to work it out. They just want to know where the mouse is - "simple things should be simple, complex things should be possible".

I think a lot of people would be surprised at how little JS you need to know to get something fun happening, but there are a few places where there's unnecessary friction. These create "barriers to entry" for people wanting to learn to code. The lack of a simple way to access the current mouse state is one of these barriers. It's sometimes hard to communicate this sort of thing to intermediate/advanced coders, because "is it really that hard to just create an event listener and put the mouse state into some variables whenever it updates?" Well, I can only say that it is really hard if you're new to coding - a bit of unnecessary friction like this can easily mean the difference between enjoying coding, or deciding it "isn't for me". Something akin to the "curse of knowledge" definitely applies here for people who don't understand this (it definitely applied to me before teaching coding).

There is of course p5.js, and this is what Khan Academy uses for their coding course - it's great. I think the success of p5.js in the education/accessible-coding space should point us toward possible improvements to the JavaScript language/runtime itself - much like jQuery did with querySelector.

I think features that would be really beneficial to new coders potentially don't get as much attention as they should because the interests of newbies aren't as well represented in the body of people pushing new proposals forward. These people (rightly) tend to be people with a lot of expertise with JS. I'm as excited about the pipe operator, Wasm GC, WebNN ModelLoader (and so on) as everyone else, and these are very important and exciting developments in the language/runtime, but from my experience teaching others to code, there are some really obvious barriers that would be pretty easy/trivial to fix (relative to the advanced proposals), and that would save beginners a lot of frustration/effort in their early efforts to learn and enjoy JavaScript.

The specifics of the API aren't important for the time being - only that it's globally accessible so that it's a simple property access or function call to get the mouse state at any point in the code (or perhaps the last mouse state due to single-threadedness, assuming the document is not allowed to access the OS-level mouse position [I don't know enough about this]). Touch screens don't have a clientX/Y at every instant, but these sorts of considerations aren't show-stoppers - just things to design for. For example, using the last touch position would be one option (essentially matching behavior of touch devices that also have a mouse).

This is just one of the papercuts that I think could be solved fairly easily, but I think it would be a good place to start. I'd love to get the opinion of some of the experts here on what it would take to get something like this proposed and eventually implemented. I'm guessing this is not the first time someone has suggested something like this, so I understand it might not be all that easy.

Thanks! 🙏

I can't really tell what this is proposing, despite it being a lot of text.

https://whatwg.org/faq#adding-new-features might be helpful to read. If you could provide, ideally in one or two sentences, a description of what you're trying to do that's not possible today, that is the best start.

> I can't really tell what this is proposing, despite it being a lot of text.

@domenic Or maybe because it's a lot of text 😅 My bad! I should have included a succinct summary at the end. There was a lot of waffling.

Essentially something like window.pointer.clientX to get the current clientX position of the mouse. Currently something (at least vaguely) like this boilerplate is required:

window.pointerState = {clientX:0, clientY:0, leftDown:false, rightDown:false, ...};

window.addEventListener("mousemove", function(e) {
  pointerState.clientX = e.clientX;
  pointerState.clientY = e.clientY;
});

window.addEventListener("mousedown", function(e) {
  pointerState.leftDown = ...
  pointerState.rightDown = ...
});

window.addEventListener("mouseup", function(e) {
  ...
});

window.addEventListener("touchstart", function(e) {
  ...
});

...

The purpose of the above code is so that at any time window.pointerState can be accessed to see the latest pointer state.

I have no preference on the specifics of the API other than it being a simple one-line command to get the current pointer state. If there's non-trivial performance impact for listening to the pointer state then it would be fine if a one-line setup was required. E.g. let pointerState = window.watchPointerState().
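For illustration only, here is a minimal userland sketch of what that one-line setup could look like today, assuming Pointer Events are acceptable as the underlying source. `watchPointerState` is the hypothetical name from above and is not an existing browser API; the `target` parameter is an assumption so the function can be handed `window` in a page or a plain `EventTarget` elsewhere.

```javascript
// Userland sketch of the hypothetical watchPointerState() (not a real API).
// `target` is any EventTarget; in a browser you would pass `window`.
function watchPointerState(target) {
  const state = { clientX: 0, clientY: 0, leftDown: false, rightDown: false };

  // One handler for every event type: copy the latest values into `state`.
  const update = (e) => {
    state.clientX = e.clientX;
    state.clientY = e.clientY;
    // `buttons` is a bitmask: 1 = primary (left), 2 = secondary (right).
    state.leftDown = (e.buttons & 1) !== 0;
    state.rightDown = (e.buttons & 2) !== 0;
  };

  // Pointer Events cover mouse, pen, and touch with one set of listeners.
  for (const type of ["pointermove", "pointerdown", "pointerup"]) {
    target.addEventListener(type, update);
  }
  return state;
}

// In a page: const pointerState = watchPointerState(window);
// then read pointerState.clientX / pointerState.clientY in the animation loop.
```

This only ever holds the last state delivered by an event, which is the "last mouse state" caveat mentioned in the original post.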

I'm no API designer as you can tell, but I hope you get the general idea from this description.

I notice that p5's approach is actually more complex than you're expressing here. For example, p5 has one API for mouse coordinates and another API for touch coordinates, with no general sense of just a "pointer" coordinate. There are quite a few fine details to decide about exactly how this would work, and my own hunch is that there's no one way to pin down those details that would be correct for a majority of cases.

I also notice that p5's approach ends up being glitchy about mouse buttons: since it's using a generic handler, it has no way of knowing whether a mouse event is going to be 'consumed' by the script, and therefore it can never call preventDefault on the event. On a lot of the demo pages, that means I end up accidentally selecting some text when I use the mouse button to interact. (p5 does expose a way of getting around this, but it ends up requiring you to effectively write an event handler after all, just with nonstandard conventions!)

There could be more than one pointer (multi-touch input), so it would have to be an array of values.

> "okay we haven't learned about this yet, but just copy-paste this scary addEventListener stuff so we can get access to the mouse position in our animation loop"

JavaScript is almost entirely based on events, so they're going to need to know about addEventListener at some point. It might be good to not start them off on the wrong foot by calling it "scary".

Here's an example of a problem I see people running into if they tried to use this interface for a drawing app doing everything in an animation loop:

1. See that the left mouse button is down, so store the current position.
2. The mouse is still down, so draw a line from the previous position to the new position.
3. The mouse button is released.
4. See that the mouse button has been released so draw a line from the previous position to the new position.

The problem here is that the current mouse position is not necessarily the same position the mouse was in when the button was released. If you try to draw lines using this method you will end up starting and ending your lines too late. Your app will feel slow even if it's not actually having any performance issues. The right way to handle this situation is to use the existing events with addEventListener so that you get the correct mouse positions right when the mousedown and mouseup events occur.
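A small simulation of that sequence may make the lag concrete. This is only a sketch: `target` stands in for `window`, the events are plain `Event` objects with `clientX`/`clientY` attached by hand, and `polled` is a hypothetical stand-in for the proposed global state.

```javascript
// Simulated input source; in a page this would be `window`.
const target = new EventTarget();
const fire = (type, x, y) =>
  target.dispatchEvent(Object.assign(new Event(type), { clientX: x, clientY: y }));

// Polled state: overwritten by every event, read once per animation frame.
const polled = { x: 0, y: 0, isDown: false };
// Event-based record of exactly where the button was released.
let releasedAt = null;

const move = (e) => { polled.x = e.clientX; polled.y = e.clientY; };
target.addEventListener("mousemove", move);
target.addEventListener("mousedown", (e) => { move(e); polled.isDown = true; });
target.addEventListener("mouseup", (e) => {
  move(e);
  polled.isDown = false;
  releasedAt = { x: e.clientX, y: e.clientY };
});

// Everything below happens between two animation frames:
fire("mousedown", 0, 0);
fire("mousemove", 10, 10);
fire("mouseup", 10, 10);   // the stroke really ends here
fire("mousemove", 30, 30); // the mouse keeps moving after the release

// The next frame polls and sees the button up, but at the later position.
const seenByFrame = { x: polled.x, y: polled.y };
console.log(seenByFrame); // { x: 30, y: 30 }
console.log(releasedAt);  // { x: 10, y: 10 }
```

A loop that draws to `seenByFrame` ends the line at (30, 30), while the `mouseup` listener has the true end point.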

> There could be more than one pointer (multi-touch input), so it would have to be an array of values.

I agree with pshaughn that it might make sense to separate touch and mouse like p5 does. Making the touch an array of values seems like a good/easy solution for multi-touch. The original post wasn't really about the specifics of the API, but these are good considerations.
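As a rough sketch of the array idea (not a proposed API), touches could be tracked per `pointerId` and exposed as a list. `watchTouches` and its `target` parameter are hypothetical names used only for illustration, and this assumes Pointer Events with `pointerType === "touch"` as the source.

```javascript
// Hypothetical helper: returns a function that lists the touch points
// currently in contact, one entry per finger.
function watchTouches(target) {
  const active = new Map(); // pointerId -> { clientX, clientY }

  // Add a new touch or update an existing one; ignore mouse and pen.
  const set = (e) => {
    if (e.pointerType !== "touch") return;
    active.set(e.pointerId, { clientX: e.clientX, clientY: e.clientY });
  };
  // Deleting an id that was never tracked (e.g. a mouse) is harmless.
  const remove = (e) => active.delete(e.pointerId);

  target.addEventListener("pointerdown", set);
  target.addEventListener("pointermove", set);
  target.addEventListener("pointerup", remove);
  target.addEventListener("pointercancel", remove);

  // Snapshot as an array, in the order the touches began.
  return () => [...active.values()];
}

// In a page: const touches = watchTouches(window);
// touches() is [] with no fingers down and has one entry per finger otherwise.
```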

> good to not start them off on the wrong foot by calling [the above boilerplate] "scary"

Pedagogically, it can be a good idea to show empathy with your student by voicing/anticipating what they're thinking. How you go about doing that depends on the student and the thing being taught (this is kinda off topic though).

> The problem here is that the current mouse position is not necessarily the same position the mouse was in when the button was released.

I think you're arguing that this API could be a foot-gun? If this API were only useful for drawing apps, I might be inclined to agree. But even in that case I'd argue that it becomes clear where you're going wrong when, in the animation loop, you're writing if(mouseUp) .... It's clear at that point that you're not asking "Did the mouse just move up?" - you're asking "Is the mouse currently up?". And if you want the former, that's the point that leads you to learning about listeners.
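To make the two questions concrete, here's a tiny sketch (all names hypothetical) of what answering "did it just go up?" from polled state alone would involve: remembering the previous frame and comparing.

```javascript
// Hypothetical per-frame helper: turns a polled "is down" value into
// "just pressed" / "just released" edges by comparing with the last frame.
function makeEdgeDetector() {
  let wasDown = false;
  return function step(isDown) {
    const edges = {
      justPressed: isDown && !wasDown,
      justReleased: !isDown && wasDown,
    };
    wasDown = isDown; // remember for the next frame
    return edges;
  };
}

// Called once per animation frame with the current polled state:
const step = makeEdgeDetector();
console.log(step(false)); // { justPressed: false, justReleased: false }
console.log(step(true));  // { justPressed: true, justReleased: false }
console.log(step(false)); // { justPressed: false, justReleased: true }
```

A press and release that both land between two frames would be missed entirely by this approach, and that's the sort of case where listeners become the natural next step.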

Adding complexity to prevent footguns does of course make sense sometimes, but I don't think that trade-off makes sense here. As far as I can tell, basically every popular framework for interactive media (including Unity and Godot, for example) gives you a way to simply/quickly access the current input state, and my guess is that they didn't all happen to come to the incorrect decision in doing so.

whatwg / html

`window.pointer.{clientX, clientY, isDown, ...}` #7346