w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/
Other
4.49k stars 662 forks source link

[cssom-view] elementFromPoint, elementsFromPoint, and caretPositionFromPoint should not return an element inside a shadow tree #556

Open rniwa opened 8 years ago

rniwa commented 8 years ago

elementFromPoint and elementsFromPoint should not return an element inside a shadow tree. Instead, it should look for the highest shadow host of the element and return that instead so that it doesn't leak an element in shadow trees.

See the shadow DOM specification.

More precisely, once these methods are added on DocumentOrShadowRoot interface, then we need to retarget the element we found against the context object.

rniwa commented 8 years ago

@annevk @hayatoito @zcorpan

rniwa commented 8 years ago

Okay, there's another issue with the current definition either in CSS OM or shadow DOM specification. When the result of hit testing results in a Text node which is a direct child of ShadowRoot, what should we return? It looks like Chrome currently returns null but that seems broken. Given the definition of retargeting, it should be returning the shadow host instead.

But none of this is clear because either specification really says anything about in which tree the node is found (e.g. composed tree versus flat tree; probably the latter) and when & where the retargeting happens.

rniwa commented 8 years ago

Another edge case to consider is when the hit testing results in a Text node which is assigned to a slot. I think we should return the slot when elementFromPoint is called from within the shadow tree and the shadow host when it's called outside the shadow tree.

rniwa commented 8 years ago

I'm going to implement the proposed behavior in https://bugs.webkit.org/show_bug.cgi?id=162882 because whatever Chrome/Blink currently implements is awfully broken.

rniwa commented 7 years ago

caretPositionFromPoint is even weirder. It’s not clear to me what should happen to an offset when the node is retargeted.

hayatoito commented 7 years ago

Hmm. It is not good that we would not have interoperable implementations.

@yosinch FYI. He should know better than me about how Blink implemented these APIs.

zcorpan commented 6 years ago

So given https://github.com/w3c/webcomponents/issues/735#issuecomment-366140953 it appears that chromium and gecko are moving towards what WebKit implemented. Is that right @rakina @smaug---- ?

It seems the spec needs to cover at least these things, in addition to the above:

But none of this is clear because either specification really says anything about in which tree the node is found (e.g. composed tree versus flat tree; probably the latter) and when & where the retargeting happens.

Where is flat tree defined?

hayatoito commented 6 years ago

Where is flat tree defined?

flat tree is defined here: https://drafts.csswg.org/css-scoping/#flattening

smaug---- commented 6 years ago

So given w3c/webcomponents#735 (comment) it appears that chromium and gecko are moving towards what WebKit implemented.

I'm not quite sure, at least based on test results. Like Gecko returns null for text under shadowroot, since that is what is most logical (host is after all above ShadowRoot). Current Nightly (with shadowdom pref) gives same test results as blink in https://wpt.fyi/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html

(random note, it is weird to have wpt for methods which behavior is mostly undefined.)

rakina commented 6 years ago

Blink's behavior is moving towards WebKit's implementation. A patch landed about a month ago but it's not reflected on wpt.fyi yet I guess (it seems wpt.fyi is still using Chrome 63?)

smaug---- commented 6 years ago

And what is that behavior? Need to get the spec changed before making random changes to implementations.

rakina commented 6 years ago

For elementFromPoint, we retarget the hit test result againts context object (document/shadow root the elementFromPoint method is called on).

For elementsFromPoint, we retarget the hit test result againts context object, one-by-one and only include elements once in the list (no duplicates).

More details on cases with text nodes and slots etc, it is as the test at https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/external/wpt/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html?rcl=9f75a2842405ed0f378d8a4e60d6de2568704880&l=43 says, and I guess this WPT should be removed as long as the spec is not clear yet?

hayatoito commented 6 years ago

@rakina

My guess is that Blink's HitTest never returns an element which doesn't have LayoutBox. If the result of HitTest is a text node, we have to adjust it to its parent element in LayoutTree. If we honer that "We never returns an element which doesn't have LayoutBox", we shouldn't return a slot, even if a text node is assigned to the slot, as a result of adjustment, because slot's default style is "display: contents". We must return slot's parent in LayoutTree in that case.

From this perspective, the test looks correct to me.

tabatkins commented 6 years ago

If the result of HitTest is a text node, we have to adjust it to its parent element in LayoutTree.

What's its parent element? The nearest ancestor that's an element in the flat tree? Or something else? (Sorry, I'm just not sure which spec concept the LayoutTree is approximating.)

hayatoito commented 6 years ago

The nearest ancestor that's an element in the flat tree? Or something else?

Let me clarify my intention:

I don't know any good terminology to express this. :(

annevk commented 6 years ago

@tabatkins LayoutTree is almost certainly the box tree.

It's not immediately clear to me however why hit testing wouldn't immediately go from the box in the box tree to the associated node in the flat tree/node tree, and then from there you find the nearest ancestor element.

It seems a little weird to have a special element hit testing primitive that finds a box corresponding to an element. At least I hadn't heard of that concept before.

tabatkins commented 6 years ago

Properly, hit-testing applies to fragments; they're what results from layout. (Boxes don't have positions or sizes, they're the result of applying box-construction only.) (explanation)

The relevant question is exactly how the traversal is done. Here's one method:

  1. Let |frag| be the fragment at |point|.
  2. If |frag| was generated by a text run (rather than a box), walk its ancestors in the fragment tree until you find a fragment generated by a box, and set |frag| to that.
  3. Let |box| be the box that generated |frag|.
  4. If |box| is an anonymous box or was generated by a pseudo-element, walk its ancestors in the box tree until you find a box generated by an element, and set |box| to that.
  5. Let |element| be the element that generated |box|.
  6. If |element| isn't allowed to be returned in this context due to shadow-hiding, walk its ancestors in the flat tree until you find one that's allowed to be returned and that generated a box, and set |element| to that.
  7. Return |element|.

@hayatoito Is this what we do? If not, what differs?

phistuck commented 6 years ago

@tabatkins - for non-shadow-allowing contexts, that may return a display: contents element, right? Should it?

tabatkins commented 6 years ago

No, it can't, because display:contents elements can't create boxes, so they can't create fragments, so they won't get selected by step 3.

phistuck commented 6 years ago

@tabatkins - but they can be/contain shadow DOM, that does contain boxes and fragments and so on...

<ShadowDOM style="display: contents">
  <ShadowDOM-div>Hi</ShadowDOM-div>
</ShadowDOM>

And document.elementFromPoint(...) for the H in "Hi".

tabatkins commented 6 years ago

Ohhh, indeed, my algo will report that. Hm. I guess then step 6 needs to have an additional "and created a box" condition, so you'd instead skip up to the parent of that element.

rniwa commented 6 years ago

So here's a very basic example for hit testing:

div ---- ShadowRoot
 + "hi"    + "some text"
           + slot

Let's say we invoked elementFromPoint on ShadowRoot with points right above "hi". In that case, I would expect the slot to be the result. Consider mousemove event on "hi". The first event to receive the event is the slot, not the shadow host.

hayatoito commented 6 years ago

@tabatkins Sorry for the late reply. I am now looking topic candidates of TPAC 2018.

The proposal looks good to me, with the update. Thank you for working on that.

@rniwa Yes, it returns the slot if the slot is "display: block", however, if the slot is "display:contents", we have to skip it, and have to walk up ancestors to find an element which has a box. It is div, I think (ancestors walking: slot -> "some text" -> div (we found an element which has a box here))

tabatkins commented 6 years ago

@hayatoito No problem!

I'm wondering now, tho, if step 6 is ideal. Rather than walking the element-tree to find a non-shadow-hidden element, I wonder if we should continue to walk the box tree until we find a non-shadow-hidden element.

I'm not sure if there's actually a difference in behavior, but shadow stuff is confusing. ^_^

hayatoito commented 6 years ago

Yup, I think we can simplify the steps without changing the semantics. I think we can merge the step4 and step6, if we prefer shorter steps.

rniwa commented 6 years ago

It would be great if either @hayatoito or @tabatkins could provide some examples of how the proposed spec change would behave.

tabatkins commented 6 years ago

@hayatoito Oh yeah, duh, you're right, they can just be merged with that change. So:

  1. Let |frag| be the top-most fragment at |point| that responds to pointer events. (Or all of them for elementsFromPoint(), etc.)
  2. If |frag| was generated by a text run (rather than a box), walk its ancestors in the fragment tree until you find a fragment generated by a box, and set |frag| to that.
  3. Let |box| be the box that generated |frag|.
  4. If |box| is an anonymous box, or was generated by a pseudo-element, or was generated by an element that's shadow-hidden from the call's context, walk its ancestors in the box tree until you find a box that doesn't match any of those conditions, and set |box| to that.
  5. Return the element that generated |box|.

@rniwa Taking your example, and assuming the slot was still its default display:contents, then the element returned would be the div. You click on the "hi" fragment, find that it was generated by a text run, then walk the box tree upwards until you find the div; this can then be returned.

If the slot was display:block or something, then if you called shadowRoot.elementFromPoint(), you'd return the slot; if you called document.elementFromPoint(), the slot would fail the condition in step 4, so we'd keep walking upwards until we found the div and return that.

rniwa commented 6 years ago

I don't think that matches the author expectation. If a hit testing was done on a text assigned to a slot, then I want to know that the content inside a slot is picked, not on a shadow host. Whether an element generates a CSS box or not doesn't seem like an important distinction from the developer ergonomics standpoint of view.

With the proposed behavior, the author who wants to detect whether a given point is inside a slot or not, would have to check whether a node / element returned by these functions are assigned to some slot. It's even worse. If it was a descendent of another element with display: contents outside the shadow tree, we'd have to walk up the tree to find if the ancestor, which is a direct child of the shadow root is assigned to a slot or not. That seems like a lot of extra complexity for developers.

hayatoito commented 6 years ago

I see, however, your concern is also applied to the following case, right?

<div id=a>
   <div id=b style="display: contents">
     foo
   </div>
   <div id=c style="display: contents">
     bar
   </div>
</div>

In this case, we can't tell whether the point is on "foo" text node or "bar" text node. elementFromPoint always returns <div id=a> in either case, I think. So your concern can be converted into more general one, I'm afraid.

I think what we want here is nodeFromPoint, instead of tweaking elementFromPoint, if we want to resolve this general concern.

tabatkins commented 6 years ago

Right.

So the deal is, we're excluding display:contents elements (of which slots usually are) for a reason - there's no way to click on such an element, because it doesn't exist on the page. You can click on its contents, sure, but that's something different.

Here's an example of why this makes sense: if you call elementsFromPoint(), you'll often get the full ancestor chain of the clicked element, because usually descendants are contained within the geometry of ancestors. But if you use abspos or something to move an element outside the ancestor's geometry, then when you click on the child the ancestor does not show up, because it's not underneath the mouse click. If the parent of the abspos is display:contents, should it be returned or not? It doesn't have any dimensions; if we return it in the list, and people then check its bounds, they won't contain the point that was clicked.

(For example, see http://software.hixie.ch/utilities/js/live-dom-viewer/saved/6290.)

If we don't return such an element in this situation, we shouldn't return it from elementFromPoint(), just because it happens to be the parent node of some clicked text; elementFromPoint() should always return the first element of the array returned by elementsFromPoint().

So yeah, insofar as capturing a useful notion of what text was clicked is important (and I agree it is), what you want is nodeFromPoint(), so we can return the text node, rather than trying to do some tricky nonsense with elementFromPoint() to return an element that is literally not at the specified point.

rniwa commented 6 years ago

In this case, we can't tell whether the point is on "foo" text node or "bar" text node.

UA can certainly tell which text node given all major UAs support text selection via mouse drag.

Here's what I'm suggesting. Instead of look for the first element in CSS box tree, find the node and the return its parent in the composed tree if the node is not an element (obviously after considering which node should be visible to which tree).

tabatkins commented 6 years ago

UA can certainly tell which text node given all major UAs support text selection via mouse drag.

By "we", Hayato was referring to the user of the API; they wouldn't be able to tell which text was at the provided point (given the suggested behavior), since they'd just get the container element.

Instead of look for the first element in CSS box tree, find the node and the return its parent in the composed tree if the node is not an element (obviously after considering which node should be visible to which tree).

That doesn't seem to address my points:

Defining the algorithm as suggested, where it finds the first element generating a box generating a fragment containing the specified point, gives us good, consistent answers to the first two bullet points. Adding a nodeFromPoint(), as Hayato suggested, solves the third bullet point in all of its variations - you can always tell which text node is at the specified point, regardless of how its surrounding/ancestor elements are styled.

rniwa commented 6 years ago

Rough consensus at TPAC F2F: Add an option to include 'display: contents', implement what @hayatoito and @tabatkins proposed, and add nodeFromPoint().

tabatkins commented 6 years ago

So the "option to include display: contents" would, I guess, run the algorithm normally, then also run a display:contents seeker, and return whichever is closer to the starting fragment in flat-tree order?

The display:contents seeker would be something like:

  1. Let |frag| be the top-most fragment at |point| that responds to pointer events.
  2. Walk |frag|’s inclusive ancestors in the fragment tree until you find a fragment generated by a box-tree node that was generated by a DOM text node or DOM element node. Let |DOM node| be that DOM text node or DOM element node.
  3. If |DOM node| is not an element, set |DOM node| to the first of its ancestors that is an element.
  4. Return |DOM node|.

Then nodeFromPoint() would be the same algorithm, just without step 4.

smaug---- commented 6 years ago

We still need a proper algorithm here, especially for the cases where the first non-display:contents element is inside some other shadow tree. In that case elementFromPoint needs to find the first ancestor which is accessible to the ShadowRoot on which the method is called.

tabatkins commented 6 years ago

Ugh, I rewrote the algorithm several times and ended up accidentally killing the "search for a shadow-exposed node" step. Assume that happens between steps 3 and 4.

emilio commented 5 years ago

Regarding duplicated content, it's a bit trickier than that. Right now all engines can return a node twice if you absolutely position some generated content.

For example, the following HTML:

<!doctype html>
<style>
  html, body { margin: 0; }
  div {
    width: 100px;
    height: 100px;
    background: red;
    position: relative;
  }
  span {
    display: inline-block;
    width: 100px;
    height: 100px;
    background: green;
  }
  div::after {
    content: "";
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    display: block;
    background: rgba(0, 0, 0, .2);
  }
</style>
<div><span></span></div>
<script>
  console.log(document.elementsFromPoint(50, 50));
</script>

Will log the <div> twice in Gecko, WebKit and Blink.

SebastianZ commented 5 years ago

Regarding duplicated content, it's a bit trickier than that. Right now all engines can return a node twice if you absolutely position some generated content.

As far as I can see, CSSOM View doesn't say anything about generated content. So I guess that needs to be clarified, too.

Also, as an author I'd expect that it is somehow indicated that the first element is actually the pseudo-element and not the <div>, but let me know if that's out of scope for this issue.

Sebastian

rakina commented 5 years ago

Regarding duplicated content, it's a bit trickier than that. Right now all engines can return a node twice if you absolutely position some generated content.

Since this is not directly related to the issue topic (elementFromPoint etc behavior w/ Shadow DOM), I suggest this to be discussed in another issue.

rakina commented 5 years ago

So it looks like we agree to use the steps described by Tab here (with some changes to consider the shadow-hidden-ness) and here. Should I/anybody interested make a spec change PR with those steps then?

@tabatkins @rniwa @smaug----

rniwa commented 5 years ago

@rakina As @smaug---- mentioned in https://github.com/w3c/csswg-drafts/issues/556#issuecomment-434491377, we still need a well defined algorithm to handle display: contents. I think we should have both algorithms nailed since there appears to be a lot of edge cases & details that we keep stumbling upon.

rakina commented 5 years ago

@rniwa What else do you think is missing from https://github.com/w3c/csswg-drafts/issues/556#issuecomment-434034827?

For the shadow handling part, we can do something like the retarget algorithm, but working with flat tree instead. So while the node's root is a shadow root and node is not a flat-tree descendant of the context object, we set the node as its shadow host. (I don't know where flat-tree is defined in the specs though, cc @hayatoito )

rniwa commented 5 years ago

Right, something like the retarget algorithm over flat tree is what we need. What we're missing is the precise definition / algorithm of what that is. https://github.com/w3c/csswg-drafts/issues/556#issuecomment-434034827 is insufficient because we don't really retarget the first ancestor in the flat tree for example. It would be great if you or someone else can come up with a precise algorithm for this. Unfortunately, I'm caught up in other urgent matters at the moment so won't have a time to do it myself.

FYI, CSS Scoping Module Level 1 defines flat tree.

rniwa commented 2 years ago

These functions continue to behave differently in Blink vs WebKit. It's a major interoperability concern for shadow DOM at this point.

josepharhar commented 1 year ago

It looks like there is a WPT here: https://wpt.fyi/results/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html

@rniwa does it reflect the behavior you desire? Are there any other relevant WPTs you are aware of?

rniwa commented 1 year ago

No, that test expects Chrome's behavior. This is what we want: https://github.com/WebKit/WebKit/blob/main/LayoutTests/fast/shadow-dom/DocumentOrShadowRoot-prototype-elementFromPoint.html

hober commented 1 year ago

CSS Fragmentation doesn't define "fragment tree." @tabatkins, is there an equivalent concept?

JRJurman commented 2 months ago

@rniwa @rakina - I notice that the DocumentOrShadowRoot-prototype-elementFromPoint.html has not been updated since this discussion.

Should the test be updated to reflect what is written in the WebKit repo, or is there still an agreement to reach here? I'm happy to copy over the WebKit implementation of the tests, if that reflects what we want going forward, just let me know, and I can do that work.

zac-semolina-solutions commented 2 months ago

I recently found myself needing to write an engine-independent DocumentOrShadowRoot.elementsFromPoint() method. See https://github.com/semolina-solutions/layer-aware-pointer-events/blob/211c8c1c983f641e45e374f0d56040670ccd869e/lib/index.ts#L223.

While I'm naive to the internal implementation details in each engine, I found that Gecko's elementsFromPoint() output differs in two ways at the time of writing:

  1. Gecko will return elements only within that DOM, rather from the html element.
  2. When an element with pointer-events: none styling is queried from its Shadow DOM in Gecko, it will always be included in the resulting list, despite itself not being a potential event target.

Whether or not elements in open shadow DOMs should be included in higher-level elementsFromPoint() calls is somewhat arbitrary IMO, although including them would avoid needing to repeatedly and expensively query from each shadow root to build up the full picture. It'd be advantageous to have a deep = true setting to pierce through all open shadow DOMs below.

It'd be nice however to address difference no. 2, i.e. the inconsistency of a shadow root host element being included in the ShadowRoot.elementsFromPoint() when it's styled to ignore pointer events. It's good that it's included queries of higher-level DOMs, because it indicates that "there's something in this shadow DOM that is reactive to pointer events".