Create a specification for getNodeText used by queries

NicholasBoll commented 4 years ago

Describe the feature you'd like:

I've seen churn on what "text" means for a given element. The pendulum has swung from innerText to textContent based off unexpected use-cases. I think we should create a specification for what getNodeText returns for the queries that rely on it.

Effected queries:

*ByTitle
*ByText
*ByDisplayValue

I think *ByLabelText used to use it, but it is no longer.

Some examples that aren't working as expected.

Nested inline elements

This example is probably more common for Cypress, Puppeteer, etc. For example, Cypress implicitly waits for actionability. If a button is disabled and the test calls cy.findByText('Hello').click(), Cypress will not wait for the disabled to be removed from the button element. There is no workaround other than not using findByText at all. Adding a selector should define what element we want to return, but getNodeText won't be able to find anything.

<button><span>Hello</span></button>
<script>
  getByText('Hello') // returns span
  getByText('Hello', { selector: 'button' }) // error - not found
  // expected: `button` element
</script>

<th>
  Country
  <!-- Country Filter for the table header -->
  <select>
    <option>Colombia</option>
  </select>
</th>
<script>
  getByText('Country') // returns th
  // expected: `th` element
</script>

Inconsistencies with *ByLabelText

<label for="1234">My Label<abbr>*</abbr></label>
<input id="1234">
<script>
  getByText('My Label'); // returns <label>
  getByLabelText('My Label'); // Error: Unable to find a label with the text of: My Label
  getByText('My Label*'); // Error: Unable to find an element with the text: My Label*. This could be because the text is broken up by multiple elements. In ...
  getByLabelText('My Label*'); // returns <input>
</script>

There are probably other edge cases we can build a specification around.

This would be a breaking change for existing implementations that rely on the current functionality. For example:

<button><span>Hello</span></button>
<script>
  getByText('Hello') // error, 2 elements found. Before, only span was returned
  getAllByText('Hello') // [button, span]. Before, only [span] was returned
  getByText('Hello', { selector: 'span' }) // span
  getByText('Hello', { selector: 'button' }) // button
</script>

Suggested implementation:

Write specifications for what should be returned given a few different DOM structures.

Describe alternatives you've considered:

I've looked through some of the history around the implementation of the getNodeText function. It seems edge cases have driven the implementation, but there isn't much conditional coding to handle multiple edge cases at once.

Teachability, Documentation, Adoption, Migration Strategy:

Hopefully this feature doesn't require much documentation as it will mostly do what one would expect. Edge cases could be enumerated in the documentation, however. Something like:

getByText tries to grab elements by most likely text of an element. For example, a button element with nested span elements will return the whole text of the button, spans included. getByText also tries to filter out unintended text. For example, a table header that contains a select element will remove the text of the select element itself as that is most likely the intent of selecting the table's header text.

eps1lon commented 4 years ago

I think we want to encourage getByRole('button', { name: 'Hello' }) for these specific examples. I think this should be preferred for most use cases but I'd be interested in scenarios where you want *ByText on an element without a role.

NicholasBoll commented 4 years ago

Interesting, getByRole('button', {name: 'Hello'}) didn't work for me. It tells me multiple elements were found with the role of 'button' (if there are multiple buttons on the page). This happened on v4 and v5 of this library. Is it supposed to work?

eps1lon commented 4 years ago

This happened on v4 and v5 of this library. Is it supposed to work?

Possibly if there are indeed multiple elements with that role and name. But this was only added in 6.15 so you might be on a version that doesn't support the name filter.

The text that is used for name even has a spec (full docs).

NicholasBoll commented 4 years ago

Oops, I was looking at @testing-library/jest-dom. This library is a transitive dependency of @testing-library/react. The version I was using was 6.12. It looks like 6.15 was released 5 days ago.

The name spec is hard to read, but it looks like it could cover the button-specific use case.

eps1lon commented 4 years ago

The name spec is hard to read, but it looks like it could cover the button-specific use case.

You don't actually need to read it but rather know what an accessible name is. Should get you most of the way. For button it is essentially the textContent.

NicholasBoll commented 4 years ago

I also like to separate selectors from assertions for specifications. It helps understand failures:

<nav>
  <!-- elements here -->
  <button aria-label="Page 1">1</button>
  <!-- more elements here -->
</nav>

expect(getByText('1')).toHaveAttribute('aria-label', 'Page 1')
// example failures
// * No element:
//    Error: Unable to find an element with the text: 1. This could be because the text is broken up by multiple elements. In this ca...
// * Missing label:
//     Expected the element to have attribute: aria-label="Page 1" Received: null
// * Wrong label:
//     Expected the element to have attribute: aria-label="Page 1" Received: aria-label="Foobar"

// vs

getByLabelText('Page 1')
// example failures - all of them could have a HUGE dump of HTML
// * No element:
//     Error: Unable to find a label with the text of: Page 1 <body> <div> <nav aria-label="Pagination" class="css-r...
// * Missing label:
//     Error: Unable to find a label with the text of: Page 1 <body> <div> <nav aria-label="Pagination" class="css-r...
// * Wrong label:
//     Error: Unable to find a label with the text of: Page 1 <body> <div> <nav aria-label="Pagination" class="css-r...

If I only have 1 element, maybe it doesn't matter. If I have a few elements, it gets harder to understand which element is missing what. For understanding the spec? I think it matters. If I change my above example:

// oops, spec says aria-label, aria-labelledby, title, aria-describe, and aria-describedby take precedence.
expect(getByRole('button', { name: '1' })).toHaveAttribute('aria-label', 'Page 1') // error

// same problem as above, only worse. How does `Page 1` get calculated?
// Does this mean the button should visibly show `Page 1` or show `Page 1` for assistive technology?
getByRole('button', { name: 'Page 1' })

If getByRole supports computeAccessibleName from dom-accessible-api, why not have jest-dom support it with a matcher?

expect(getByText('1')).toHaveName('Page 1') // This will work event if we change `aria-label` to `aria-labelledby`

NicholasBoll commented 4 years ago

Note, after we've established a generic assertion works according to a specification, now we can move on to the specific selectors and assert something else interesting.

it('should have an accessible name for page button 1', () => {
  // awesome! I don't have to worry about _how_ that element's accessible name was calculated!
  // aria-label? aria-labelledby?
  expect(getByText('1')).toHaveName('Page 1')
})

it('should change the page', () => {
  fireEvent('click', getByRole('button', { name: 'Page 1' })
  // hmm... name changed. Maybe we still want to getBy something easily verifiable
  expect(getByText('1')).toHaveName('Selected, Page 1')
})

eps1lon commented 4 years ago

I also like to separate selectors from assertions for specifications.

Sure. But this looks like you want to assert on the accessible name. I think getting it by test-id (or role) and then matching against the accessible name is closest to what you actually want.

render(<button data-testid="first-page" aria-label="Page 1">1</button>);
const button = screen.getByTestId("first-page");
expect(computeAccessibleName(button)).toBe('Page 1');

or maybe better

render(<button aria-label="Page 1">1</button>);
const button = screen.getByRole("button", "Page 1");
expect(button).toHaveText('1');

computeAccessibleName can be imported from dom-accessibility-api which is used internally.

Though this example looks a bit l ike speech-overkill. I don't think you want to repeat "page" on every link.

NicholasBoll commented 4 years ago

Your first example is closer to what I want. I want a to start with the easiest, least abstracted implementation and then assert the more complicated, higher abstracted implementation.

Your first example has a much clearer starting point. You're grabbing an element by an attribute you manually added. It is simple enough to match those in the test if something goes wrong. The assertion could be improved with a matcher.

A matcher is an assertion with additional context. The failure here would look like:

  Error: Expected 'Foobar' to equal 'Page 1'

This doesn't give me much to go off of. It's like getting a bug report stating "It doesn't work". I'd rather tests and implementation be written like specifications that read like acceptance tests. When it fails, it tells you the what the specification is (the English part of a test) and the failure tells you exactly how it fails.

expect(button).toHave[Accessible]Name('Page 1')

Failures can now look like:

Error: Expected <button ...> to have an accessible name of 'Page 1', but was calculated to be 'Foobar'. See https://www.w3.org/TR/accname-1.1/ for more details on how accessible names are calculated
// bonus points if you can tell me how the name was calculated! From text? aria-label? title? etc.

The second example is definitely not what I want. The complicated abstraction is up-front. If screen.getByRole fails to find an element with an accessible name of 'Page 1', what does that mean? The best context I get from those types of "failed to find something" is a dump of all the HTML. What if there is a lot of HTML? Accessible names can be calculated in a complicated fashion indirectly through IDREFs (references through an id attribute. e.g. <label for="1234"> + <input id="1234">)

In the Pagination example of buttons with numbers on them, it is easier to understand if a button with "1" in it isn't found. That is easy enough to verify by the HTML or by rendering to a screen. Once I have the element, matchers have enough context for useful failure messages.

eps1lon commented 4 years ago

expect(button).toHave[Accessible]Name('Page 1')

That is what I do personally but this repository is not concerned with matchers. As you said: You want to decouple matchers from getters.

NicholasBoll commented 4 years ago

Agreed. You wanted to know scenarios where *ByText queries would be used without a role. I think *ByRole('button', { text: 'Hello' }) would be helpful in identifying a role and text. This should still be easy enough to manually verify if a query isn't found.

I did some further testing:

<label for="1234">My Label<abbr>*</abbr></label>
<input id="1234">
<script>
  getByLabelText('My Label*') // returns <input> That `*` is required
  getByText('My Label') // returns <label>. Cannot include the `*`
</script>

Both those queries are by the text, but the "text" part is different.

eps1lon commented 4 years ago

I don't follow what the issue is now. The button examples you gave in the opening post can switch to *ByRole + name which has an official spec. *ByDisplayValue is a different issue.

NicholasBoll commented 4 years ago

A case where I have a button text that is different than the accessible name:

<button aria-label="Page 1"><span>1</span></button>
<script>
  getByText('1') // doesn't work
  getByRole('button', { name: '1' }) // doesn't work. `name` is actually 'Page 1'
  getByRole('button', { text: '1' }) // API doesn't exist

  // works, but fails unhelpfully if we somehow remove or change the accessible name.
  getByRole('button', { name: 'Page 1'})
</script>

So I can't cover this case with existing APIs. I must resort to using low-level DOM API to accomplish this. It seems like getByText('1') should work. getByLabelText seems to handle multiple child elements just fine when evaluating text.

eps1lon commented 4 years ago

A case where I have a button text that is different than the accessible name

And why can't you query by the accessible name? Why is it so important to query the full text content?

It seems like you're tunnel visioning on

getByText()
matchAccessibleName()

when you can already do

getByAccessibleName()
matchText()

We cannot use the same method for getting the text in ByDisplayValue, ByText and ByLabelText. That would defeat the purpose of having different methods in the first place.

eps1lon commented 4 years ago

// works, but fails unhelpfully if we somehow remove or change the accessible name.

What do you mean "unhelpfully"? The error message should include the accessible names of each element in the a11y tree.

NicholasBoll commented 4 years ago

On projects I work on, we have design specifications and accessibility specifications. Product Managers, Designers, UX Engineers, QA Engineers, Developers, and Accessibility Specialists look at these specifications. All these stakeholders understand was is visually represented first and accessible names second (accept some of the Accessibility Specialists). Querying by accessible name, then asserting text is backwards from this mentality.

NicholasBoll commented 4 years ago

What do you mean "unhelpfully"? The error message should include the accessible names of each element in the a11y tree.

In the following example:

<button aria-label="Page 1"><span>1</span></button>

Let's say somehow the "Page 1" aria label is removed and there are many of these Page buttons that exist in the DOM. If the error message includes all the accessible names in the tree, it will only include "1" (We can assume there will be "2", "3" and possibly other names of other elements). Is it always obvious how to fix this?

If instead the specification is something like "The button will have a text content of the page number and it will have an aria-label attribute of Page {x}" and we write our test and implementation this way, now it is much more obvious what went wrong and how to fix it.

NicholasBoll commented 4 years ago

We cannot use the same method for getting the text in ByDisplayValue, ByText and ByLabelText. That would defeat the purpose of having different methods in the first place.

I'm not sure what you mean here. I laid out a clear example for how ByText and ByLabelText understand the concept of "text" differently. They return different things. I'm not suggesting the return the same result (ByText returns the element that contains given text while ByLabelText returns the element labelled by given text)

eps1lon commented 4 years ago

Querying by accessible name, then asserting text is backwards from this mentality.

Yeah this is where we disagree. The idea of this library is that we first consider a11y and then what's visual:

This library encourages your applications to be more accessible

This is also a practical concern: If we would want to consider visuals first then we need to take styles into account. This requires a lot more work and is not viable in JSDOM in the first place.

Let's say somehow the "Page 1" aria label is removed

We want to fail here. The accessible text and therefore the element semantics changed. Obviously it is quite harmless in this case but we can only compare strings by their characters not their semantics.

Is it always obvious how to fix this?

The goal can't be that a fix is obvious. That's impossible. But since we do include the accessible name it's not a far stretch that you actually realize that your buttons are now only "1", "2" and "3" (since they are listed in the error) instead of "page 1", "page 2", "page 3". I think you're missing the bigger picture here.

Also I'd recommend narrowing down the base element. If you actually query the whole page for "1" then chances are you get a lot of false positives anyway. A user wouldn't scan the full page for a "1" anyway but rather within(getByRole('navigation', { name: 'pages' })).getByRole('button', { name: 'Page 1' }).

I'm not suggesting the return the same result (ByText returns the element that contains given text while ByLabelText returns the element labelled by given text)

So you want us to write specifications for all these methods? I don't think this adds much value and requires an incredible amount of work. The documentation already explains this loosely and has examples. No need for a spec here.

ByDisplayValue, ByText and ByLabelText

NicholasBoll commented 4 years ago

I'm not suggesting using styling to detect rasterized pixels on the screen. It kind of sounds like you want to remove ByText instead of fixing some of the unexpected edge cases.

I'm not sure we actually disagree about the importance of accessibility. I think we're optimizing differently.

How about this. Your examples sound more like tests and the way I'm thinking are more like specifications. A test has to pass, a specification requires context, semantics, and meaning.

For example, this is a test:

// Pagination
test('A Page is accessible', () => {
  const { getByRole } = render(<Pagination {...defaultProps} />)
  getByRole('button', { name: 'Page 1' }) // do we even need an assertion? It's kind of already mixed in with the query
})

Series of specifications:

describe('Pagination with 10 pages', () => {
  it('should have a role of "button" for a Page', () => {
    const {getByText} = render(<Pagination {...defaultProps} />);

    expect(getByText('1')).toHaveRole('button');
  });

  it('should have an accessible name of "Selected, Page 1" when first page is selected', () => {
    const {getByText} = render(<Pagination {...defaultProps} currentPage={1} />);

    expect(getByText('1')).toHaveAccessibleName('Selected, Page 1');
  });

  it('should have an accessible name of "Page 1" when first page is not selected', () => {
    const {getByText} = render(<Pagination {...defaultProps} currentPage={2} />);

    expect(getByText('1')).toHaveAccessibleName('Page 1');
  });
});

Both the test and the specifications will fail if 'Page 1' isn't found correctly, but the specifications have a much easier starting point. It is more verbose, but the context is offloaded to the tests. When/if a specification fails, there is more context.

The test covers a lot in very little code, but much of the context is assumed. If/when a failure happens, we probably have to start debugging and build up context again to understand failures.

Again, I'm thinking in a much bigger scope. Design, UX, Accessibility, etc all have specifications for UI components. We are codifying them in a way that we can extract the text inside describe and it blocks and put them into a readable format for stakeholders of our design system. Specifications make it obvious what behaviors are specified. If we get a bug report, is it because the specification wasn't complete? Was it because the specification is wrong? Our code becomes the source of truth for specifications instead of lost to archived JIRA tickets, slack messages, Zoom meetings, Confluence docs, and conversations.

My goal is to be accessible in a way that is communicable to other people.

NicholasBoll commented 4 years ago

Let's say somehow the "Page 1" aria label is removed

We want to fail here. The accessible text and therefore the element semantics changed. Obviously it is quite harmless in this case but we can only compare strings by their characters not their semantics.

We agree that we want this to fail, but we disagree on where and how it should fail. My example above shows a way of thinking that breaks down where and how. A failing query has only the context of the whole DOM where a failing matcher has context scoped to the element it was given.

Is it always obvious how to fix this?

The goal can't be that a fix is obvious. That's impossible. But since we do include the accessible name it's not a far stretch that you actually realize that your buttons are now only "1", "2" and "3" (since they are listed in the error) instead of "page 1", "page 2", "page 3". I think you're missing the bigger picture here.

Why not? Let me know if the specification example above is not clear enough. I'm not suggesting AI-driven, self-fixing code. I'm suggesting the ability to add more context into assertions by requiring less context in queries. Let me be clear. I use queries that have more accessibility built-in. I just don't do it if I haven't already built in the understanding that a component is accessible first. Downstream users of our components can make these accessible assumptions because we've guaranteed them through our tests. If our specifications say our page buttons have an accessible name of "Page 1", they are free to use getByRole('button', { name: 'Page 1' }.

Also I'd recommend narrowing down the base element. If you actually query the whole page for "1" then chances are you get a lot of false positives anyway. A user wouldn't scan the full page for a "1" anyway but rather within(getByRole('navigation', { name: 'pages' })).getByRole('button', { name: 'Page 1' }).

100% agree. I'd just like to have specifications that guarantee so that others can do this.

I'm not suggesting the return the same result (ByText returns the element that contains given text while ByLabelText returns the element labelled by given text)

So you want us to write specifications for all these methods? I don't think this adds much value and requires an incredible amount of work. The documentation already explains this loosely and has examples. No need for a spec here.

I've seen a pendulum swing in implementation of getNodeValue with a single edge case tacked on. There's breaking changes and PRs around the implementation of this code. ByLabelText used to use it, now it doesn't ByLabelText and ByText understand the concept of "test" differently. The value is less gotcha's for people using these methods.

kentcdodds commented 4 years ago

Sorry I've been ridiculously busy recently and missed all of this discussion.

Can either of you please briefly sum up the discussion so we can all be on the same page (I find re-iterating what's being discussed and the trade-offs of the options helps everyone focus on the task at hand, so this is not just me being lazy).

In particular, I want to know whether either of you think there's an opportunity to make DOM Testing Library better by making a breaking change.

kentcdodds commented 4 years ago

I read through this and I definitely want to address the getNodeText issues. It's just reeeeeally hard to do it right. For example:

// <body><div><span>Hello</span></div></body>
getAllByText('Hello') // returns span, div, body?

// <body><div><span>Hello</span> World</div></body>
getAllByText('Hello World') // returns div, body?

This suggest that a query would match the element that contains the text and all parents above it 😬 It's actually more nuanced than that, but this particular (non-edge) case has made this method really difficult to work on.

Considering we don't really have a working idea for this, I'm going to go ahead and push this major version we have, and we can iterate on ideas for this for a future version.

airtonix commented 4 years ago

what about

// <body><div><span>Hello</span></div></body>
getAllByText('Hello').parent('div') // div

// <body><div><span>Hello</span> World</div></body>
getAllByText('Hello World') // span
getAllByText('Hello World').parent('body') // body

Maybe it's just my peasant brain. but that seems like a nice "opt in" way of dealing with it.

agos commented 1 year ago

I bumped into this issue today because the documentation states that byText uses textContent (it does not).

testing-library / dom-testing-library