ken107 / read-aloud

An awesome browser extension that reads aloud webpage content with one click
https://readaloud.app
MIT License
1.39k stars, 237 forks

Allow override of default ignore tags #203

Open voikya opened 3 years ago

voikya commented 3 years ago

I've recently received a few bug reports about the Read Aloud extension not reading certain content on our site. Part of the problem is that aside tags are used in various ways as callout boxes, which contain important content that shouldn't be ignored but is not part of the main content flow. (This usage appears to be compatible with the HTML spec for aside, as far as I can tell.) As much of this content is user-generated, we do not have direct control over how certain tags are used.

I believe the cause is that aside tags are flagged to be ignored under basically all circumstances: https://github.com/ken107/read-aloud/blob/14e1defded2e65edd84bf47429d23d6ad4989d16/js/content/html-doc.js

This is sensible if the aside does not contain any meaningful additional content (such as repeating quotations from the main body, as it is often used for), but is more problematic if the aside contains unique content.

I'd like to suggest two possible ways of dealing with this:

  1. Adding a heuristic to try to determine what kind of content an aside contains. If it contains a complex structure rather than just a single text node, it may be worth reading aloud.
  2. Adding a way for websites to flag content as readable. An example might be to check for the presence of a data-readable attribute, so that <aside data-readable="true">...</aside> would not be ignored and could be walked while searching for content, while a regular <aside>...</aside> would continue to be ignored. This would give any site a generic way to manually flag content as readable for the read-aloud plugin if they're interested in maintaining explicit support.
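
A rough sketch of how option 2 could behave, assuming a simple recursive walk over the page. To keep it runnable outside a browser, plain objects stand in for DOM elements; IGNORE_TAGS, isIgnored, and collectText are hypothetical names for illustration and are not taken from html-doc.js.

```javascript
// Hypothetical sketch: nodes are plain objects, not real DOM elements.
// IGNORE_TAGS is an illustrative subset, not the extension's actual list.
const IGNORE_TAGS = new Set(["aside", "nav", "footer"]);

function isIgnored(node) {
  if (!IGNORE_TAGS.has(node.tag)) return false;
  // Opt-in override: data-readable="true" keeps an otherwise ignored element.
  return !(node.attrs && node.attrs["data-readable"] === "true");
}

function collectText(node, out = []) {
  if (typeof node === "string") {
    out.push(node); // text node
    return out;
  }
  if (isIgnored(node)) return out; // skip the whole subtree
  for (const child of node.children || []) collectText(child, out);
  return out;
}

const page = {
  tag: "article",
  children: [
    { tag: "p", children: ["Main text."] },
    { tag: "aside", children: ["Pull quote."] },
    {
      tag: "aside",
      attrs: { "data-readable": "true" },
      children: ["Important callout."],
    },
  ],
};

console.log(collectText(page)); // [ 'Main text.', 'Important callout.' ]
```

The unflagged aside is still skipped, so existing pages would behave as they do today.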

ken107 commented 3 years ago

Thank you for the analysis. I think we can do option 2. Currently we have the .no-read-aloud class, which can exclude any element; we can probably add something for the opposite case, though the implementation could be slightly tricky.

eriese commented 8 months ago

I work on a wiki and we're getting complaints about infoboxes being skipped. I'm glad to help implement a custom allow list if there's interest in it, but if not I still think it could make sense to change aside to aside:not(.portable-infobox) on the deny list unless it's going to introduce performance issues.
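
For illustration, here is a minimal sketch of a deny list with a class-based exception, equivalent in spirit to aside:not(.portable-infobox). The rule shape and the names DENY_RULES and isDenied are hypothetical, and elements are modeled as plain objects so the snippet runs outside a browser.

```javascript
// Hypothetical deny-list shape: each rule names a tag plus classes that exempt it.
const DENY_RULES = [
  { tag: "aside", exceptClasses: ["portable-infobox"] }, // aside:not(.portable-infobox)
  { tag: "nav", exceptClasses: [] },
];

function isDenied(el) {
  const classes = el.classes || [];
  // Denied when a rule matches the tag and none of its exempting classes are present.
  return DENY_RULES.some(
    (rule) =>
      rule.tag === el.tag &&
      !rule.exceptClasses.some((c) => classes.includes(c))
  );
}

console.log(isDenied({ tag: "aside", classes: [] })); // true
console.log(isDenied({ tag: "aside", classes: ["portable-infobox", "pi-theme"] })); // false
console.log(isDenied({ tag: "p" })); // false
```

On a real page the same check could be written as el.matches("aside:not(.portable-infobox)"), which is a single selector match per element.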

I know it sounds weirdly specific, but that class is universal to all wikimedia-based wikis, so it's a very common use case for an aside that's likely to be relevant.

awesomerobot commented 7 months ago

Hello! A user of Discourse (https://github.com/discourse/discourse) has reported this issue with aside tags to us as well. I would recommend not adding blanket tag omissions like this and more closely following W3C accessibility standards where possible.

The idea of adding a class to specifically include elements seems like a nice idea, but it's very unlikely to solve this issue broadly. It would be hard to expect many websites and applications to add classes that only support a single browser extension.

Generally across the web, when a developer intends to hide content from a screen reader or similar application, the aria-hidden attribute is used (https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Attributes/aria-hidden) or the element is made invisible with CSS. Content that is not hidden with one of these methods is expected to be useful to the user, and should not be removed.
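
As a sketch of that approach, the walk below skips an element only when it is marked aria-hidden="true" or hidden with CSS, and never because of its tag name. Plain objects stand in for DOM elements and their computed styles, and the names isHiddenFromReaders and collectVisibleText are hypothetical, not part of the extension.

```javascript
// Hypothetical sketch: `style` stands in for an element's computed style.
function isHiddenFromReaders(node) {
  const attrs = node.attrs || {};
  const style = node.style || {};
  return (
    attrs["aria-hidden"] === "true" ||
    style.display === "none" ||
    // Simplification: treats the whole subtree as hidden, although a
    // descendant can opt back in with visibility: visible.
    style.visibility === "hidden"
  );
}

function collectVisibleText(node, out = []) {
  if (typeof node === "string") {
    out.push(node); // text node
    return out;
  }
  if (isHiddenFromReaders(node)) return out; // skip hidden subtree
  for (const child of node.children || []) collectVisibleText(child, out);
  return out;
}

const post = {
  tag: "article",
  children: [
    { tag: "p", children: ["Post body."] },
    { tag: "aside", children: ["Quoted reply."] },
    { tag: "div", attrs: { "aria-hidden": "true" }, children: ["Decorative icon label."] },
    { tag: "span", style: { display: "none" }, children: ["Collapsed details."] },
  ],
};

console.log(collectVisibleText(post)); // [ 'Post body.', 'Quoted reply.' ]
```

Here the aside is read because nothing marks it as hidden, while the two explicitly hidden elements are skipped.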