Use first image - Githubissues

tcmetzger commented 4 years ago

This pull request gives users the option to use the first image of each page for the page's og:image tag.

If the new config-option ogp_first_image is set to True, sphinxext-opengraph will automatically scan each document's nodes for images. sphinxext-opengraph will then use the first image on the page for the og:image tag (if an image is present).

This is a feature I have been using in my own custom og-extension. When you share a news article on Facebook, for example, Facebook will usually display an image from the article itself, not a generic image from the news organization. So from a user's perspective, if I have an image on a page I am sharing, I'd expect og:image to be an image from that specific page, not just a generic image that would be the same for all the pages on this website.

ogp_first_image works well with the other image options sphinxext-opengraph already has. I have automated the behavior as much as possible and have tested various combinations locally. Some examples:

Example 1: In the config, ogp_first_image is enabled, and an image is provided with og:image. Sphinx will use the first image for each sub-page that has an image - all pages that do not have images will use the image supplied by og:image.
Example 2: In the config, ogp_first_image is enabled, but no image is provided with og:image. Sphinx will use the first image for each sub-page that has an image - all pages that do not have images will not have an og:image tag.
Example 3: In the config, ogp_first_image is enabled. Some of the SRT files have images with alt texts, others don't. Sphinx will create og:image and og:image:alt tags with the information included in the corresponding SRT file. For SRT files that include an image but no alt text, Sphinx will use the existing fallback mechanism to create the og:image:alt tag either from a custom alt text supplied in the config, from site_name, or from htp.text.

This pull request also includes a new test and a short explanation in the readme.

Daltz333 commented 4 years ago

What happens if the specified image url is a relative one, IE images/myimage.png

tcmetzger commented 4 years ago

Very good point, @Daltz333! I'll try to add some code to detect and handle that. Checking for the netloc component with urllib's urlparse() might be worth a try. I'll ping you once I have a solution!

TheTripleV commented 4 years ago

What if the image is an svg? Is that still a nodes.image or is it a nodes.figure? Opengraph has support for specifying the type of an image, but do opengraph viewers support displaying svgs?

Daltz333 commented 4 years ago

@TheTripleV nodes.figure is related to the actual figure directive .. figure:: which is similar to image, but has extra customizability

tcmetzger commented 4 years ago

Regarding @Daltz333's point: the urljoin() method luckily already takes care of that for us (see "If url is an absolute URL" in the docs).

Regarding @TheTripleV's point: Seems like SVGs don't work (at least not on Facebook): https://stackoverflow.com/questions/21636503/use-svg-as-ogimage. I just ran a test with an SVG file, and that SVG does show up in nodes.image (and would thus get used as og:image). So I guess it would make sense to filter the detected images to make sure SVGs don't get used as og:image.

tcmetzger commented 4 years ago

This most recent commit addresses @TheTripleV's point: opengraph.py now includes a static IMAGE_FILE_EXTENSIONS list. Sphinx loops over the detected images and will use the first detected image that matches IMAGE_FILE_EXTENSIONS. This way, SVG images will never be used for og:image, even if the first image on a page is an SVG file. If a page includes no other images besides the SVG file, Sphinx will treat it as if no image was detected.

TheTripleV commented 4 years ago

The OGMetadataCreatorVisitor stops traversing after the desired description length is met. This means that mcv.images will be empty on websites such as https://docs.wpilib.org/en/latest/docs/getting-started/getting-started-frc-control-system/control-system-software.html.

Instead of mcv.images, to grab a list of all the images in a doctree, you could use list(doctree.traverse(nodes.image))

tcmetzger commented 4 years ago

With this most recent commit, I have restructured the code to reflect the ideas from this discussion (including doctree.traverse(nodes.image)). I have also made the variables' names and comments a little clearer.

Daltz333 commented 4 years ago

We are currently investigating refactoring the URL schema to take READTHEDOCS environment variables into affect. Particularly READTHEDOCS_LANGUAGE and READTHEDOCS_VERSION. This will allow better readthedocs integration. So this PR might be held for a small bit.

wpilibsuite / sphinxext-opengraph

Use first image #18