wpilibsuite / sphinxext-opengraph

Sphinx extension to generate unique OpenGraph metadata
https://sphinxext-opengraph.readthedocs.io
Other
74 stars 27 forks source link

Use first image #18

Closed tcmetzger closed 3 years ago

tcmetzger commented 4 years ago

This pull request gives users the option to use the first image of each page for the page's og:image tag.

If the new config-option ogp_first_image is set to True, sphinxext-opengraph will automatically scan each document's nodes for images. sphinxext-opengraph will then use the first image on the page for the og:image tag (if an image is present).

This is a feature I have been using in my own custom og-extension. When you share a news article on Facebook, for example, Facebook will usually display an image from the article itself, not a generic image from the news organization. So from a user's perspective, if I have an image on a page I am sharing, I'd expect og:image to be an image from that specific page, not just a generic image that would be the same for all the pages on this website.

ogp_first_image works well with the other image options sphinxext-opengraph already has. I have automated the behavior as much as possible and have tested various combinations locally. Some examples:

This pull request also includes a new test and a short explanation in the readme.

Daltz333 commented 4 years ago

What happens if the specified image url is a relative one, IE images/myimage.png

tcmetzger commented 4 years ago

Very good point, @Daltz333! I'll try to add some code to detect and handle that. Checking for the netloc component with urllib's urlparse() might be worth a try. I'll ping you once I have a solution!

TheTripleV commented 4 years ago

What if the image is an svg? Is that still a nodes.image or is it a nodes.figure? Opengraph has support for specifying the type of an image, but do opengraph viewers support displaying svgs?

Daltz333 commented 4 years ago

@TheTripleV nodes.figure is related to the actual figure directive .. figure:: which is similar to image, but has extra customizability

tcmetzger commented 4 years ago

Regarding @Daltz333's point: the urljoin() method luckily already takes care of that for us (see "If url is an absolute URL" in the docs).

Regarding @TheTripleV's point: Seems like SVGs don't work (at least not on Facebook): https://stackoverflow.com/questions/21636503/use-svg-as-ogimage. I just ran a test with an SVG file, and that SVG does show up in nodes.image (and would thus get used as og:image). So I guess it would make sense to filter the detected images to make sure SVGs don't get used as og:image.

tcmetzger commented 4 years ago

This most recent commit addresses @TheTripleV's point: opengraph.py now includes a static IMAGE_FILE_EXTENSIONS list. Sphinx loops over the detected images and will use the first detected image that matches IMAGE_FILE_EXTENSIONS. This way, SVG images will never be used for og:image, even if the first image on a page is an SVG file. If a page includes no other images besides the SVG file, Sphinx will treat it as if no image was detected.

TheTripleV commented 4 years ago

The OGMetadataCreatorVisitor stops traversing after the desired description length is met. This means that mcv.images will be empty on websites such as https://docs.wpilib.org/en/latest/docs/getting-started/getting-started-frc-control-system/control-system-software.html.

Instead of mcv.images, to grab a list of all the images in a doctree, you could use list(doctree.traverse(nodes.image))

tcmetzger commented 4 years ago

With this most recent commit, I have restructured the code to reflect the ideas from this discussion (including doctree.traverse(nodes.image)). I have also made the variables' names and comments a little clearer.

Daltz333 commented 4 years ago

We are currently investigating refactoring the URL schema to take READTHEDOCS environment variables into affect. Particularly READTHEDOCS_LANGUAGE and READTHEDOCS_VERSION. This will allow better readthedocs integration. So this PR might be held for a small bit.