wpilibsuite / sphinxext-opengraph

Sphinx extension to generate unique OpenGraph metadata
https://sphinxext-opengraph.readthedocs.io
Other
74 stars 27 forks source link

first_image URLs not resolving correctly on RTD #59

Closed rabernat closed 2 years ago

rabernat commented 2 years ago

Thanks for this amazing package! 👏 So helpful for the open source documentation community.

Context

I am trying to use this extension with a sphinx site (https://github.com/pangeo-forge/pangeo-forge-recipes/tree/master/docs) on ReadTheDocs. My config looks like this

ogp_image = "_static/pangeo-forge-logo-blue.png"
ogp_use_first_image = True

I am explicitly not setting ocp_site_url, as I want it to be set automatically, following this code path:

https://github.com/wpilibsuite/sphinxext-opengraph/blob/d99815bac5a42f5e9ac756cc3e74eebfd80d2505/sphinxext/opengraph/__init__.py#L70-L72

As far as I can tell, this is working correctly.

What works: default ogp_image

Our home page - https://pangeo-forge.readthedocs.io/en/latest/index.html - has no image so defaults to ogp_image. The headers look like this:

<meta property="og:title" content="Pangeo Forge Documentation" />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://pangeo-forge.readthedocs.io/en/latest/index.html" />
<meta property="og:description" content="Resources for understanding and using Pangeo Forge First Steps: New to Pangeo Forge? You are in the right spot! What is Pangeo Forge?- Read more about Pangeo Forge and how it works., Introduction T..." />
<meta property="og:image" content="https://pangeo-forge.readthedocs.io/en/latest/_static/pangeo-forge-logo-blue.png" />
<meta property="og:image:alt" content="Pangeo Forge Documentation" />

The image url - https://pangeo-forge.readthedocs.io/en/latest/_static/pangeo-forge-logo-blue.png - is correct and works.

What does not work: first_image

This page - https://pangeo-forge.readthedocs.io/en/latest/introduction_tutorial/intro_tutorial_part1.html - DOES have an image.

It's header looks like this

<meta property="og:title" content="Defining a FilePattern (Intro Tutorial Part 1)" />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://pangeo-forge.readthedocs.io/en/latest/introduction_tutorial/intro_tutorial_part1.html" />
<meta property="og:description" content="Welcome to the Pangeo Forge introduction tutorial! This tutorial is split into three parts: Defining a FilePattern, Defining a recipe and running it locally, Setting up a recipe to run in the cloud..." />
<meta property="og:image" content="https://pangeo-forge.readthedocs.io/en/_images/OISST_URL_structure.png" />
<meta property="og:image:alt" content="OISST file structure" />

Note that here the og:image URL is not correct! https://pangeo-forge.readthedocs.io/en/_images/OISST_URL_structure.png

It should be https://pangeo-forge.readthedocs.io/en/latest/_images/OISST_URL_structure.png

The autogenerated image URL is missing the /latest directory.

48 was supposed to add support for relative URLs, but something is not working right here. I would appreciate any suggestions you may have.


🙏 thanks again for your work on this very useful package!

TheTripleV commented 2 years ago

html_baseurl grabs from canonical_url on RTD, which appears to be correct.

I have a feeling the error is occurring at https://github.com/wpilibsuite/sphinxext-opengraph/blob/d99815bac5a42f5e9ac756cc3e74eebfd80d2505/sphinxext/opengraph/__init__.py#L137 where the relative path is made absolute. The level of image_url_parsed may need to be modified before it gets joined.

ItayZiv commented 2 years ago

On the aforementioned line https://github.com/wpilibsuite/sphinxext-opengraph/blob/d99815bac5a42f5e9ac756cc3e74eebfd80d2505/sphinxext/opengraph/__init__.py#L137 the path is relative to the current directory so in the given example it would be ../_images/OISST_URL_structure.png which then gets rid of the /latest when joining them together. I'll try PR a fix asap.

rabernat commented 2 years ago

I have just installed v0.6.2 on our site (https://github.com/pangeo-forge/pangeo-forge-recipes/pull/334), and unfortunately this problem is still happening...see https://pangeo-forge--334.org.readthedocs.build/en/334/pangeo_forge_recipes/tutorials/xarray_zarr/cmip6-recipe.html

ItayZiv commented 2 years ago

I think I know what the cause might be, if RTD injects a URL that doesn't end in a slash, I think urljoin will treat everything as the "path" and throw it all out when its joined with another path if this is the case, currently, adding the canonical URL manually (setting ogp_site_url for now), should solve this. Will look into fixing it.

rabernat commented 2 years ago

Thanks @ItayZiv! It's weird because it was definitely working at some point in the recent past. 🤔 In our current master branch, we are installing git+https://github.com/ItayZiv/sphinxext-opengraph@fix-first-image in our requirements.txt. This worked 26 days ago, as evidenced by this comment. But it is not working any more.

ItayZiv commented 2 years ago

Oh, I see now, it seems the fix itself is broken, we changed the impl but we didn't catch that if you set an image in the config, it would still, need to be overridden by the first image, ill PR this