GitHub and GitLab link converter should populate with issue / PR titles #1085

Open choldgraf opened 1 year ago

Currently we have link shortening for GitHub and GitLab, which re-formats the URL and makes it look nice:

However, it would be even more useful if we auto-populated issue/PR titles in this process. This is similar to what GitHub does when you link to an issue or PR on their platform:

For example:

ShareX_yAAvKjzufy

Proposal

We could write a little script that tries to infer the issue or PR title for GitHub and GitLab links. If found, it would replace the github.com/ text and use the title instead (with an icon so people know where it points).

Implementation idea

I think this wouldn't be too hard to implement, because both GitHub and GitLab will populate a page's <title> with the necessary information in the HTML <head> metadata.

So you should be able to iteratively load the HTML from a GitHub / GitLab page (to save bandwidth) and pick up the title from this metadata.

In fact, I gave this a quick shot and it turns out it is quite easy. Here's some Python that will load chunks of data from a URL request to GitHub / GitLab, and return the <title> if it is found:

import requests
import re

def get_title(url):
    # Make a GET request to the URL with streaming enabled
    response = requests.get(url, stream=True)

    # Set up a regular expression to match the opening and closing tags of the title element
    title_regex = re.compile(r'<title>(.*?)</title>')

    # Initialize the partial title to an empty string and a flag to track whether the opening tag has been found
    # Iterate through the response content
    content = ""
    for chunk in response.iter_content(chunk_size=1024):
        # Decode the chunk as UTF-8 and add to our header string
        content += chunk.decode('utf-8')

        # Check if we've found a match
        match = title_regex.search(content)
        if match:
            # If we find a match then return it and we're finished
            return match.group(1)
        elif "</head>" in content:
            # If we have this, it means we're done with header content
            return None

Then we can call it with a few example links to show how it works:

issues = [
    "https://github.com/pydata/pydata-sphinx-theme/issues/1082",
    "https://gitlab.com/gitlab-org/gitlab/-/issues/386024",
]
titles = []
for issue in issues:
    titles.append(get_title(issue))

Results in:

[
    'Sidebar font changes and toggle icon shows up on too-wide screens · Issue #1082 · pydata/pydata-sphinx-theme ·
GitHub',
    'Tasks To-Do appears to have an off by one error (#386024) · Issues · GitLab.org / GitLab · GitLab'
]

So in our code, we could call this function on any GitLab / GitHub issue or PR link that we discover without making it too much more complex. The only tricky bit might be figuring out how to limit the API calls (e.g. with a local cache, or by letting people give a token).

References

I think this is an example of a reference having its text replaced by the title of the thing it points to, which would act similarly:

https://github.com/executablebooks/MyST-Parser/pull/678/files#diff-6bacfebc6d9bcb771681220c7c5da9100c8cc8b1a9a9d59fe86d07bb6647c474R243-R254

(cc @chrisgorgo thanks for the inspiration)

I'm onboard for this one but do we also want to implement the "open", "close" behavior (green->pruple) ?

I mean, I think it would be pretty cool :-) might be more complex though. My feel is that our target audience for this theme are people who would appreciate a UX that is very similar to GitHub, so it sounds great to me

-1 on this. IMO the value of the current setup is shortening. PR titles are in many cases longer than their URLs. I also personally find it is often harder to read/parse. Compare:

This was originally addressed in #XXXX but it turned out not to fix all cases
This was originally addressed in mne-tools/mne-python/#XXXX but it turned out not to fix all cases
This was originally addressed in [MRG] Add support for a callable in the combine argument of TFR plots but it turned out not to fix all cases

To me in this case 1 is easiest to read, but 2 is preferable because sometimes projects cross-link and I think the reading slowdown is minimal. 3 is clearly worst IMO.

Interesting, I have the opposite ordering than you, I think because I'm thinking about it through a different lens.

I think about it in terms of the "information to characters ratio". AKA, how much information can you convey with how many characters. While a shorter link has fewer characters (e.g. #232), it also has far less information than [MRG] Add support for a callable in the combine argument of TFR plots.

In my mind, the goal of linking to an issue is to provide context for some point. However, the context is only useful if you know what is inside the issue. To that extent, my goal is to minimize the amount of actions that a reader must take in order to gain that context. If they can understand the context simply be looking at a link, then they can quickly understand the point without needing to dig deeper. That's why I really like it when people use the "link title auto-populating" functionality. It is more verbose, but the extra information you get is worth it (to me).

🤷🏻 IMO, in many cases, it's enough to know "there's an existing issue or PR about this". Some might want/need to click through, others won't. I agree that there could be value in a high-level summary of what is in that issue/PR, that would allow some fraction of those people to not click through who otherwise would have if it were just a number. But for that to work it must be the case that the issue/PR title is a good/accurate summary; often it is not because issue/PR authors were lazy or did a bad job, other times because the topic drifted or the real issue turned out to be something different and the title was never updated, other times because a tangentially related point came up in someone's offhand comment, and it's that off-topic comment that is being linked to. In that last case, the issue/PR title can actually be misleading, because I think it will say [MRG] Add support for a callable in the combine argument of TFR plots (Comment) but the comment is actually something like "this reminds me that we never added a combine argument to that other unrelated function like we planned to".

Anyway, I've said enough. If you're not convinced, please just make it opt-in so that users can pick either option 2 or option 3 (as numbered in my prev comment)

I think we just see it differently. If we don't have consensus, then I suggest we leave this issue open for a while to see if a clear majority preference comes out from others. If not then we can either close this or, if implemented, make it a feature flag to turn on.

sounds good. I see your point that "more information is better" and (I think?) you see my point that "the information in a PR title isn't always accurate/relevant". I think where we differ is how bad (or how likely) it is to sometimes provide inaccurate/irrelevant information.

just to add something here, the github logic is more sophisticated than that:

if the link is fully inlined then the shortening is performed as we do it (#xxx)
if the link is the sole content of a list item, it is styled with the title and the open/close icon

By doing so they match both your requirements, making it short when inforamtion are not needed and long when it's all alone. I don't know if we can access this granularity while parsing our files with the html builder. but what would you think if we try ?

By doing so they match both your requirements, making it short when inforamtion are not needed and long when it's all alone. I don't know if we can access this granularity while parsing our files with the html builder. but what would you think if we try ?

By this do you mean:

If it's a GitHub link that is part of a list item, then try to replace the link w/ its title.
If it's not, then just do link shortening like we do now.

?

If so, I think it's a nice compromise and has the added benefit of "we do things exactly the same way that GitHub does". I am also assuming that GitHub invested considerable UI/UX research into this as the pattern they use.

By this do you mean:

If it's a GitHub link that is part of a list item, then try to replace the link w/ its title. If it's not, then just do link shortening like we do now. ?

not only part of a listItem, the only content eg:

- have a look to #646 for more information
- "#1099 create additional test"

(I used code block to avoid quoting our own issues)

Ah I see so, this would trigger the "title lookup":

- https://github.com/pydata/pydata-sphinx-theme/issues/1085

And this would not:

- See https://github.com/pydata/pydata-sphinx-theme/issues/1085

Right? I think if this is very easy to implement it'd be worth a shot, though I don't know how often it'd be used if it had to be the only content in a list item. Maybe the test could instead be "if there is any content after the link, then do not expand the title, but if there is content only before the link it may still be expanded."

Though in general, my preference is to just mimic whatever GitHub does since that is what people are more familiar with and we don't have to explain the UX edge-cases. I don't feel super strongly about it though.

That's not how it works for me, though:

In both cases the url was replaced with the title, even when it was not the sole content of the list item.

Yeah GitHub will expand the title if the link is in any list item regardless of it's position.

BTW, I personally agree with @drammock that I often do not want the full title but just a short link. Of course it depends a lot on what type of content you are writing. If you just want to list a couple of issues, then seeing the full titles can be super helpful, but if you already have a paragraph or bullet point explaining something and then just want to add the relevant link, having the full title is IMO often unnecessary noise. In fact, I am regularly annoyed by this feature in github and sometimes even do the effort of creating the short link myself .. ([#1085](https://github.com/pydata/pydata-sphinx-theme/issues/1085)). I would actually like it if github used to rule of "only replace with title if it is the sole content of a bullet point" (maybe except for recognized words like "closes").

Anyway, just to say that I think this would be a really nice feature, but ideally configurable (or controllable by the exact content, like @12rambau proposed)

Since some folks aren't a fan of the way GitHub does it anyway, could we go with a compromise of:

only if the link is the final content in a list item, expand the title.

Then it would cover cases like

- <link>

And

- see <link>

But not

- in <link> we discuss this

As an alternative, maybe consider not showing the title in the text body, but revealing it when hovering over the issue/PR link. This is what GitLab does. Random example:

(This is from https://gitlab.com/lilypond/lilypond/-/merge_requests/2043.)

Also, instead of fetching the title during the build, why not do it dynamically in JS? That would keep the docs buildable offline.

My only concern with adding too much hovering/tooltips and the like is their accessibility. These components work with a mouse by default, but we cannot rely only on hover/mouse interactions.

To address these concerns, we would need to a) Ensure we are using the correct ARIA and b) Need to add additional JS to handle the opening/close of the summary card so that the default focus & active states are not obscured or replaced.

(This is from gitlab.com/lilypond/lilypond/-/merge_requests/2043.)

For example, this linked tooltip is completely inaccessible through the keyboard as it relies on mouseover events.

Also, instead of fetching the title during the build, why not do it dynamically in JS? That would keep the docs buildable offline.

Similarly, while some good cases in which JS helps/supports accessibility, there are also situations in which relying on too much JS can hurt accessibility. So I would like to understand better how do you anticipate generating these with JS would work @jeanas.

pydata / pydata-sphinx-theme