Make GitHub TOCs compatible with Confluence

zendern commented 2 months ago

Use case

I generate a table of contents that works in GitHub-rendered READMEs, but once the page is uploaded to Confluence the links break, because in a handful of cases they no longer resolve to the right heading.

Problem

The biggest one is that links are case sensitive. Most generators (doctoc, ChatGPT, or whatever) just make everything lowercase because that is the easiest way to keep things consistent. Confluence, on the other hand, treats links as case sensitive: https://support.atlassian.com/confluence-cloud/docs/insert-links-and-anchors/#Create-and-insert-links

- Links are case sensitive.
- `:`, `(`, and `)` characters need to be URL encoded.
- `:-` ends up being turned into `:`.
- `` ` `` and `*` characters are ignored completely and removed.
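The four rules above can be captured in a small helper. This is a sketch, not a documented Confluence algorithm: the rules are the behavior observed in this issue, `confluence_anchor` is a hypothetical name, and the transformation order (spaces first, then removals, then `:-`, then encoding) mirrors the Bash hack later in this issue.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a Confluence-style anchor from heading text
# using the rules observed in this issue (not an official specification).
confluence_anchor() {
  printf '%s\n' "$1" | sed \
    -e 's/ /-/g' \
    -e 's/[`*]//g' \
    -e 's/:-/:/g' \
    -e 's/:/%3A/g' \
    -e 's/(/%28/g' \
    -e 's/)/%29/g'
  # In order: spaces become hyphens (case is preserved), backticks and
  # asterisks are dropped, ":-" collapses to ":", then : ( ) are URL encoded.
}
```

Applied to the first heading in the examples table, it returns `How-to-use-CUSTOM_DEF-parameters`.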

Some examples

| GitHub TOC entry | What Confluence needs to work |
|---|---|
| ``- [How to use `CUSTOM_DEF` parameters](#how-to-use-custom_def-parameters)`` | `#How-to-use-CUSTOM_DEF-parameters` |
| `- [Define custom naming (for Thing)](#define-custom-naming-for-thing)` | `#Define-custom-naming-%28for-Thing-2%29` |
| `- [Next section: Inputs and outputs](#next-section-inputs-and-outputs)` | `#Next-section:Inputs-and-outputs` |

My hack today

Bash script

Before uploading, I run this Bash script to find every generated TOC (in my case, anything between the `doctoc` start/end comments) and rewrite its links so they work in Confluence. It could probably be made smarter, for example by assuming that any list whose items are links is a TOC and processing that.

```bash
#!/bin/bash
# Script to modify the TOC links in markdown files to be compatible with Confluence.
# Details on links in Confluence can be found here:
# https://support.atlassian.com/confluence-cloud/docs/insert-links-and-anchors/#Create-and-insert-links
# A few things that differ from GitHub TOCs:
#   * Links are case sensitive
#   * :, (, and ) characters need to be URL encoded
#   * :- is treated as :
#   * ` and * characters are ignored

# URL encode the characters Confluence requires to be encoded: : ( )
url_encode() {
  printf '%s\n' "$1" | sed -e 's/:/%3A/g' -e 's/(/%28/g' -e 's/)/%29/g'
}

# Get the directory from the first argument, falling back to $WORKING_DIR
base_dir="${1:-${WORKING_DIR:-}}"

# Check if the directory name is provided
if [ -z "$base_dir" ]; then
  echo "Please provide a directory name." >&2
  exit 1
fi

echo "Processing $base_dir"

# doctoc wraps its TOC in HTML comments; match on a distinctive prefix of each
toc_start_marker="<!-- START doctoc"
toc_end_marker="<!-- END doctoc"

# A TOC line looks like: "  - [Heading text](#heading-text)"
toc_line_re='^[[:space:]]*-[[:space:]]*\[.*\]\(#.*\)$'

# Process each markdown file (NUL-delimited so paths with spaces are safe)
while IFS= read -r -d '' md_file; do
  echo "Processing $md_file..."

  # Flag to track when we are inside the TOC
  inside_toc=false
  # Temporary file to store updated lines
  temp_file=$(mktemp)

  # Process lines and update the TOC (the || clause keeps a final line without a newline)
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" == *"$toc_start_marker"* ]]; then
      inside_toc=true
      echo "TOC found in $md_file"
    fi

    if $inside_toc && [[ "$line" =~ $toc_line_re ]]; then
      # Extract the link text between the square brackets
      link_text=$(printf '%s\n' "$line" | awk -F'[][]' '{print $2}')
      # Replace spaces with hyphens
      new_id=$(printf '%s\n' "$link_text" | sed 's/ /-/g')
      # Remove ` and * characters
      new_id=$(printf '%s\n' "$new_id" | sed 's/[`*]//g')
      # Replace :- with :
      new_id=$(printf '%s\n' "$new_id" | sed 's/:-/:/g')
      # URL encode : ( ) characters
      new_id=$(url_encode "$new_id")
      # Swap the old fragment (from the last "(#" to the end) for the new one
      updated_line="${line%(#*}(#${new_id})"
      echo "Original TOC line: $line"
      echo "Updated TOC line: $updated_line"
      printf '%s\n' "$updated_line" >> "$temp_file"
      continue
    fi

    printf '%s\n' "$line" >> "$temp_file"

    if [[ "$line" == *"$toc_end_marker"* ]]; then
      inside_toc=false
      echo "End of TOC found in $md_file"
    fi
  done < "$md_file"

  # Write the updated lines back to the file
  mv "$temp_file" "$md_file"
  echo "TOC links updated successfully in $md_file."
done < <(find "$base_dir" -type f -name '*.md' -print0)

echo "All markdown files processed."
```

Feature Request

I would like to drop this hack and have md2conf handle it for me instead: process the input and rewrite any TOC-style links so they work in Confluence.

PS: Thanks for creating this tool, by the way. It works great 🎉 🎉 🎉

zendern commented 2 months ago

Here is another TOC generator that could be supported, in case we want to key off its marker comments as well.

https://github.com/trussworks/pre-commit-hooks?tab=readme-ov-file#markdown-toc which uses https://www.npmjs.com/package/markdown-toc under the covers.

<!-- toc -->
...
<!-- tocstop -->
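If the TOC detection should recognize these markers as well as doctoc's, the start/end checks could accept either pair. A minimal sketch; `is_toc_start` and `is_toc_end` are hypothetical helper names, and doctoc's comments are matched on their `START doctoc` / `END doctoc` prefix rather than the full text.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: true if the line opens/closes a generated TOC.
#   doctoc:       <!-- START doctoc ... -->  /  <!-- END doctoc ... -->
#   markdown-toc: <!-- toc -->               /  <!-- tocstop -->
is_toc_start() {
  [[ $1 == *'<!-- START doctoc'* || $1 == *'<!-- toc -->'* ]]
}

is_toc_end() {
  # "<!-- tocstop -->" does not contain "<!-- toc -->", so the two never overlap
  [[ $1 == *'<!-- END doctoc'* || $1 == *'<!-- tocstop -->'* ]]
}
```

In the script above, `[[ "$line" == *"$toc_start_marker"* ]]` would then become `is_toc_start "$line"`, and likewise for the end marker.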

hunyadi commented 2 months ago

A relatively robust solution I can envision: scan the XHTML document produced by the Markdown converter, generate both Confluence-compliant and GitHub-compliant URL fragments for the heading elements <h1> to <h6>, and keep them in a lookup table. Whenever a GitHub-compliant fragment is encountered in an HTML anchor, we would replace the GitHub-compliant link with the corresponding Confluence-compliant link. We would raise an error whenever there is no bijective mapping.
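A Bash sketch of that lookup table, for illustration only (it is not md2conf's implementation). Assumptions: `github_slug` approximates GitHub's slug rules for ASCII headings and ignores the `-1`, `-2` suffixes GitHub adds for duplicate headings; `confluence_anchor` applies the rules listed earlier in this issue; all function names are hypothetical. Associative arrays require Bash 4 or later.

```bash
#!/usr/bin/env bash
# Approximation of GitHub's slugs: lowercase, drop punctuation except "_" and "-",
# spaces to hyphens (no duplicate-heading suffixes, ASCII-oriented).
github_slug() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -e 's/[^[:alnum:] _-]//g' -e 's/ /-/g'
}

# Confluence anchor per the rules observed in this issue (assumption).
confluence_anchor() {
  printf '%s\n' "$1" | sed -e 's/ /-/g' -e 's/[`*]//g' -e 's/:-/:/g' \
    -e 's/:/%3A/g' -e 's/(/%28/g' -e 's/)/%29/g'
}

declare -A fragment_map   # GitHub fragment -> Confluence fragment

# Register one heading; fail if two headings share a GitHub slug
# but need different Confluence anchors.
add_heading() {
  local gh cf
  gh=$(github_slug "$1")
  cf=$(confluence_anchor "$1")
  if [[ -n "${fragment_map[$gh]+set}" && "${fragment_map[$gh]}" != "$cf" ]]; then
    echo "error: #$gh is ambiguous (${fragment_map[$gh]} vs $cf)" >&2
    return 1
  fi
  fragment_map[$gh]=$cf
}

# Rewrite a GitHub-style fragment; fail if no heading produced it.
map_fragment() {
  local gh="${1#\#}"
  if [[ -z "${fragment_map[$gh]+set}" ]]; then
    echo "error: no heading for #$gh" >&2
    return 1
  fi
  printf '#%s\n' "${fragment_map[$gh]}"
}
```

Only the GitHub-to-Confluence direction is checked here; a full bijectivity check would also reject two distinct GitHub slugs that map to the same Confluence anchor.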

hunyadi commented 1 month ago

One possible solution could be Confluence Storage Format anchors.

The key idea is to set an anchor in the document with the anchor macro:

<ac:structured-macro ac:name="anchor">
    <ac:parameter ac:name="">The_Id</ac:parameter>
</ac:structured-macro>

Typically, we would place this anchor at every heading, and generate the ID using GitHub's title text conversion rules (e.g. use all lowercase).
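To illustrate, a heading could be turned into that anchor macro as follows. `anchor_macro` and `github_slug` are hypothetical helpers, and the slug is the same ASCII-oriented approximation of GitHub's rules (no duplicate-heading suffixes). Because the slug keeps only letters, digits, `_`, and `-`, it needs no XML escaping.

```bash
#!/usr/bin/env bash
# Approximation of GitHub's heading slugs (assumption, see lead-in).
github_slug() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -e 's/[^[:alnum:] _-]//g' -e 's/ /-/g'
}

# Emit a Confluence anchor macro whose ID is the GitHub-style slug,
# so existing GitHub TOC links keep pointing at this heading.
anchor_macro() {
  local id
  id=$(github_slug "$1")
  printf '<ac:structured-macro ac:name="anchor">\n'
  printf '    <ac:parameter ac:name="">%s</ac:parameter>\n' "$id"
  printf '</ac:structured-macro>\n'
}
```

For example, `anchor_macro 'Next section: Inputs and outputs'` writes an anchor with the ID `next-section-inputs-and-outputs`, the same fragment the GitHub TOC in the examples table links to.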

Next, whenever we encounter a link with a local reference (URL fragment), we would emit a link macro:

<ac:link ac:anchor="The_Id">
    <ac:plain-text-link-body>
        <![CDATA[text describing the link]]>
    </ac:plain-text-link-body>
</ac:link>

This approach could decouple the anchor/link relationship from the specifics of the conversion rules Confluence uses internally to build anchor IDs (e.g., URL encoding, special handling of punctuation).
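A matching sketch for the link side, again with a hypothetical name: `link_macro` takes a URL fragment and link text, reuses the fragment (minus the leading `#`) verbatim as `ac:anchor`, escapes XML-special characters in that attribute, and splits any `]]>` in the text across two CDATA sections so the markup stays well-formed.

```bash
#!/usr/bin/env bash
# Emit a Confluence link macro for a same-page link.
#   $1: URL fragment such as "#next-section-inputs-and-outputs"
#   $2: link text
link_macro() {
  local fragment="${1#\#}" text="$2" anchor body
  # Escape characters that are special inside a double-quoted XML attribute
  anchor=$(printf '%s\n' "$fragment" | sed -e 's/&/\&amp;/g' -e 's/"/\&quot;/g' -e 's/</\&lt;/g')
  # A literal "]]>" would end the CDATA section early, so split it in two
  body=$(printf '%s\n' "$text" | sed 's/]]>/]]]]><![CDATA[>/g')
  printf '<ac:link ac:anchor="%s">\n' "$anchor"
  printf '    <ac:plain-text-link-body>\n'
  printf '        <![CDATA[%s]]>\n' "$body"
  printf '    </ac:plain-text-link-body>\n'
  printf '</ac:link>\n'
}
```

Since the anchor value is copied from the link rather than recomputed, it only has to match the ID written into the anchor macro, not Confluence's own heading IDs.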

hunyadi commented 1 month ago

This has been implemented in the commit titled "Auto-generate anchors for section headings to allow GitHub-style same-page links". You need to opt in with the command-line option `--heading-anchors`.
