trentm / python-markdown2

markdown2: A fast and complete implementation of Markdown in Python

Include HTML headers in TOC #538

Closed. Crozzers closed this 11 months ago

Crozzers commented 11 months ago

This PR closes #537 by adding the capability to parse <h[1-6]> tags and include their IDs in the table of contents.

It works by matching HTML header tags, checking for an id= attribute and then adding an entry to the TOC. If the tag does not have an ID, a new one is generated and inserted into the HTML.
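For readers who want a concrete picture of that matching step, here is a rough standalone sketch. It is not the code from this PR: `collect_html_headers`, `slugify` and both regexes are hypothetical names made up for illustration, and markdown2's real ID generation may differ.

```python
import re

# Hypothetical patterns for illustration only; not markdown2's internals.
_HEADER_RE = re.compile(r'<h([1-6])\b([^>]*)>(.*?)</h\1>', re.IGNORECASE | re.DOTALL)
_ID_RE = re.compile(r'''(?<![\w-])id\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)
_TAG_RE = re.compile(r'<[^>]+>')


def slugify(text):
    """Turn header text into a simple lowercase, hyphenated ID (assumed scheme)."""
    text = _TAG_RE.sub('', text)  # drop nested tags such as <em>
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


def collect_html_headers(html):
    """Return (new_html, toc), where toc is a list of (level, id, title)."""
    toc = []

    def _sub(match):
        level = int(match.group(1))
        attrs, inner = match.group(2), match.group(3)
        id_match = _ID_RE.search(attrs)
        if id_match:
            # The tag already has an ID: reuse it and leave the tag untouched.
            header_id = id_match.group(2)
            tag = match.group(0)
        else:
            # No ID: generate one and insert it into the opening tag.
            header_id = slugify(inner)
            tag = '<h%d%s id="%s">%s</h%d>' % (level, attrs, header_id, inner, level)
        toc.append((level, header_id, _TAG_RE.sub('', inner).strip()))
        return tag

    return _HEADER_RE.sub(_sub, html), toc
```

Calling `collect_html_headers('<h2 class="x">Getting <em>Started</em></h2>')` would rewrite the tag with a generated `id` and record one TOC entry for it.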

Since HTML content is hashed before markdown headers are processed, this step intercepts headers in _hash_html_block_sub before they get hashed. This means that HTML headers reach the TOC before any markdown headers do, so TOC entries are inserted out of order. To fix this, I added a function that sorts the TOC by order of appearance. For each TOC entry, it searches the text for that header and returns the index at which it is found, and the entries are then ordered by that index.
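The sorting idea can be sketched roughly as follows. This is an illustration under assumptions, not the function added in this PR: `sort_toc_by_position` is a hypothetical name, TOC entries are assumed to be `(level, id, title)` tuples, and the search is a plain substring lookup on the title.

```python
def sort_toc_by_position(toc, text):
    """Order (level, id, title) entries by where each title first occurs in text."""

    def position(entry):
        _level, _header_id, title = entry
        index = text.find(title)
        # Entries that cannot be found are pushed to the end.
        return index if index != -1 else len(text)

    # sorted() is stable, so entries with equal positions keep their relative order.
    return sorted(toc, key=position)


text = '# Intro\n\n<h2 id="setup">Setup</h2>\n\n## Usage\n'
# The HTML header was collected first, so it sits ahead of the markdown headers.
toc = [(2, 'setup', 'Setup'), (1, 'intro', 'Intro'), (2, 'usage', 'Usage')]
ordered = sort_toc_by_position(toc, text)
```

A plain substring search like this would misplace headers whose titles repeat or appear earlier in body text, so a real implementation likely needs a more specific match.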

This new headers behaviour is disabled by default, and can be enabled using the new header-ids options dict.

import markdown2

text = '# Title\n\n<h2 id="intro">Intro</h2>\n'  # sample input
# new API: pass an options dict
markdown2.markdown(text, extras={'header-ids': {'mixed': True, 'prefix': 'my-prefix'}})
# old API: a bare prefix string, converted to {'prefix': 'my-prefix'} in __init__
markdown2.markdown(text, extras={'header-ids': 'my-prefix'})

nicholasserra commented 11 months ago

Thank you! I'm gonna do a release next week.