Include HTML headers in TOC

This PR closes #537 by adding the ability to parse HTML header tags (<h1> through <h6>) and include their IDs in the table of contents (TOC).

It works by matching HTML header tags, checking each one for an id= attribute, and adding an entry to the TOC. If a tag has no ID, a new one is generated and inserted into the HTML.
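
As a rough illustration of the matching step described above, here is a minimal standalone sketch. It is not the code in this PR: the regexes, the `_slugify` ID generator, the `collect_html_headers` name and the `(level, id, title)` tuple shape are all assumptions made for the example.

```python
import re

# Hypothetical pattern: opening tag, optional attributes, inner HTML, matching close tag.
_HEADER_RE = re.compile(r'<h([1-6])((?:\s[^>]*)?)>(.*?)</h\1>',
                        re.IGNORECASE | re.DOTALL)
# Matches id="..." or id='...' but not attributes such as data-id.
_ID_RE = re.compile(r'''(?<![\w-])id\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)


def _slugify(text):
    """Hypothetical ID generator: lowercase, runs of other characters become hyphens."""
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


def collect_html_headers(html):
    """Return (html_with_ids, toc), where toc is a list of (level, id, title)."""
    toc = []

    def _sub(match):
        level = int(match.group(1))
        attrs = match.group(2)
        inner = match.group(3)
        title = re.sub(r'<[^>]+>', '', inner).strip()  # drop nested tags
        id_match = _ID_RE.search(attrs)
        if id_match:
            # The tag already has an ID: record it and leave the tag untouched.
            header_id = id_match.group(2)
            tag = match.group(0)
        else:
            # No ID: generate one and insert it into the opening tag.
            header_id = _slugify(title)
            tag = '<h%d id="%s"%s>%s</h%d>' % (level, header_id, attrs, inner, level)
        toc.append((level, header_id, title))
        return tag

    return _HEADER_RE.sub(_sub, html), toc
```

A regex is enough for a sketch like this, but it does not handle nested or malformed headers. A real implementation would also need to keep generated IDs unique.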

HTML content is hashed before Markdown headers are processed, so this step intercepts HTML headers in _hash_html_block_sub before they are hashed. As a result, TOC entries are inserted out of order. To fix this, I added a function that sorts the TOC by order of appearance: for each TOC entry, it searches the text for that header and returns the index at which it is found.
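
The sorting idea can be sketched in a few lines. This is an illustration rather than the function added in this PR: the `sort_toc_by_appearance` name, the `(level, id, title)` entry shape, and locating a header by its `id="..."` attribute are assumptions.

```python
def sort_toc_by_appearance(toc, text):
    """Return the TOC entries ordered by where each header occurs in text.

    toc is assumed to be a list of (level, header_id, title) tuples and
    text the rendered HTML containing id="..." attributes.
    """
    def position(entry):
        _level, header_id, _title = entry
        index = text.find('id="%s"' % header_id)
        # Entries that cannot be found sort last instead of first (-1).
        return index if index != -1 else len(text)

    # sorted() is stable, so entries with equal positions keep their order.
    return sorted(toc, key=position)
```

For example, with the text `<h1 id="a">A</h1><h2 id="b">B</h2>` and a TOC collected as `[(2, 'b', 'B'), (1, 'a', 'A')]`, the sketch returns the entries in document order, `a` before `b`.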

This new header behaviour is disabled by default and can be enabled through the new header-ids options dict:

# new API
markdown2.markdown(text, extras={'header-ids': {'mixed': True, 'prefix': 'my-prefix'}})
# old API
markdown2.markdown(text, extras={'header-ids': 'my-prefix'})  # converted to {'prefix': 'my-prefix'} in __init__
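
The backwards-compatible conversion mentioned in the comment above could look roughly like the following. This is a sketch only: the `normalize_header_ids` helper is hypothetical and stands in for whatever __init__ actually does.

```python
def normalize_header_ids(option):
    """Hypothetical helper: coerce a header-ids extra value into an options dict."""
    if isinstance(option, dict):
        # New API: already an options dict, return a copy.
        return dict(option)
    if isinstance(option, str):
        # Old API: a bare string is treated as the prefix.
        return {'prefix': option}
    # Anything else (for example None) is treated as "no options" here.
    return {}
```

With this shape, the old-style call `extras={'header-ids': 'my-prefix'}` and the new-style `extras={'header-ids': {'prefix': 'my-prefix'}}` end up with the same options.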

trentm/python-markdown2#538