pixelschrubber / toc

table of contents (TOC) for typo3
GNU General Public License v3.0
1 stars 3 forks source link

Wrong hierarchy in text elements with multi level headlines #6

Closed fishbone1 closed 1 year ago

fishbone1 commented 1 year ago

If you have a content element with bodytext (like the regular text content type) that contains multiple headlines with different levels, the hierarchy will be wrong.

For instance this text:

<h1>Headline 1</h1>
<h2>Headline 1.1</h2>
<h2>Headline 2</h2>

Will produce this result:

Headline 1
Headline 2
   Headline 1.1

The reason for it is that the code collects headlines level by level inside each content element. E.g. it first collects all h1, then all h2, ... Therefore the hierarchy information is lost:

Classes/Domain/Repository/TOCRepository.php, L 17:

$headlines = array("h1", "h2", "h3", "h4", "h5", "h6");
foreach ($headlines as $tag) {
    foreach ($dom->getElementsByTagName($tag) as $item) {
        $this->tocContent[] = array('level' => (int) str_replace("h", "", $item->tagName), 'content' => utf8_decode($item->nodeValue), 'anchor' => $row['uid'], 'urlhash' => $this->format_uri(utf8_decode($item->nodeValue)));
    }
}

This can be fixed by using XPath instead of getElementsByTagName:

$xpath = new \DOMXPath($dom);
$headlines = $xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6');
$headlineCount = $headlines->length;
for ($i = 0; $i < $headlineCount; ++$i) {
    $item = $headlines->item($i);
    $this->tocContent[] = array('level' => (int) str_replace("h", "", $item->tagName), 'content' => utf8_decode($item->nodeValue), 'anchor' => $row['uid'], 'urlhash' => $this->format_uri(utf8_decode($item->nodeValue)));
}

I'll provide a pull request.

pixelschrubber commented 1 year ago

Thank you for the pull request, did you notice the configuration in ext_conf_template.txt? There are settings, which enable hierarchy and also there is a setting for the layout in the frontend with deeper levels:

cat=Settings//25; type=boolean; label=Show toggle link as plus/minus and toggle deeper hierarchy levels (only works with Toggle-Visibility)

Toggle-Style = 0

cat=Settings//30; type=boolean; label=Enable hierarchy

Enable-Hierarchy = 1

Actually this used to work, not sure if this might be a local issue? Would you be able to check this configuration on your machine?

fishbone1 commented 1 year ago

In my configuration Enable-Hierarchy is on and Toggle-Style is off.

If you have put the headlines to different content elements it works. You can even have a content element with multiple headlines, if you don't put a higher level headline (e.g. h1) below a lower level one (e.g. h2) inside of a content element.

So this works:

---- content element 1: ----
h1
    h2
    h2
----------------------------
---- content element 2: ----
h1
    h2
----------------------------

But this doesn't work:

---- content element 1: ----
h1
    h2
    h2
h1 (a higher level following a lower level won't work: )
    h2 (this will be placed as child entry of the first h1 above)
----------------------------

So maybe this is why you haven't noticed it before.

As I said before: The problem is caused by the fact that headlines are collected level by level:

$headlines = array("h1", "h2", "h3", "h4", "h5", "h6");
foreach ($headlines as $tag) {
    foreach ($dom->getElementsByTagName($tag) as $item) {

First you push all h1 elements to $this->tocContent, then all h2 elements, etc.:

        $this->tocContent[] = array('level' => (int) str_replace("h", "", $item->tagName), 'content' => utf8_decode($item->nodeValue), 'anchor' => $row['uid'], 'urlhash' => $this->format_uri(utf8_decode($item->nodeValue)));

So the order h1, h2, h2, h1, h2 will be h1, h1, h2, h2, h2 is tocContent. The method enableHierarchy() uses this array as its source later:

protected function enableHierarchy()
{
    if ($this->extensionSettings['Enable-Hierarchy']) {

        // Reindex to avoid starting with 0, otherwise recursion won't work
        $this->tocContent = array_combine(range(1, count($this->tocContent)), array_values($this->tocContent));

        foreach ($this->tocContent as $key => $value) {
fishbone1 commented 1 year ago

Maybe it also gets more clear in the pull request #7 . It's just a small fix in TOCRepository

pixelschrubber commented 1 year ago

Thanks for the clarification, sorry for the misunderstanding. This makes totally sense, it used to be a QA-measurement to force writers making use of single content elements, due to a multi-language setup and the workflow for each article. I will merge the pull request shortly