hlxsites / prisma-cloud-docs-website

blocks and gdoc authored content for https://docs.prismacloud.io
Apache License 2.0

[AsciiDoc conversion] Stray id text 2 #205

Closed by iansk 5 months ago

iansk commented 10 months ago

This was previously reported in #171. We're seeing a different instance of it. This time it's in the middle of the page.

Repro:

  1. Open a new browser tab and go to https://main--prisma-cloud-docs-website--hlxsites.hlx.live/en/enterprise-edition/content-collections/administration/configure-iam-security/remediate-alerts-for-iam-security

  2. In the page, search for id2cbf5c9b-62aa-4a95-9340-eeaaf6f07bc4

[Screenshot: franklin-uat-stray-id2]

Notes:

I tried republishing the preview, but it didn't change anything. (You'll notice the last updated date at the top of the page is October 12, 2023.)

curl -vi -XPOST 'https://admin.hlx.page/preview/hlxsites/prisma-cloud-docs/main/docs/en/enterprise-edition/content-collections/administration/configure-iam-security/remediate-alerts-for-iam-security'

Preview page:

https://main--prisma-cloud-docs-website--hlxsites.hlx.page/en/enterprise-edition/content-collections/administration/configure-iam-security/remediate-alerts-for-iam-security

iansk commented 10 months ago

On the same page, here's another example. Search for ide69e3eac-d058-4804-8d58-8e648893a030

3vil3mpir3 commented 10 months ago

@maxakuru it seems that H3s aren't being processed correctly, though H4s seem to work as expected. Evidence:

maxakuru commented 10 months ago

This is actually caused by nested sections with IDs on their headings. We use section metadata to define a section's ID, so sections with IDs can't be nested; otherwise we end up with two section-metadata tables and only the first is applied. The visible section-metadata table is the second one, which isn't removed because it is treated as a normal block.

@iansk I believe in this case removing the table from the procedure will fix the problem

iansk commented 6 months ago

Current pages with this issue

Root cause analysis

In the worker's section converter (adoc2html), the section-metadata div is added to the wrong place when there are nested sections. In AsciiDoc, H3s are nested in H2s, and H4s are nested in H3s.

The current algorithm for the section converter:

  1. Generates the start of the content: the opening <div> and the <hX> tag with the title.

  2. Calls convert on all remaining blocks. If any block is a nested section, the section converter is called again.

  3. After all the conversions are done:

    • The content is wrapped in a div.
    • The section metadata div is generated.

  4. Returns content + sectionMeta.
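
The ordering problem in these steps can be sketched with a small model. This is not the actual adoc2html code: the node shape (`type`, `level`, `title`, `customId`, `blocks`), the function names, and the idea of representing each output `<div>` as an array of child HTML strings are all assumptions made for illustration. The model only assumes that a nested section opens its own `<div>`, and that metadata is appended to whichever `<div>` is open at the time it is generated.

```javascript
// Hypothetical model of the current converter order. Each output <div> is an
// array of child HTML strings; none of these names come from the real worker.
function sectionMetaHtml(id) {
  return `<div class="section-metadata"><div><div>id</div><div>${id}</div></div></div>`;
}

function convertCurrent(section, divs = []) {
  // Step 1: open a new <div> and emit the heading.
  divs.push([`<h${section.level}>${section.title}</h${section.level}>`]);

  // Step 2: convert the remaining blocks. A nested section recurses and
  // opens its own <div>, which becomes the last entry in `divs`.
  for (const block of section.blocks) {
    if (block.type === 'section') {
      convertCurrent(block, divs);
    } else {
      divs[divs.length - 1].push(`<p>${block.text}</p>`);
    }
  }

  // Steps 3-4: the metadata is generated only now, so it lands in whichever
  // <div> is last, not necessarily the one this section opened.
  if (section.customId) {
    divs[divs.length - 1].push(sectionMetaHtml(section.customId));
  }
  return divs;
}

// Same structure as the AsciiDoc example in this comment.
const example = {
  type: 'section', level: 2, title: 'Test2', customId: null,
  blocks: [
    { type: 'paragraph', text: 'Test2.' },
    {
      type: 'section', level: 3, title: 'Sect2', customId: 'custom-sect2',
      blocks: [
        { type: 'paragraph', text: 'Test' },
        {
          type: 'section', level: 4, title: 'Sect3', customId: 'custom-sect3',
          blocks: [{ type: 'paragraph', text: 'Test' }],
        },
      ],
    },
  ],
};

const divs = convertCurrent(example);
console.log(divs.length); // 3
console.log(divs[2].filter((html) => html.includes('section-metadata')).length); // 2
```

In this model, both metadata entries end up in the last `<div>` and the Sect2 `<div>` gets none, which is the same shape as the output reported in this comment.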

Because the sectionMeta div isn't generated and appended until after all nested sections are converted, it's added to the wrong place in the HTML tree whenever a section with a custom ID contains nested sections.

For example:

Input:

== Test2

Test2.

[#custom-sect2]
=== Sect2

Test

[#custom-sect3]
==== Sect3

Test

Output from current code:

<div>
  <h2 id="test2">Test2</h2>
  <p>Test2.</p>
</div><div>
  <h3 id="sect2">Sect2</h3>
  <p>Test</p>
</div><div>
  <h4 id="sect3">Sect3</h4>
  <p>Test</p>
  <div class="section-metadata">
    <div><div>id</div><div>custom-sect3</div></div>
  </div>
  <div class="section-metadata">
    <div><div>id</div><div>custom-sect2</div></div>
  </div>
</div>

When the page is later loaded, decorateSections() is supposed to process the section-metadata div and remove it. However, the code only looks for one div.section-metadata per section, so it only removes one div. The other div is left on the page, and when the page renders, it appears as paragraph text, like this:

id
custom-sect2

It also means that the id for Sect2 isn't set at all.
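
The client-side half of the problem can be modeled without a DOM. The sketch below is not the real decorateSections(); it uses plain objects as stand-ins for elements, and the function name `decorateSectionModel` and the object shape are hypothetical. It only illustrates the "first match wins" behavior described above.

```javascript
// Hypothetical stand-in for per-section decoration: find the FIRST
// section-metadata child, apply its id to the section, and remove it.
function decorateSectionModel(section) {
  const index = section.children.findIndex(
    (child) => child.className === 'section-metadata',
  );
  if (index === -1) return section;
  const [meta] = section.children.splice(index, 1);
  section.id = meta.id;
  return section;
}

// Plain-object model of the last <div> from the output above.
const lastDiv = {
  id: null,
  children: [
    { tag: 'h4', text: 'Sect3' },
    { tag: 'p', text: 'Test' },
    { className: 'section-metadata', id: 'custom-sect3' },
    { className: 'section-metadata', id: 'custom-sect2' },
  ],
};

decorateSectionModel(lastDiv);

const leftover = lastDiv.children
  .filter((child) => child.className === 'section-metadata')
  .map((child) => child.id)
  .join(',');

console.log(lastDiv.id); // custom-sect3
console.log(leftover); // custom-sect2
```

The second metadata entry survives as ordinary content, and nothing ever assigns `custom-sect2` to the Sect2 section.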

Fix

To fix the issue, I reordered the logic. Now, the section converter immediately generates the sectionMeta div and adds it to the section content. From there, the first part of the section content is generated (heading tag, title, etc.), and the converter is called on the remaining blocks (including any nested sections). This way, a single section-metadata is guaranteed to be associated with each section.
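
As a rough illustration of the reordered logic (not the code in the pull request), the sketch below uses a hypothetical model in which each output `<div>` is an array of child HTML strings. The node shape and all function names are assumptions. The only point it makes is that emitting the metadata before recursing ties it to the `<div>` the section itself opened.

```javascript
// Hypothetical model of the reordered converter; names are illustrative only.
function buildSectionMeta(id) {
  return `<div class="section-metadata"><div><div>id</div><div>${id}</div></div></div>`;
}

function convertFixed(section, divs = []) {
  // Open this section's <div> and add its metadata immediately,
  // before any nested section can open another <div>.
  const div = [];
  divs.push(div);
  if (section.customId) div.push(buildSectionMeta(section.customId));

  // Then emit the heading and convert the remaining blocks.
  div.push(`<h${section.level}>${section.title}</h${section.level}>`);
  for (const block of section.blocks) {
    if (block.type === 'section') {
      convertFixed(block, divs);
    } else {
      divs[divs.length - 1].push(`<p>${block.text}</p>`);
    }
  }
  return divs;
}

// Same structure as the AsciiDoc example in the root cause analysis.
const fixedExample = {
  type: 'section', level: 2, title: 'Test2', customId: null,
  blocks: [
    { type: 'paragraph', text: 'Test2.' },
    {
      type: 'section', level: 3, title: 'Sect2', customId: 'custom-sect2',
      blocks: [
        { type: 'paragraph', text: 'Test' },
        {
          type: 'section', level: 4, title: 'Sect3', customId: 'custom-sect3',
          blocks: [{ type: 'paragraph', text: 'Test' }],
        },
      ],
    },
  ],
};

const fixedDivs = convertFixed(fixedExample);
const metaCounts = fixedDivs.map(
  (div) => div.filter((html) => html.includes('section-metadata')).length,
);
console.log(metaCounts.join(',')); // 0,1,1
```

In this model, the Sect2 and Sect3 `<div>`s each carry exactly one metadata entry, so a "first match only" lookup on the client finds the right one.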

Proposed fix here: https://github.com/hlxsites/prisma-cloud-docs/pull/455