unisonweb / unison

A friendly programming language from the future
https://unison-lang.org

Fix roundtrip bug when sub-docs contain headings #5120

Closed: neduard closed this 5 days ago

neduard commented 6 days ago

Currently the parser errors out when a sub-doc contains a heading, e.g.:

```
x = {{ # Heading A
{{ # Heading B }}
}}
```

This fails with:

```
I got confused here:

  4 | {{ # Heading B }}

I was surprised to find a # here. I was expecting one of these instead:
```

I think this is because `Lexer.hs` keeps a single `parentSection :: Int` to track the current heading level, and therefore how many `#` characters to expect. However, when we enter a sub-doc, that number needs to be saved, reset to 0 for the sub-doc, and restored once the sub-doc ends.

Overview

Replace the single `Int` with a list (read: stack): push a new 0 each time we enter a doc section and pop it when we leave, so each sub-doc tracks its own heading level.
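To make the stack idea concrete, here is a minimal Haskell sketch. It is not the code from this PR: the names `SectionStack`, `enterDoc`, `exitDoc`, `currentLevel` and `setLevel` are hypothetical, and it assumes the lexer calls `enterDoc` on `{{` and `exitDoc` on `}}`.

```haskell
-- Hypothetical sketch of the idea in this PR; these names are
-- illustrative and are NOT taken from Unison's Lexer.hs.

-- One "parent section" level per open doc block; the head of the list
-- belongs to the innermost doc (top-level doc or nested sub-doc).
type SectionStack = [Int]

-- Outside of any doc block, nothing is tracked.
emptyStack :: SectionStack
emptyStack = []

-- Entering a doc block (`{{`) pushes a fresh level of 0, so headings
-- inside a sub-doc start counting from scratch.
enterDoc :: SectionStack -> SectionStack
enterDoc levels = 0 : levels

-- Leaving a doc block (`}}`) pops the innermost level, which restores
-- the enclosing doc's level unchanged.
exitDoc :: SectionStack -> SectionStack
exitDoc (_ : rest) = rest
exitDoc [] = []  -- unbalanced `}}`; a real lexer would report an error

-- Parent section level of the innermost open doc, if any.
currentLevel :: SectionStack -> Maybe Int
currentLevel (l : _) = Just l
currentLevel [] = Nothing

-- Replace the innermost doc's level, e.g. after lexing a heading.
setLevel :: Int -> SectionStack -> SectionStack
setLevel n (_ : rest) = n : rest
setLevel _ [] = []
```

For the failing example, `setLevel 1 (enterDoc emptyStack)` models the outer doc after `# Heading A`. Entering the sub-doc with `enterDoc` gives it a fresh level of 0, and `exitDoc` brings back the outer level of 1. A single `Int` has nowhere to keep that outer 1 while the sub-doc is being lexed.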

Closes #4476
Closes #4729

Test coverage

I've added three tests to `reparses-with-same-hash.u`. I'm not sure they are strictly necessary, but I figured it's better to err on the side of caution.

Loose ends

N/A?

aryairani commented 6 days ago

Hey, thanks for this PR, @neduard.

I'm not super familiar with the doc parser, and I was wondering: what is the `parentSection` field even for? Shouldn't any heading level be allowed after any other? cc @pchiusano

neduard commented 6 days ago

Good question, @aryairani. As I understand it, this is an indirect consequence of us parsing markdown into a syntax tree. For example:

```
f = {{

#### HA

## HB

### HC

}}
```

parses to:

```
Open "/home/ed/code/unison/scratch.u"
  WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "f"} :| [])))
  Open "="
    Open "syntax.docUntitledSection"
      Open "syntax.docSection"
        Open "syntax.docParagraph"
          Open "syntax.docWord"
            Textual "HA"
          Close
        Close
      Close
      Open "syntax.docSection"
        Open "syntax.docParagraph"
          Open "syntax.docWord"
            Textual "HB"
          Close
        Close
        Open "syntax.docSection"
          Open "syntax.docParagraph"
            Open "syntax.docWord"
              Textual "HC"
            Close
          Close
        Close
      Close
    Close
  Close
Close
```

or in other words:

```
> display f

  # HA

  # HB

    # HC
```

Notice that the tokens don't actually store the heading level. So we need to remember the parent's heading level to know whether a new heading should open a sub-section or "close" the current one, if that makes sense.
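The rule described above (a heading opens a sub-section only if it is deeper than its parent; otherwise it closes the current section) can be sketched in Haskell. This is a hypothetical illustration, not Unison's lexer: it builds the nested sections directly from assumed `(level, title)` pairs instead of emitting `Open`/`Close` tokens, and the names `Section` and `sections` are made up for the example.

```haskell
-- Hypothetical sketch, not Unison's Lexer.hs: nest flat headings into sections.

-- A section is its heading text plus any nested sub-sections.
data Section = Section String [Section]
  deriving (Show, Eq)

-- Build the sections that live under a given parent heading level.
-- Input: (level, title) pairs, where level = number of leading '#'.
-- Returns the sections at this depth, plus the headings left over
-- for an enclosing section to handle.
sections :: Int -> [(Int, String)] -> ([Section], [(Int, String)])
sections parent ((lvl, title) : rest)
  | lvl > parent =
      -- Deeper than the parent: open a section, collect its children
      -- (everything deeper than lvl), then continue with its siblings.
      let (children, afterChildren) = sections lvl rest
          (siblings, remaining) = sections parent afterChildren
       in (Section title children : siblings, remaining)
-- Not deeper than the parent (or no headings left): close this level
-- and hand the remaining headings back to the enclosing section.
sections _ headings = ([], headings)
```

Running `sections 0 [(4, "HA"), (2, "HB"), (3, "HC")]` gives `([Section "HA" [], Section "HB" [Section "HC" []]], [])`: `HB` does not nest under `HA` because 2 is not deeper than 4, while `HC` nests under `HB`, matching the `display f` output. The `parent` argument is exactly the level that has to be remembered separately for each doc.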

I agree this might be overkill, though, so I'm happy to hear alternatives!

aryairani commented 6 days ago

@neduard Ok I see, thanks.