ttscoff / mdless

MIT License

Codeblocks are broken when having # in the text #80

Open ehvs opened 2 years ago

ehvs commented 2 years ago

When a code block contains #, the line is read as a header tag instead of being left alone. E.g. the code block below:

```
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
```

Is shown as:

TYPE apiserver_audit_requests_rejected_total counter

apiserver_audit_requests_rejected_total 0

When it should show as:

--[ code ]----------------------------------------
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
----------------------------------------------------
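
A hypothetical sketch of the expected behavior: a line-by-line renderer should only treat `#` as a header marker after checking that the line is not inside a fenced block. This is illustrative Python, not mdless's code (mdless is written in Ruby). `header_lines` and both regexes are assumptions, and optional closing `#` sequences are not stripped.

```python
import re

# Hypothetical helper for illustration only: mdless is written in Ruby and
# its real header handling is not shown in this thread.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
ATX_HEADER = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$")


def header_lines(lines):
    """Return (line_number, level, text) for each ATX header outside fenced code."""
    headers = []
    fence = None  # the opening fence marker while inside a fenced block
    for number, line in enumerate(lines, start=1):
        if fence is not None:
            stripped = line.strip()
            # A closing fence repeats the opening character at least as many times.
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                fence = None
            continue  # inside code, '#' is literal text and never a header
        opening = FENCE.match(line)
        if opening:
            fence = opening.group(1)
            continue
        header = ATX_HEADER.match(line)
        if header:
            headers.append((number, len(header.group(1)), header.group(2) or ""))
    return headers


# The fence is built with "`" * 3 only so this example can sit inside a Markdown fence.
sample = [
    "# Heading",
    "`" * 3,
    "# TYPE apiserver_audit_requests_rejected_total counter",
    "apiserver_audit_requests_rejected_total 0",
    "`" * 3,
]
print(header_lines(sample))  # [(1, 1, 'Heading')]
```

Because the fence check runs first, the `# TYPE ...` line from the report is never offered to the header regex.
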
Forside commented 1 year ago

I'd like to add some information to this bug report. I'm using mdless in WSL, where I have set LC_ALL to C.UTF-8. Comments inside code blocks worked fine up to and including v1.0.21. From v1.0.22 on, comments are not rendered correctly.

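Additionally, I found the following for code blocks whose syntax language is not found:

  • comments are rendered as Markdown titles, with only the first word and the '=' characters being black-on-white
  • both before and after v1.0.22, bold and italic text is rendered inside the code block, but again only for the first word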

Test Markdown script:

# Test

```sh
# shell comment
echo *'Hello World'*
```

```asdf
# shell comment
echo *'Hello World'*
```

GitHub rendering:

# Test

```sh
# shell comment
echo *'Hello World'*
```

```asdf
# shell comment
echo *'Hello World'*
```

mdless output (screenshots): v1.0.21 (mdless-code-comment-bug-1) and v1.0.22 (mdless-code-comment-bug-2)

ttscoff commented 1 year ago

Oh interesting. Having that version reference where it went wrong might make it easier to track down in a codebase I haven’t touched for a while. Thanks for the report, I’ll try to look into it soon.

On Wed, Jan 11, 2023 at 8:40 PM Jonas Hülsermann wrote:

Additionally I found for code blocks where the syntax language is not found:

  • comments are rendered as Markdown titles, with only the first word and the '=' characters being black-on-white
  • before and after v1.0.22, bold and italic text is rendered inside the code block, but again only the first word

mdless v1.0.21 (mdless-code-comment-bug-1): https://user-images.githubusercontent.com/21068240/211962920-31e62bf9-85a8-4044-afb3-e6795ebf75b6.png
mdless v1.0.22 (mdless-code-comment-bug-2): https://user-images.githubusercontent.com/21068240/211962944-ccaea986-aab6-4655-aca3-8e9e5add2985.png

tkapias commented 1 year ago

Same issue for me with v1.0.35: code blocks are full of wrong background colors and visible escape codes.

ttscoff commented 1 year ago

Please try installing 1.0.37 and see if the issue is resolved.

tkapias commented 1 year ago

Thank you, the issues with apparent escape sequences and with the # are resolved.

Forside commented 1 year ago

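Can confirm the biggest issues are solved 👍🏽. However, there are still some minor issues that I tried to pinpoint as precisely as possible:

  • The first word of Markdown-style italic or bold text is still formatted inside code blocks with unknown syntax.
    • The same happens in code blocks with known syntax on lines starting with #. There, however, the whole italic or bold part is formatted.
  • Heading lines that start with one or more # and contain another # somewhere in the line are not formatted as headings. Instead, the line is printed as plain text.
  • --- and ===, which format the line above them as a heading, break on lines already marked as headings by #, both outside and inside code blocks:
    • To apply the formatting, mdless seems to add # or ## in front of the line and then format it as a heading. As described above, the extra # causes the line to be printed as plain text.
  • --- followed by another --- formats the first line as a heading instead of printing two horizontal rules.
  • For two consecutive ===, formatting the first === as a heading is correct behavior.
  • In code blocks with unknown syntax, the inner text is indented by one extra space.
  • Could the closing bar of a code block be printed right after the code, without the extra empty line?

I also tested with --no-color to make sure color doesn't break anything. The output was the same, just without colors.

Regarding code blocks, my main question is: why is Markdown formatting applied to them at all? Maybe this can be avoided entirely; I'm not familiar with the code.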

Test markdown script:

# Title 1

# # Title 2.1

# Title # 2.2

Title 3.1
===

Title 3.2
---

# Title 4.1
===

# Title 4.2
---

---

---
---

===
===

```sh
# *comment 1*
# comment 2
===
# comment 3
===
===
# comment 4
===
---
echo ***'Hello World'***
echo ***'Hello World'***
===
# *comment 1*
# comment 2
---
# comment 3
---
---
# comment 4
---
===
echo ***'Hello World'***
echo ***'Hello World'***
---
```

### GitHub rendering:

# Title 1

# # Title 2.1

# Title # 2.2

Title 3.1
===

Title 3.2
---

# Title 4.1
===

# Title 4.2
---

---

---
---

===
===

```sh
# *comment 1*
# comment 2
===
# comment 3
===
===
# comment 4
===
---
echo ***'Hello World'***
echo ***'Hello World'***
===
# *comment 1*
# comment 2
---
# comment 3
---
---
# comment 4
---
===
echo ***'Hello World'***
echo ***'Hello World'***
---
```

mdless:

mdless
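
For comparison with the test script above, the following hypothetical Python sketch applies a small subset of the CommonMark block rules (ATX headings, Setext underlines, thematic breaks, fenced code). It is not mdless's parser and it skips lists, block quotes, indented code and lazy continuation; `classify` and the regex names are assumptions. Under these rules `# # Title 2.1` is a level-1 heading whose text is `# Title 2.1`, `===` directly under an ATX heading is ordinary paragraph text, `---` under an ATX heading or under another `---` is a horizontal rule, and `===` under `===` is a level-1 heading. Everything inside the fence is passed through untouched.

```python
import re

# A deliberately small subset of CommonMark block parsing, for illustration only:
# no lists, block quotes, indented code, HTML blocks or lazy continuation lines.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
ATX = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$")
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")
THEMATIC_BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")
CLOSING_HASHES = re.compile(r"(?:^|[ \t]+)#+$")


def classify(lines):
    """Return (kind, text) blocks such as ("h1", "Title 1"), ("hr", "") or ("code", ...)."""
    blocks, paragraph, code = [], [], []
    fence = None  # opening fence marker while inside fenced code

    def flush_paragraph():
        if paragraph:
            blocks.append(("p", " ".join(paragraph)))
            paragraph.clear()

    for line in lines:
        if fence is not None:
            stripped = line.strip()
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                blocks.append(("code", "\n".join(code)))
                code.clear()
                fence = None
            else:
                code.append(line)  # kept verbatim: no headings, rules or emphasis
            continue
        opening = FENCE.match(line)
        if opening:
            flush_paragraph()
            fence = opening.group(1)
            continue
        if not line.strip():
            flush_paragraph()
            continue
        underline = SETEXT_UNDERLINE.match(line)
        if underline and paragraph:
            # Only a line directly under paragraph text is a Setext underline, and
            # then it wins over reading "---" as a thematic break.
            level = 1 if underline.group(1)[0] == "=" else 2
            blocks.append((f"h{level}", " ".join(paragraph)))
            paragraph.clear()
            continue
        if THEMATIC_BREAK.match(line):
            flush_paragraph()
            blocks.append(("hr", ""))
            continue
        atx = ATX.match(line)
        if atx:
            flush_paragraph()
            # Strip an optional closing sequence such as the "##" in "## Title ##".
            text = CLOSING_HASHES.sub("", atx.group(2) or "").strip()
            blocks.append((f"h{len(atx.group(1))}", text))
            continue
        paragraph.append(line.strip())
    flush_paragraph()
    if fence is not None:  # an unclosed fence runs to the end of the document
        blocks.append(("code", "\n".join(code)))
    return blocks


# Cases from the test script; the fence is built with "`" * 3 so it can sit in this block.
issue_cases = [
    "# # Title 2.1", "",
    "# Title # 2.2", "",
    "# Title 4.1", "===", "",
    "# Title 4.2", "---", "",
    "---", "---", "",
    "===", "===", "",
    "`" * 3 + "sh", "# comment 2", "===", "echo ***'Hello World'***", "`" * 3,
]
for kind, text in classify(issue_cases):
    print(kind, repr(text))
```

The design point is ordering: fenced code is recognized before any heading or rule test, and a Setext underline only applies while paragraph text is pending, so a parser built this way never needs to prepend `#` to an underlined line.
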

ttscoff commented 1 year ago

I'll do some testing on these cases when I have time. My first question would be why you would mix ATX headers with Setext headers at all...

As for why Markdown formatting is applied to code blocks: mdless reads a file line by line and applies formatting. Code blocks are removed from processing before this happens and should simply be re-inserted, with syntax highlighting, once it's complete. Indented code and fenced code may be handled at different points in that order, though; I'll have to look into it.

-Brett
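
The extract, format and re-insert flow Brett describes can be sketched as follows. This is hypothetical Python under stated assumptions, not mdless's implementation: only fenced blocks whose closing fence matches the opening exactly are handled, the NUL-delimited placeholder is an invented token, and `extract_code`, `format_markdown`, `restore_code` and `render` are illustrative names. `format_markdown` is a toy stand-in (upper-casing headers, bracketing `*emphasis*`) for real terminal styling. What matters is that the formatter never sees code lines.

```python
import re

# Fenced block: an opening backtick or tilde fence line, anything, then the same fence.
# Simplification (assumption): the closing fence must match the opening exactly.
FENCED_BLOCK = re.compile(r"^(`{3,}|~{3,})[^\n]*\n.*?^\1[ \t]*$", re.MULTILINE | re.DOTALL)
PLACEHOLDER = re.compile(r"\x00CODE(\d+)\x00")


def extract_code(markdown):
    """Replace each fenced block with a placeholder; return the text and the blocks."""
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return f"\x00CODE{len(blocks) - 1}\x00"

    return FENCED_BLOCK.sub(stash, markdown), blocks


def format_markdown(text):
    """Toy line-by-line formatter standing in for real styling."""
    out = []
    for line in text.split("\n"):
        header = re.match(r"#{1,6}[ \t]+(.*)", line)
        if header:
            line = header.group(1).upper()  # pretend heading style
        line = re.sub(r"\*([^*\n]+)\*", r"[\1]", line)  # pretend emphasis style
        out.append(line)
    return "\n".join(out)


def restore_code(text, blocks):
    """Put the untouched blocks back; syntax highlighting would happen here."""
    # A function replacement stops re.sub from treating backslashes in code as escapes.
    return PLACEHOLDER.sub(lambda m: blocks[int(m.group(1))], text)


def render(markdown):
    text, blocks = extract_code(markdown)
    return restore_code(format_markdown(text), blocks)


# The fence is built with "`" * 3 only so this example can sit inside a Markdown fence.
doc = "\n".join([
    "# Test",
    "`" * 3 + "sh",
    "# shell comment",
    "echo *'Hello World'*",
    "`" * 3,
    "Some *emphasis* here.",
])
print(render(doc))
```

If a block is not stashed (for example a fence the extraction pattern fails to match, or indented code handled in a separate pass), its lines reach the formatter, and `#` or `*` inside it would be styled, which is the kind of output reported in this thread.
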

On 26 Sep 2023, at 19:23, Jonas Hülsermann wrote:

Can confirm the biggest issues are solved 👍🏽. However there are still some minor issues that I tried to pinpoint as much as possible:

  • The first word in MD style italic or bold formatted texts is still formatted inside code blocks with unknown syntax.
    • Same for code blocks with known syntax in lines starting with #. Here however it formats the whole italic or bold part.
  • Heading lines starting with one or more # that contain another # somewhere in the line are not formatted as a heading. Instead the line is printed as plain text.
  • --- and ===, which formats text one line above it as a heading, breaks on lines already marked as a heading by #, both outside and inside code blocks:
    • To apply the formatting, mdless seems to add # or ## in front of the line to format the line as a heading afterwards. As described above, the line is then printed as plain text because of the additional #.
  • --- followed by another --- formats the first line as a heading instead of printing two horizontal bars.
  • For two consecutive === it is correct behaviour to format the first === as a heading.
  • In code blocks with unknown syntax, the inner text is indented by one more space.
  • The ending bar of a code block might preferably be printed right after the code without the extra empty line?
