mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License

Very simple preformatted code block does not get converted properly #435

Open dddoyle1 opened 1 year ago

dddoyle1 commented 1 year ago

I have a very simple element:

<pre>
<code class="">
fruits = ["apple", "banana", "cherry"]

for fruit in fruits:
  print(fruit)
</code>
</pre>

that I would like converted to a fenced code block via TurndownService::turndown

I initialize the TurndownService with

var turndownService = new TurndownService({codeBlockStyle: "fenced"});

However, turndown yields the following string when I pass my element:

"fruits = [\"apple\", \"banana\", \"cherry\"]\n\nfor fruit in fruits:\n  print(fruit)"

I've verified by hand that fencedCodeBlock.filter returns true when passed my element, and that fencedCodeBlock.replacement returns the expected string wrapped in the fence. However, TurndownService::turndown does not reproduce this result.

domchristie commented 1 year ago

This looks like it's working on the live demo: https://mixmark-io.github.io/turndown/ The output is:

```

fruits = ["apple", "banana", "cherry"]

for fruit in fruits:
  print(fruit)
```

dddoyle1 commented 1 year ago

Looks like this happens only when I pass the HTML element directly, whereas if I instead do

turndownService.turndown(element.outerHTML)

the element is converted as I expected. Is this intended behavior?
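
A minimal sketch of that workaround, assuming the element exposes `outerHTML` as browser DOM elements do. The helper name `elementToMarkdown` is hypothetical and not part of turndown, and the stub service only stands in for a real `TurndownService` instance so the snippet runs on its own:

```javascript
// Hypothetical helper (not part of turndown): always hand turndown an HTML
// string, so the element itself is included in the parsed input.
function elementToMarkdown(service, element) {
  const html = typeof element === 'string' ? element : element.outerHTML;
  return service.turndown(html);
}

// Stand-in for a real TurndownService instance; it only reports what it received.
const stubService = {
  turndown(input) {
    return typeof input === 'string' ? 'string:' + input : 'node';
  }
};

// Stand-in for a DOM element; in a browser this would be the real <pre> element.
const fakeElement = { outerHTML: '<pre><code>print(fruit)</code></pre>' };

const result = elementToMarkdown(stubService, fakeElement);
console.log(result);
```

With a real service, the call would look like `elementToMarkdown(turndownService, element)`.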

FWIW, I see that when I pass the element directly, the node that reaches fencedCodeBlock.filter has the surrounding "<pre>...</pre>" stripped away, i.e. it is a code node, which fails the filter criteria.
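
To illustrate why a bare code node would fail, here is a sketch of the filter check. It follows my reading of the fenced code block rule (a `PRE` node whose first child is `CODE`), so treat the exact conditions as an assumption rather than the library's definitive source. The plain objects are stand-ins for DOM nodes:

```javascript
// Assumed shape of the fenced code block filter; check turndown's source for the real rule.
function fencedCodeBlockFilter(node, options) {
  return Boolean(
    options.codeBlockStyle === 'fenced' &&
    node.nodeName === 'PRE' &&
    node.firstChild &&
    node.firstChild.nodeName === 'CODE'
  );
}

// Plain-object stand-ins for the <pre> element and its <code> child.
const codeNode = { nodeName: 'CODE', firstChild: null };
const preNode = { nodeName: 'PRE', firstChild: codeNode };
const options = { codeBlockStyle: 'fenced' };

const preMatches = fencedCodeBlockFilter(preNode, options);   // true: PRE wrapping CODE
const codeMatches = fencedCodeBlockFilter(codeNode, options); // false: CODE on its own
console.log(preMatches, codeMatches);
```

If only the children of the passed element are run through the rules, the `<pre>` wrapper never reaches this check, which would match the behavior described above.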

XinBaoCode commented 5 months ago

The codeBlockStyle option will do this for you. For example:

var turndownService = new TurndownService({ codeBlockStyle: 'fenced' })
turndownService.turndown('<pre><code class="language-js">console.log("hello world")</code></pre>')