jina-ai / reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
https://jina.ai/reader
Apache License 2.0

Strikethrough text not converted to Markdown #120

Closed: mquandalle closed this issue 3 weeks ago

mquandalle commented 1 month ago

When converting a webpage to Markdown, the strikethrough text information is lost:

HTML:

Price: <span style="text-decoration: line-through;">80€</span> 70€

Current Markdown output:

Price: 80€ 70€

Desired Markdown output:

Price: ~~80€~~ 70€

Preserving strikethrough matters for pricing information and for edited content, where it marks text that no longer applies. It also lets LLMs interpret the page correctly, for example by distinguishing the old price from the current one.
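As an illustration of the requested mapping, here is a minimal sketch (not Reader's code) that treats an inline `style` containing `line-through` as strikethrough and emits GFM `~~…~~`. The function name `strikeSpansToGfm` is hypothetical, and the regex approach is an assumption chosen for brevity: it handles only flat, non-nested `<span>` elements with an inline `style` attribute. Line-through applied through classes or stylesheets would need the computed CSS, which a string pass cannot see.

```typescript
// Hypothetical helper, not part of Reader. Regex sketch: only flat (non-nested)
// <span> elements whose inline style mentions text-decoration ... line-through.
const LINE_THROUGH_SPAN =
  /<span\b[^>]*\bstyle\s*=\s*"[^"]*text-decoration[^"]*line-through[^"]*"[^>]*>([^<]*)<\/span>/gi;

function strikeSpansToGfm(html: string): string {
  return html
    // Wrap the struck-through span's text in GFM strikethrough markers.
    .replace(LINE_THROUGH_SPAN, (_match, text: string) => `~~${text}~~`)
    // Drop any remaining tags, keeping their text content.
    .replace(/<[^>]+>/g, "");
}

const input = 'Price: <span style="text-decoration: line-through;">80€</span> 70€';
console.log(strikeSpansToGfm(input)); // Price: ~~80€~~ 70€
```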

nomagick commented 1 month ago

Hi @mquandalle. Currently, Reader works only at the HTML tag level; it does not inspect the rendered CSS properties of each element. To get strikethrough in Markdown, the text must be wrapped in a pair of <del></del> tags in the HTML. This mirrors how GFM (GitHub Flavored Markdown) defines its strikethrough syntax: ~~text~~ renders as <del>text</del>.
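To make the tag-level behavior concrete, here is a toy converter. It is a sketch, not Reader's actual pipeline, which this thread does not show. The name `tagLevelToMarkdown` is hypothetical. The converter keys only on the `<del>` tag name and never reads `style`, so the span from the original report loses its strikethrough, while the same text inside `<del>` keeps it.

```typescript
// Hypothetical toy converter illustrating tag-level conversion; not Reader's code.
// Only the <del> tag name matters: CSS such as text-decoration is never inspected.
function tagLevelToMarkdown(html: string): string {
  return html
    // <del>text</del> becomes GFM strikethrough (flat, non-nested content only).
    .replace(/<del\b[^>]*>([^<]*)<\/del>/gi, (_match, text: string) => `~~${text}~~`)
    // Every other tag is dropped; its attributes, including style, are lost.
    .replace(/<[^>]+>/g, "");
}

const styled = 'Price: <span style="text-decoration: line-through;">80€</span> 70€';
const semantic = "Price: <del>80€</del> 70€";

console.log(tagLevelToMarkdown(styled));   // Price: 80€ 70€
console.log(tagLevelToMarkdown(semantic)); // Price: ~~80€~~ 70€
```

One consequence of this design is that pages expressing strikethrough purely through CSS need a normalization step, such as rewriting styled spans into `<del>`, before a tag-level converter can preserve it.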