synzen / MonitoRSS

MonitoRSS RSS bot (formerly known as Discord.RSS) with customizable feeds. https://monitorss.xyz
MIT License

html to markdown: The `*` character is not escaped #376

Closed m417z closed 3 months ago

m417z commented 3 months ago

Feed URL: https://mods.windhawk.net/updates.atom

Describe the issue: I have a feed item with HTML content similar to:

Some text '*' some more text <code>C:\Programs\Firefox-Version-*\firefox.exe</code>.

When converting to markdown, the result should be similar to:

Some text '\*' some more text `C:\Programs\Firefox-Version-*\firefox.exe`.

But the `*` character isn't escaped, so instead of:

Some text '*' some more text C:\Programs\Firefox-Version-*\firefox.exe.

The result is:

Some text '' some more text `C:\Programs\Firefox-Version-\firefox.exe`.

synzen commented 3 months ago

Apologies for the late response. At this time, I believe this issue is better resolved with custom placeholders: escaping markdown characters may be tricky and pose more problems than it solves, since those characters may also appear in URLs in the article content. If you use custom placeholders with the configuration below:

[image: custom placeholder configuration]

This should resolve the issue.

m417z commented 3 months ago

Thanks. I think it's the html to markdown functionality's responsibility to do the correct conversion, but it's up to you. As you mentioned, managing it manually is error-prone.
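For readers following the thread, the conversion being requested can be sketched roughly as below. This is a minimal illustration in Python using only the standard library, not MonitoRSS's actual converter. The names `escape_markdown`, `HtmlToMarkdown`, and `html_to_markdown`, and the set of characters treated as special, are assumptions made for the example. The idea is to escape markdown characters only in plain text nodes, while leaving `<code>` content and link `href` values untouched.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

# Characters treated as inline markup. This set is an assumption for
# illustration; the renderer that displays the message may differ.
_MD_SPECIAL = re.compile(r"([\\*_~`|])")


def escape_markdown(text: str) -> str:
    """Backslash-escape markdown control characters in plain text."""
    return _MD_SPECIAL.sub(r"\\\1", text)


class HtmlToMarkdown(HTMLParser):
    """Hypothetical converter handling only <code> and <a> tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        self.in_code = False
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.in_code = True
            self.parts.append("`")
        elif tag == "a":
            # Remember the URL; it is emitted unescaped on the end tag.
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "code":
            self.in_code = False
            self.parts.append("`")
        elif tag == "a":
            self.parts.append("](" + (self.href or "") + ")")
            self.href = None

    def handle_data(self, data):
        # Code spans are copied verbatim; all other text is escaped.
        self.parts.append(data if self.in_code else escape_markdown(data))


def html_to_markdown(html: str) -> str:
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = r"Some text '*' some more text <code>C:\Programs\Firefox-Version-*\firefox.exe</code>."
print(html_to_markdown(sample))
# Some text '\*' some more text `C:\Programs\Firefox-Version-*\firefox.exe`.
```

This sketch does not cover everything synzen raised. A bare URL that appears as ordinary text (not inside an `href`) would still have its `_` or `*` characters escaped, which is the kind of case where automatic escaping can cause problems.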