Closed. tillcash closed this 4 months ago.
The simplify_html filter takes in a feed whose entries may contain excessive HTML tags in the body and strips those tags away. I suppose it's mostly useful if the body was fetched using the full_text filter, but simplify_html and full_text, in my humble opinion, are two independent ways of processing the feed.
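To make "strips those tags away" concrete, here is a rough Python sketch of the general idea. It is not the project's actual implementation: the function name strip_excess_tags and the allow-lists of tags and attributes are assumptions chosen only for illustration, and the real filter's rules may well differ.

```python
import html
from html.parser import HTMLParser

# Hypothetical allow-lists, assumed for this sketch only.
KEEP_TAGS = {"p", "a", "img", "ul", "ol", "li", "blockquote", "pre", "code"}
KEEP_ATTRS = {"href", "src", "alt"}
VOID_TAGS = {"img"}                 # kept tags that have no closing tag
DROP_CONTENT = {"script", "style"}  # tags whose text content is dropped too


class _Simplifier(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
        elif tag in KEEP_TAGS:
            # Keep the tag, but only with the allow-listed attributes.
            kept = "".join(
                f' {name}="{html.escape(value or "", quote=True)}"'
                for name, value in attrs
                if name in KEEP_ATTRS
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self._skip:
                self._skip -= 1
        elif tag in KEEP_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text always survives, unless it sits inside <script>/<style>.
        if not self._skip:
            self.out.append(html.escape(data, quote=False))


def strip_excess_tags(body):
    """Return body with every tag outside KEEP_TAGS removed, text preserved."""
    parser = _Simplifier()
    parser.feed(body)
    parser.close()
    return "".join(parser.out)
```

Note that a sketch like this only reshapes whatever body is already in the entry. If the feed ships a one-line summary, the output is still a one-line summary.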
Sorry, I may have misunderstood your question here. I'm a bit confused about what you mean by "Mentioning only simplify_html on the URL parameter". Could you elaborate a bit?
Oh, by the way, the wiki should be open for everyone to edit. Please feel free to edit the wiki directly if you see fit.
The simplify_html filter takes in a feed whose entries may contain excessive HTML tags in the body and strips those tags away.
I think this information is missing from the wiki, which causes confusion. I tried a couple of feeds that do not have full content, like 127.0.0.1:4080/otf?source=https://www.thehindu.com/sci-tech/health/feeder/default.rss&limit=1&simplify_html, and it appeared to do nothing, since I didn't know that it only strips HTML tags from the feed content.
So, I opened this issue to suggest that simplify_html should automatically run full_text to provide the full content. Would you consider adding a new function that combines both full_text and simplify_html for simplicity?
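In the meantime, a request that names both filters could be assembled along the lines of the Python sketch below. This rests on assumptions not confirmed in this thread: that the /otf endpoint accepts full_text as a bare flag in the same way as simplify_html, and that filters run in the order they appear in the query string. The helper name otf_url is made up for this example.

```python
from urllib.parse import quote


def otf_url(base, source, filters, limit=None):
    """Build an on-the-fly feed URL with bare filter flags (hypothetical helper)."""
    # Leave ':' and '/' unescaped so the source URL stays readable,
    # matching the style of the URL quoted above.
    parts = [f"source={quote(source, safe=':/')}"]
    if limit is not None:
        parts.append(f"limit={limit}")
    # Assumption: each filter is passed as a flag without a value.
    parts.extend(filters)
    return f"{base}?{'&'.join(parts)}"


url = otf_url(
    "http://127.0.0.1:4080/otf",
    "https://www.thehindu.com/sci-tech/health/feeder/default.rss",
    ["full_text", "simplify_html"],  # fetch the full body first, then clean it
    limit=1,
)
print(url)
```

Listing full_text before simplify_html reflects the intent described here: fetch the full article body, then strip the excess markup from it.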
Additionally, I suggest changing the keep_only / discard filters so that, by default, they apply only to the title.
I have updated the wiki entry for simplify_html. Please provide guidance accordingly.
Thank you for your clarification. I have made some changes on top of your update.
Note that different feed formats have different fields for the content. For Atom, the content is sometimes found in the <content> or <summary> tag, and for RSS the body is more often found in <description>. It is quite a messy thing to deal with. In the code and some docs, I used the generic term "body" to avoid confusion.
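As a rough illustration of why a generic "body" is convenient, the Python sketch below looks up the body of a single entry across both formats. It is only a sketch, not how the project does it: the function name entry_body and the lookup order are assumptions, and it ignores cases such as Atom content with type="xhtml", where the body is made of child elements rather than text.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace


def entry_body(entry):
    """Return the text of the first body-like field found, or None."""
    # Assumed lookup order: Atom <content>, Atom <summary>, RSS <description>.
    for tag in (ATOM + "content", ATOM + "summary", "description"):
        node = entry.find(tag)
        if node is not None and node.text:
            return node.text
    return None


rss_item = ET.fromstring(
    "<item><title>t</title>"
    "<description>&lt;p&gt;hi&lt;/p&gt;</description></item>"
)
atom_entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<title>t</title><summary>short</summary></entry>"
)
print(entry_body(rss_item))
print(entry_body(atom_entry))
```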
I believe the simplify_html function requires the full_text method to work effectively. Mentioning only simplify_html on the URL parameter can cause confusion for end users, as it does not provide the expected output on its own. If it's not possible to modify the function, we can update the wiki to highlight that for readability purposes, both simplify_html and full_text need to be used together.