miniflux / v2

Minimalist and opinionated feed reader
https://miniflux.app
Apache License 2.0

Checkbox for "Search in Content" #2634

Open bitmage opened 4 months ago

bitmage commented 4 months ago

Kudos on creating a project with maintainable code and great documentation! There's a lot being done right here. :+1:

I have a small contribution I would like to make: I would like to add a "Keep" filter that will search the Content, not just the Title. I'm willing to submit a pull request for this, and I'm writing here to give a heads up and gather feedback.

I want to do this with minimal changes. The simplest approach I see is to add a "Search in Content" checkbox in the UI, placed below the Block Rules and Keep Rules. This keeps disruption to a minimum, since users have to opt in to the new behavior.

I see several open issues with similar requests:

- Feature request: Block Rules for article body
- Allow block rules based on website content (wants to potentially use a CSS selector to look at a particular part of the page)
- Block rules based off of other elements (not just title). (wants to potentially look at the fully loaded page)

The CSS selector and parsing the fully loaded page are a little overkill for my use case, but I'm hoping that a simple checkbox here will expand the usefulness without adding much complexity.

On the code side, it seems that the Entry model already has a Content field, so the isAllowedEntry function would just need to be updated to take the new checkbox and the Content field into account. The value of the checkbox would need to be stored and retrieved, and the feature would need to be documented. I will write new test(s) as well.
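To make the proposal concrete, here is a rough sketch of what the updated check could look like. It is not the project's actual code: the `Feed` and `Entry` types are simplified stand-ins, and the `SearchContent` field is a hypothetical name for the stored checkbox value.

```go
package main

import (
	"fmt"
	"regexp"
)

// Entry is a simplified stand-in for the project's entry model.
type Entry struct {
	Title   string
	Content string
}

// Feed is a simplified stand-in for the project's feed model.
type Feed struct {
	KeeplistRules string
	SearchContent bool // hypothetical: value of the proposed checkbox
}

// isAllowedEntry reports whether the entry passes the feed's keep rule.
// With no keep rule, every entry is allowed. The title is always checked;
// the content is checked only when the feed has opted in.
func isAllowedEntry(feed *Feed, entry *Entry) bool {
	if feed.KeeplistRules == "" {
		return true
	}
	re, err := regexp.Compile(feed.KeeplistRules)
	if err != nil {
		// In this sketch an invalid rule matches nothing, so the entry is not kept.
		return false
	}
	if re.MatchString(entry.Title) {
		return true
	}
	return feed.SearchContent && re.MatchString(entry.Content)
}

func main() {
	feed := &Feed{KeeplistRules: `(?i)golang`, SearchContent: true}
	entry := &Entry{Title: "Weekly digest", Content: "<p>News about Golang</p>"}

	fmt.Println(isAllowedEntry(feed, entry)) // true: the rule matches the content

	feed.SearchContent = false
	fmt.Println(isAllowedEntry(feed, entry)) // false: only the title is searched
}
```

Keeping the title check first means existing feeds behave the same way when the checkbox is left unchecked.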

Welcoming any feedback or suggestions.

bitmage commented 4 months ago

I added code for the new checkboxes which can be seen here. Unit and integration tests pass.

When I build a local docker container and test the UI manually, I notice:

  1. Using the new feature doesn't error, and doesn't seem to interfere with previous functionality...
  2. But it also doesn't properly search Content. I assume this is because the content field isn't loaded at the time processing is happening?

I'll look into it further, but feel free to drop information here if you have any tips.

bitmage commented 4 months ago

So, while Content wouldn't be loaded as a result of scraping when these filters run, I believe it should be loaded due to the code in rss/adapter.go grabbing it directly from the description field in the RSS feed. I assume this is the adapter that would be run for an RSS feed like Webflow's Discourse.
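For anyone following along, this is roughly the behavior I'm describing, written as a standalone sketch rather than the real adapter. The `rssFeed`, `rssItem`, and `entriesFromRSS` names are made up for illustration, and I'm assuming the description is copied into Content as-is.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssItem and rssFeed are hypothetical, minimal mappings of an RSS 2.0 document.
type rssItem struct {
	Title       string `xml:"title"`
	Description string `xml:"description"`
}

type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

// Entry is a simplified stand-in for the project's entry model.
type Entry struct {
	Title   string
	Content string
}

// entriesFromRSS parses an RSS document and fills Content from each
// item's description, with no scraping of the linked page.
func entriesFromRSS(data []byte) ([]Entry, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(feed.Items))
	for _, item := range feed.Items {
		entries = append(entries, Entry{Title: item.Title, Content: item.Description})
	}
	return entries, nil
}

const sample = `<rss version="2.0"><channel><title>Example</title>
<item><title>Weekly digest</title>
<description><![CDATA[<p>News about Golang</p>]]></description></item>
</channel></rss>`

func main() {
	entries, err := entriesFromRSS([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", entries[0].Content)
}
```

If the real adapter works along these lines, Content should already be non-empty by the time the filters run, at least for feeds that ship a description.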

Still investigating why my code doesn't appear to be searching the Content.