martinrotter / rssguard

Feed reader (and podcast player) which supports RSS/ATOM/JSON and many web-based feed services.
GNU General Public License v3.0
1.58k stars, 124 forks

[BUG]: Filters take a short time to test but a long time to process #974

Status: Open. Mitecon opened this issue 1 year ago.

Mitecon commented 1 year ago

Brief description of the issue

When in the filter dialogue, I can create a filter, select the feed I want to apply it to and then click 'Test'. The preview works very quickly. If I then use the exact same filter but click 'Process checked feeds', the operation takes much longer.

How to reproduce the bug?

The simple filter below matches on a common word, which makes it easy to see whether each article is accepted or ignored:

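const label = 'Test Filter';
const testcondition = 'the';

function filterMessage() {
  // Check the title of the article currently being filtered for the test word.
  const testconditiontrue = msg.title.toLowerCase().includes(testcondition);

  if (testconditiontrue) {
    // Matching article: assign the label, keep it unread and accept it.
    msg.assignLabel(msg.findLabelId(label));
    msg.isRead = false;
    return MessageObject.Accept;
  }

  // Non-matching article: ignore it.
  return MessageObject.Ignore;
}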

If I click 'Test', the filter is applied very quickly: articles whose titles contain the 'testcondition' word are highlighted in green, and the remaining, non-matching articles are highlighted in red.
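To experiment with the filter logic outside RSS Guard, a minimal Node.js sketch like the one below can mimic what 'Test' shows: it runs filterMessage() over a few sample titles and reports each decision without writing anything. The MessageObject constants, the mock msg object with its findLabelId/assignLabel behaviour, and the previewFilter helper are all hypothetical stand-ins, not RSS Guard's real objects. In this sketch the condition is evaluated inside filterMessage() so that it is recomputed for each mock article.

```javascript
// Hypothetical stand-ins for the objects RSS Guard provides to filter scripts.
// The real `msg` and `MessageObject` come from RSS Guard; these only mimic them.
const MessageObject = { Accept: 'Accept', Ignore: 'Ignore' };

// Build a fake article with just the members the filter below uses.
function makeMockMessage(title) {
  return {
    title,
    isRead: true,
    labels: [],
    // Assumption: reuse the label title as its ID for this mock.
    findLabelId(labelTitle) {
      return labelTitle;
    },
    assignLabel(labelId) {
      this.labels.push(labelId);
      return true;
    },
  };
}

// The article currently being filtered, read by filterMessage() below.
let msg;

const label = 'Test Filter';
const testcondition = 'the';

function filterMessage() {
  // Evaluated per article, so each mock message is checked on its own title.
  if (msg.title.toLowerCase().includes(testcondition)) {
    msg.assignLabel(msg.findLabelId(label));
    msg.isRead = false;
    return MessageObject.Accept;
  }
  return MessageObject.Ignore;
}

// Run the filter over sample titles without touching any database,
// roughly like the green/red preview that 'Test' shows.
function previewFilter(titles) {
  return titles.map((title) => {
    msg = makeMockMessage(title);
    const decision = filterMessage();
    return { title, decision, labels: msg.labels, isRead: msg.isRead };
  });
}

for (const r of previewFilter(['The budget explained', 'Weather warning issued', 'Markets rally'])) {
  console.log(`${r.decision}\t${r.title}`);
}
```

This prints Accept for the first two titles and Ignore for the third. Note that includes() matches substrings, so "Weather" is accepted because it contains "the"; a word-boundary regular expression such as /\bthe\b/i would be stricter if that matters.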

Now, if I click 'Process checked feeds', the same filter takes much longer to do the same work.

What was the expected result?

Processing filters should take roughly as little time as testing them.

What actually happened?

As stated above, processing filters 'for real' takes much longer than a test preview of the same filter.

Why is there this discrepancy between testing and processing? Surely they both do the same thing? I know processing would actually be writing to the database to apply a filter ID, but still.

In practice, this means that when RSS Guard is left running with lots of filters set up, it thrashes a CPU thread while running through all of the filters during a feed update.

Is there a technical reason for this behaviour, and can anything be done about it? Maybe it's an SQLite limitation that can't be avoided?

Debug log

Not really relevant here.

Operating system and version

martinrotter commented 1 year ago
  1. Did you test with the latest devbuild?
  2. How many articles?
Mitecon commented 1 year ago

I've been seeing this regardless of version: stable and devbuild behave the same way.

I've been doing some testing. Here is one of the databases I attached in the other issue:

database.pre-populated.general.news.feeds.100.labels.zip

I don't need to send you the entire RSS Guard directory: I ran the steps below both with my existing config and after deleting the whole directory, and the outcome was the same either way. So the behaviour doesn't depend on a particular config.

This database contains:

28573 articles

Steps:

  1. Exited RSS Guard.
  2. Deleted the whole 'RSS Guard 4' directory.
  3. Started RSS Guard to create the necessary directories and files.
  4. Exited RSS Guard.
  5. Deleted the database and copied over the pre-populated one.
  6. Started RSS Guard again.

Everything (category) has 28573 articles.

With 'General News Filters' filter selected and 'Everything' category selected:

Test: 7s
Process checked feeds: 5m 9s

BBC News (feed) has 3642 articles.

With 'BBC News (purge unwanted)' filter selected and only 'BBC News - Home (feed)' selected:

Test: <2s
Process checked feeds: 35s

Ars Technica (feed) has 850 articles.

With 'Ars Technica' filter selected and only 'Ars Technica - All content (feed)' selected:

Test: <1s
Process checked feeds: 9s
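The three timings above can be reduced to ratios with a little arithmetic. This sketch uses only the numbers reported in this comment; because '<2s' and '<1s' are upper bounds, it plugs in 2 s and 1 s, so the slowdown figures for those two rows are lower bounds. The field and function names are illustrative.

```javascript
// Timings reported above for 'Test' vs 'Process checked feeds'.
// '<2s' and '<1s' are entered as 2 and 1, so those slowdowns are lower bounds.
const runs = [
  { name: 'Everything', articles: 28573, testSeconds: 7, processSeconds: 5 * 60 + 9 },
  { name: 'BBC News - Home', articles: 3642, testSeconds: 2, processSeconds: 35 },
  { name: 'Ars Technica - All content', articles: 850, testSeconds: 1, processSeconds: 9 },
];

// Slowdown of processing relative to testing, and processing cost per article.
function summarize(run) {
  return {
    name: run.name,
    slowdown: run.processSeconds / run.testSeconds,
    msPerArticleProcess: (run.processSeconds * 1000) / run.articles,
  };
}

for (const s of runs.map(summarize)) {
  console.log(
    `${s.name}: ${s.slowdown.toFixed(1)}x slower, ` +
      `${s.msPerArticleProcess.toFixed(1)} ms/article when processing`
  );
}
```

This gives a slowdown of about 44x, at least 17.5x and at least 9x, while processing costs stay between roughly 9.6 and 10.8 ms per article in all three runs. That near-constant per-article cost suggests, but does not prove, a fixed overhead for each processed article rather than something that depends on feed size or on which filter is used.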

It's probably not that important, but I ran the tests above before adding any labels to the database, so there were no labels at all during these tests; I don't know what effect labels might have. I booted my spare laptop three separate times today to test things: each time I thought I was done and shut it down, I remembered something else I wanted to try.

So if you try each filter in various ways, with different feeds selected and checked, you should see what I've been seeing.

Plus, you'll have a somewhat decently populated database to run your own tests on in the future that contains a bit of everything.

martinrotter commented 1 year ago

Has this perhaps improved in the latest dev build?

Mitecon commented 1 year ago

I'm now running: rssguard-devbuild-d866378df-linux64.AppImage

This is still the same. I can 'Test' (all lines go green or red) almost instantly, but 'Process checked feeds' can still take quite a while, even on smallish feeds with quite simple filters.

For example, continuing to use the database above (database.pre-populated.general.news.feeds.100.labels.zip), run the BBC News (purge unwanted) filter. 'Test' is almost instant while 'Process checked feeds' takes much longer. I'm testing today in my 'live' database but the results are similar. Bear in mind that:

  1. My 'live' database updates every 15 minutes.
  2. The 'BBC News (purge unwanted)' filter is always active, on the BBC News feed only.
  3. Therefore, there should be no articles for the filter to process (i.e. delete from the database): any new articles that match it are ignored altogether when the feed updates.

So what actually happens when I click 'Process checked feeds', given that there is effectively nothing for the filter to do? I understand the filter still has to iterate over every article, but the wait is still dramatically longer.

I see this in KSysMon while doing the above:

[Screenshot: running-filter-bbc-news]

You have this database as well, so you can run exactly the same test I ran above with:

BBC News [Feed]
BBC News (purge unwanted) [Filter]

The only difference is that my 'live' database has been updated since then, which you can also obviously do with the same database.

Do you not see similar results?