Open Mitecon opened 1 year ago
I've been seeing this consistently regardless of version, whether stable or devbuild.
I've been doing some testing. I'll attach one of the same databases that I attached in the other issue:
database.pre-populated.general.news.feeds.100.labels.zip
I don't need to send you the entire RSS Guard directory: I've done the following both with the existing config and after deleting the whole directory, and the outcome is the same either way. So it doesn't depend on a particular config.
This database contains:
28573 articles
Steps:
Exited RSS Guard
Deleted whole 'RSS Guard 4' directory
Started RSS Guard to create necessary directories and files
Exited RSS Guard
Deleted the database and copied over the pre-populated one
Started RSS Guard again
Everything (category) has 28573 articles.
With 'General News Filters' filter selected and 'Everything' category selected:
Test: 7s
Process checked feeds: 5m 9s
BBC News (feed) has 3642 articles.
With 'BBC News (purge unwanted)' filter selected and only 'BBC News - Home (feed)' selected:
Test: <2s
Process checked feeds: 35s
Ars Technica (feed) has 850 articles
With 'Ars Technica' filter selected and only 'Ars Technica - All content (feed)' selected:
Test: <1s
Process checked feeds: 9s
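To put the three 'Process checked feeds' timings on a common scale, here is a small sketch that works out the cost per article from the numbers above. The dictionary and function names are just mine for illustration; only the article counts and timings come from the tests.

```python
# Article counts and 'Process checked feeds' timings copied from the tests above.
# Each entry is (articles, seconds).
RUNS = {
    "Everything": (28573, 5 * 60 + 9),  # 5m 9s
    "BBC News": (3642, 35),
    "Ars Technica": (850, 9),
}


def per_article_ms(articles, seconds):
    """Average processing time per article, in milliseconds."""
    return seconds / articles * 1000.0


def summarize(runs):
    """Map each run name to its per-article cost, rounded to 0.1 ms."""
    return {name: round(per_article_ms(a, s), 1) for name, (a, s) in runs.items()}


if __name__ == "__main__":
    for name, ms in summarize(RUNS).items():
        print(f"{name}: {ms} ms per article")
```

On these numbers, all three runs come out at roughly 10 ms per article, so the processing time looks like it grows in step with the article count. By comparison, the 7s 'Test' over the same 28573 articles works out to about a quarter of a millisecond per article.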
It's probably not that important but I ran the above tests on the database before I added any labels - there were no labels at all during these tests. With labels, I don't know what the effect might be. I've booted my spare laptop three separate times today to test things. I kept thinking I was done and shut it down, only to remember I wanted to try something else.
So if you go through trying each filter in various ways with different feeds selected and checked you will see what I've been seeing.
Plus, you'll have a somewhat decently populated database to run your own tests on in the future that contains a bit of everything.
Has this perhaps improved in the latest dev build?
I'm now running: rssguard-devbuild-d866378df-linux64.AppImage
This is still the same. I can 'Test' (all lines go green or red) almost instantly but 'Process checked feeds' can still take quite a while - even on smallish feeds with quite simple filters.
For example, continuing to use the database above (database.pre-populated.general.news.feeds.100.labels.zip), run the BBC News (purge unwanted) filter. 'Test' is almost instant while 'Process checked feeds' takes much longer. I'm testing today in my 'live' database but the results are similar. Bear in mind that:
My 'live' database is updating every 15 minutes
The 'BBC News (purge unwanted)' filter is always active on the BBC News feed only
Therefore, there should be no articles left for the filter to process (delete from the database). If the feed is updated with new articles that match, they'll be ignored altogether when they arrive
So when I click 'Process checked feeds' - what is actually happening? Since there is effectively nothing for the filter to do. I understand the filter will still have to iterate over each article, but still... The wait is far longer than for 'Test'.
I see this in KSysMon while doing the above:
You have this database as well so you can run the exact same testing I've done above with:
BBC News [Feed]
BBC News (purge unwanted) [Filter]
The only difference is that my 'live' database has been updated since then, which you can also obviously do with the same database.
Do you not see similar results?
Brief description of the issue
When in the filter dialogue, I can create a filter, select the feed I want to apply it to and then click 'Test'. The preview works very quickly. If I then use the exact same filter but click 'Process checked feeds', the operation takes much longer.
How to reproduce the bug?
If I use the simple filter below, it's easy to see whether articles are matched or ignored with a common word:
If I click 'Test', the filter is applied very quickly. Some articles matching the 'testcondition' word are highlighted in green. The rest of the unmatching articles are highlighted in red.
Now, if I click 'Process checked feeds', the filter takes much longer to do the same thing as above.
What was the expected result?
Processing filters should take roughly the same short amount of time as testing.
What actually happened?
As stated above, processing filters 'for real' takes much longer than a test preview of the same filter.
Why is there this discrepancy between testing and processing? Surely they both do the same thing? I know processing would actually be writing to the database to apply a filter ID, but still.
What this means in practice is that, when left alone with lots of filters set up, RSS Guard will thrash a CPU thread while running through all of the filters during a feed update.
Is there a technical reason for this behaviour and can anything be done about it? Maybe it's an sqlite limitation or something and can't be avoided?
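To illustrate the kind of thing I'm wondering about, here is a minimal sketch, not RSS Guard's actual code or schema. It assumes a hypothetical 'articles' table with an 'is_deleted' column and compares two ways of writing filter results to an on-disk SQLite database: committing after every matched article versus doing all the updates in one transaction. I don't know which of these (if either) RSS Guard does; it only shows that the write strategy alone can account for a large gap between a read-only test and a real run.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical schema for illustration only; not RSS Guard's real tables.
SCHEMA = "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, is_deleted INTEGER DEFAULT 0)"
UPDATE = "UPDATE articles SET is_deleted = 1 WHERE id = ?"


def make_db(path, n_articles):
    """Create a throwaway database where every third title contains 'testcondition'."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    titles = [
        (f"headline {i} testcondition" if i % 3 == 0 else f"headline {i}",)
        for i in range(n_articles)
    ]
    conn.executemany("INSERT INTO articles (title) VALUES (?)", titles)
    conn.commit()
    return conn


def matching_ids(conn):
    """Read-only pass, comparable to 'Test': find the articles the filter matches."""
    rows = conn.execute("SELECT id, title FROM articles").fetchall()
    return [(article_id,) for article_id, title in rows if "testcondition" in title]


def process_commit_per_row(conn):
    """Write each match in its own transaction (one commit per article)."""
    for params in matching_ids(conn):
        conn.execute(UPDATE, params)
        conn.commit()


def process_single_transaction(conn):
    """Write all matches inside one transaction (one commit in total)."""
    with conn:
        conn.executemany(UPDATE, matching_ids(conn))


def marked(conn):
    return conn.execute("SELECT COUNT(*) FROM articles WHERE is_deleted = 1").fetchone()[0]


def run_demo(n_articles=300):
    """Run both strategies on fresh databases and report timings and results."""
    result = {}
    strategies = (("per_row", process_commit_per_row), ("single_tx", process_single_transaction))
    with tempfile.TemporaryDirectory() as tmp:
        for name, strategy in strategies:
            conn = make_db(os.path.join(tmp, f"{name}.db"), n_articles)
            start = time.perf_counter()
            strategy(conn)
            result[f"{name}_seconds"] = time.perf_counter() - start
            result[f"{name}_marked"] = marked(conn)
            conn.close()
    return result


if __name__ == "__main__":
    for key, value in run_demo().items():
        print(key, value)
```

Both strategies mark the same articles; only the number of commits differs. On a file-backed database each commit normally has to be flushed to disk, so the per-row version tends to be much slower, though the actual gap depends on the disk and the SQLite settings in use.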
Debug log
Not really relevant here.
Operating system and version