martinrotter / rssguard

Feed reader (and podcast player) which supports RSS/ATOM/JSON and many web-based feed services.
GNU General Public License v3.0
1.63k stars 125 forks source link

Certain Feeds do not Reliably Automatically Update #148

Closed gourdcaptain closed 4 years ago

gourdcaptain commented 7 years ago

Brief description of the issue.

Some feeds in RSSGuard like http://kotaku.com/rss do not reliably automatically update - they will sometimes not update for hours unless manually updated.

How to reproduce the bug?

  1. Add feed to reader.
  2. Set automatic update interval

What is the expected result?

Feed will automatically update when new entries are added to it.

What actually happened?

No updates occur (or only partial ones) until the feed is manually updated with right click -> update selected items.

Other information (logs, see Wiki)

System: Arch Linux 64-bit, RSSGuard 3.5.0.

Logs are currently being looked for - this issue due to being irregularly occurring is hard to reproduce on command. Will post logs once acquired.

danowar2k commented 7 years ago

I'll try the new build now...

danowar2k commented 7 years ago

Before I do, after obtaining new messages, you iterate over the messages and extract title etc. You could output the stuff there for debugging.

martinrotter commented 7 years ago

Looking at your code, if one feed cannot update because the feed can get no thread, the feed will wait until it can secure a thread for itself.

It in fact does wait - but "spinning" approach is used.

don't know if any of the two methods would cause more overhead, though

That's what I thought too. It is probably best to download all messages from channel (you have to download whole XML anyway).

Quick idea: If feeds turn red, an error happened during update. Could you add a context menu item to a red feed that shows the last error message? When you hover your mouse over the feed, you should see some error-related info in tooltip.

Because I didn't understand this, for example. It seems sometimes there are words missing? What is meant by "this" and "it"? Yes, I fucked it up. Le me rephrase.

""" Not currently possible because the logick, which parses individual XML chunks and decodes attachments from it is completely separate from message logic, but you can easily SEE which FEED it the above log lines belongs TO, because parsing of messages is synchronous. So if you see that "Added enclosure..." log entry then search for next "Downloaded XX messages for feed..." log entry and you see which feed enclosures are for. """

Also, I'm not careful enough when I write comments. Me neither.

you know the number of downloaded messages. Can't you iterate over them and display the titles at that moment?

I could, but I added extra debug log entries so that now you will actually see message titles when it matters - when message is added or updated. But if we fail to identify the issue correctly, I will add those debug entries too.

danowar2k commented 7 years ago

Okay, I'm trying the debug build with -log.

Something is strange with the logging. When I start the program, the rssguard.log is there, but Windows 7 explorer says it has 0 bytes. When I open the log in Notepad++, it is empty. Only after waiting a few more seconds and doing F5 to update the explorer the file gains size and I can reload it in Notepad++.

Also strange: I marked a single feed (having 52 fresh unread messages), told RSSguard to "update the selected feeds", waited a bit and reopened the log. The same 52 messages were downloaded, but at that moment only 21 database checks were logged. Only when I updated the feed again (!), the other 31 log messages were written to the file.

Are you doing some delayed flushing when appending to the log? Can't you force flush after each message when the program is started with "-log"?

I could, but I added extra debug log entries so that now you will actually see message titles when it matters - when message is added or updated. But if we fail to identify the issue correctly, I will add those debug entries too.

I hope your debug messages will be enough for now, they have the necessary info to identify downloaded articles, it seems.

Now I'll wait when the error comes up again...

danowar2k commented 7 years ago

We could use some IM communication if you'd prefer.

danowar2k commented 7 years ago

"L?stige Marotten: Was nervt Sie am Handy am allerallermeisten?"

Noticed this. I hope you use UTF-8 everywhere....

danowar2k commented 7 years ago

Instead or waiting for shit to happen in the real world ;-) you could provide a test feed of sorts that you build. I could then test with that...

martinrotter commented 7 years ago

I comitted fix for those unflushed logs.

On, Windows logging to file is sometimes crazy nightmare, but now it is explicitly flushed after each write. Should work better now.

As for encoding. I am using older qDebug("string") API, instead of new qDebug() << "str" API. Older API is more C-like and bit faster and I am used to it so far. Problem of OLD api is that it has some problems with special characters but it is really not important for logging, ASCII is enough.

Looking for your findings.

danowar2k commented 7 years ago

Downloading the new build...

I really think some testing area would help where you generate a RSS feed which has a new item every minute. Their titles would show the generated time.

Then tests could be run against that RSS feed.

Maybe have a few more RSS feeds which test other things (randomly update/edit the same article, have non-ASCII chars somewhere, violate some RSS standard, etc etc)

danowar2k commented 7 years ago

This happened: I had 30 messages in the feed, I updated the feed, then I had 29 messages in the feed. Log:

[rssguard] DEBUG: Checking if message with title 'Katalonien: Ex-Parlamentspr?sidentin kann U-Haft gegen Kaution vermeiden', url 'http://www.spiegel.de/politik/ausland/katalonien-carme-forcadell-droht-u-haft-a-1177294.html#ref=rss' and author '' is present in DB. (17/10/11 10:48:48)
[rssguard] DEBUG: Message with these attributes is already present in DB and has DB ID 1426547112. (17/10/11 10:48:48)
[rssguard] DEBUG: Updating message with title 'Katalonien: Ex-Parlamentspr?sidentin kann U-Haft gegen Kaution vermeiden' url 'http://www.spiegel.de/politik/ausland/katalonien-carme-forcadell-droht-u-haft-a-1177294.html#ref=rss' in DB. (17/10/11 10:48:48)
[rssguard] DEBUG: 1 messages for feed 11 stored in DB. (17/10/11 10:48:48)
[...]
[rssguard] DEBUG: Made progress in feed updates, total feeds count 1/1 (id of feed is 11). (17/10/11 10:48:48)

Then after another update a few minutes later it reappeared...

[rssguard] DEBUG: Checking if message with title 'Katalonien: Ex-Parlamentspr?sidentin kann U-Haft gegen Kaution vermeiden', url 'http://www.spiegel.de/politik/ausland/katalonien-carme-forcadell-droht-u-haft-a-1177294.html#ref=rss' and author '' is present in DB. (17/10/11 10:51:47)
[rssguard] DEBUG: Adding new message with title 'Katalonien: Ex-Parlamentspr?sidentin kann U-Haft gegen Kaution vermeiden' url 'http://www.spiegel.de/politik/ausland/katalonien-carme-forcadell-droht-u-haft-a-1177294.html#ref=rss' to DB. (17/10/11 10:51:47)
[...]
[rssguard] DEBUG: SQLite connection 'feed_upd' is already active. (17/10/11 10:51:47)
[rssguard] DEBUG: File-based SQLite database connection 'feed_upd' to file 'E:\Eigenes\Desktop\rssguard-3.5.5-eb777b9-nowebengine-win64\data\database\local\database.db' seems to be established. (17/10/11 10:51:47)
[rssguard] DEBUG: 1 messages for feed 11 stored in DB. (17/10/11 10:51:47)
[rssguard] DEBUG: Made progress in feed updates, total feeds count 1/1 (id of feed is 11). (17/10/11 10:51:47)

But I think it may just be that they took the article from the feed because something had to be corrected. News...

martinrotter commented 7 years ago

Look at list of messages, do you see that message twice? Does it have same date/title/author? Do they differ in any way?

martinrotter commented 7 years ago

I really think some testing area would help where you generate a RSS feed which has a new item every minute. Their titles would show the generated time.

Then tests could be run against that RSS feed.

Maybe have a few more RSS feeds which test other things (randomly update/edit the same article, have non-ASCII chars somewhere, violate some RSS standard, etc etc)

This would certainly be nice, but I am quite busy, someone would have to write some webservice which would server these testing RSS feeds.

danowar2k commented 7 years ago

Dammit, it doesn't seem to happen when the PC is only a few minutes in energy saving mode. Or maybe activating the log prevents it from happening. Or maybe it only happens when the database has a certain size. Or maybe it only happens when RSSguard has been in memory for a longer time.

Can I use the database from my normal version for debugging? Which files do I have to copy?

danowar2k commented 7 years ago

Look at list of messages, do you see that message twice? Does it have same date/title/author? Do they differ in any way?

No the old message was removed from the article list. Author was none both times, though, I think. I can't compare the two messages because the old one is gone :-(

martinrotter commented 7 years ago

You can copy "database.db" (see About dialog), it should work. Can you send me URL of the feed from the log above?

martinrotter commented 7 years ago

Btw, IM channel for rssguard is IRC channel #rssguard on freenode IRC. I am there when I am at home (usually in evenings when my child is sleeping - I do not turn my home PC workstation on when he is not sleeping). :)

danowar2k commented 7 years ago

Can you send me URL of the feed from the log above?

http://www.spiegel.de/schlagzeilen/tops/index.rss

danowar2k commented 7 years ago

Ok, so I tried forcing the bug with my normal RSSguard 3.5.4 without a log. I went into energy save mode, came back and the feed still updated properly.

Now I have had the program running for a half hour or so. The feed has three new articles but when I manually update it, they aren't added to the article list. As no log is running, I can't see what's happening, of course.

I will now restart the dev build with my usual database and do the same (go into energy save, come back, wait for an hour and try to update).

If this doesn't reproduce the bug, I bet there is something happening after that time or after the energy save that doesn't happen when you activate logging.

I would bet on file handles that are closed and never reopened, or threads dying and not replaced or network connections closed but never checked to reopen them.

Somewhere in the code happens something to stuff in memory where you don't check if things are ready to use, or you have written such fail-safes that the program seems to work but only because you continue without throwing any errors when things aren't there that should be there.

If any uncatched error would be there, the program would produce crashes or a backtrace somewhere, wouldn't it?

danowar2k commented 7 years ago

At the moment it seems it would really help if I could start/stop logging to a file while the program is running...

martinrotter commented 7 years ago

Yes, any uncaught exceptions lead to a crash.

Any network connections for downloading feed XMLs are created and closed only when really needed. Not sure about feed downloader thread, it really could get somehow corrupted when PC wakes up from sleep. I will have to try myself.

  1. You can launch rssguard in command line (without -log option) and redirect its output into file - rssguard.exe > log.txt 2>&1.
  2. Put PC to sleep with RSS Guard and command line running.
  3. Restore from sleep.
  4. Reproduce bug and stop RSS Guard.
  5. Upload log.txt.

Redirecting output to file via the steps above is transparent and does not differ from the normal RSS Guard launch, let's try it.

danowar2k commented 7 years ago

Ok, so using the dev build without "-log", going to energy save, returning from it and then waiting for half an hour or so leads to the same result: the feed isn't updated anymore, even when manually updating.

I'll now try with both methods above, the "-log" method and the redirect stdout and stderr method, using dev build + my database...

EDIT: WAIT! The dev build (running without logging) at the moment doesn't (!) seem to have the effect I had with my 3.5.4. At least the feed seems to have updated...I'll look into it.

martinrotter commented 7 years ago

OK, once you will be sure you have precise steps to reliably reproduce the bug, paste them here and I will try to exactly follow them.

danowar2k commented 7 years ago

I've reproduced the bug with my 3.5.4, my database. I attach the log. But 3.5.4 has not enough debug messages, so I can't see anything wrong there, except I'm wondering this: If the Spiegel feed does have 30 articles in it, why are there only ca. 21 thumbnail lines like this:

[rssguard] DEBUG: Adding enclosure 'http://www.spiegel.de/images/image-1212287-thumbsmall-lwpk.jpg' for the message. (17/10/11 14:00:42)

Attached: rssguard.log

Now, back to the dev build, using the stderr redirect method.

martinrotter commented 7 years ago

Maybe not all messages had enclosures.

EDIT: Yes, I double-checked, not all messages from that feed have attachments (enclosures).

danowar2k commented 7 years ago

Okay, I've changed for testing to this feed here: http://www.spiegel.de/schlagzeilen/index.rss This has ALL headlines from Spiegel.de , so it is updated more frequently. The bug still happened with this feed in 3.5.4, so I'm back at the dev build, trying to force it.

danowar2k commented 7 years ago

I will also test if I even need to go into Energy Save mode in 3.5.4 for the bug to manifest.

If the bug isn't reproducible in the dev build, then any code between 3.5.4 and the dev build has either had a fix for the bug or all those additional debug messages keep something alive that otherwise breaks.

danowar2k commented 7 years ago

Ok at least I can say that going into Energy Save mode isn't necessary to produce the bug in 3.5.4 . I just have to have the program running for a certain amount of time.

Now back to the dev build (even if I don't know what I should do anymore now) It seems at the moment that something here https://github.com/martinrotter/rssguard/compare/3.5.4...master has fixed the problem.

I'm not really fond of just accepting the bug is fixed (?) and move on because it could always come back later. So a suggestion: I send you my database and config and at first my full 3.5.4 RSSguard version. You check if you can reproduce the bug on your machine with that version (just by starting the program and checking up on it after half an hour or later).

If you can reproduce the bug, you use my database and a debug build of RSSguard 3.5.4 with a debugger and look inside the variables to see what is happening when the bug happens. At least I would do that with Java and PHP. I am sure there are good debugging tools for C++.

What do you say?

martinrotter commented 6 years ago

Shit, i missed this comment, SORRY. Let me respond now. :)

I did a couple of code changes regarding overall RSS Guard behavior which could make it better regarding this bug report and it is possible that the bug was really partially fixed.

Feel free to send any logs/databases I will check them and maybe it could lead somewhere.

So, I will not close this yet, but just asking: do you thing (try latest 3.5.5 version) the bug is actually fixed?

@danowar2k @gourdcaptain

danowar2k commented 6 years ago

I will try it out and report.

2017-11-30 10:17 GMT+01:00 martinrotter notifications@github.com:

Shit, i missed this comment, SORRY. Let me respond now. :)

I did a couple of code changes regarding overall RSS Guard behavior which could make it better regarding this bug report and it is possible that the bug was really partially fixed.

Feel free to send any logs/databases I will check them and maybe it could lead somewhere.

So, I will not close this yet, but just asking: do you thing (try latest 3.5.5 version) the bug is actually fixed?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/martinrotter/rssguard/issues/148#issuecomment-348128462, or mute the thread https://github.com/notifications/unsubscribe-auth/AKe3lbWRscfy_zvIbxtzAI-zzfFflG7Oks5s7nKigaJpZM4P2Dgo .

gourdcaptain commented 6 years ago

Seems to be working for me at the moment? No egregious cases of the bug have come up since 3.5.5

On November 30, 2017 4:17:22 AM EST, martinrotter notifications@github.com wrote:

Shit, i missed this comment, SORRY. Let me respond now. :)

I did a couple of code changes regarding overall RSS Guard behavior which could make it better regarding this bug report and it is possible that the bug was really partially fixed.

Feel free to send any logs/databases I will check them and maybe it could lead somewhere.

So, I will not close this yet, but just asking: do you thing (try latest 3.5.5 version) the bug is actually fixed?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_martinrotter_rssguard_issues_148-23issuecomment-2D348128462&d=DwICaQ&c=3buyMx9JlH1z22L_G5pM28wz_Ru6WjhVHwo-vpeS0Gk&r=0zQ8RN820LSg0alBA3zG9VWeCI6KR1FUJkKAfmvCcAs&m=0ZF29uH2pvnH4I1ahLtrsadGqZMTgsehEZZKUBjHJ08&s=MYeqR0TUnSKFFuq-LGYCcCHPTpoN-Is2x25DCcpWU4M&e=

martinrotter commented 4 years ago

Is this reliably working presently?

gourdcaptain commented 4 years ago

Yep. Although I've changed my setup significantly since I originally reported this - new desktop, went from btrfs on a hard drive as the filesystem the database was on to ext4 on a SSD, but it was working reliably before the switch I think.

martinrotter commented 4 years ago

I create merged ticket for similar problems which have probably the same root cause - head over there to discuss or inform me of development on your side.

https://github.com/martinrotter/rssguard/issues/237