skeeto / elfeed

An Emacs web feeds client
The Unlicense

Elfeed is too slow when there are a lot of RSS feeds #317

Open testinggithub1222 opened 5 years ago

testinggithub1222 commented 5 years ago

I have been using elfeed for a while now and have built up a list of around 30 to 50 RSS feeds. Whenever I call elfeed-update to update everything, it takes quite a long time and hangs Emacs for the duration, maybe 20 minutes, I am not sure. It consumes 100% of the CPU and 54% of my RAM, which should be about 4GB of my 8GB, and garbage collection runs constantly. So my question is: is there a way to make this run faster without interrupting or hanging Emacs? Does parsing the XML really consume that much CPU and take that long? Isn't there an async process?

skeeto commented 5 years ago

Fetching 30 to 50 feeds should only disrupt Emacs for a couple of seconds, so this isn't normal.

The very first thing to check is that Elfeed is using curl to fetch feeds. The variable elfeed-use-curl indicates whether curl will be used or not, and it will automatically be set to t if curl was found when Elfeed was loaded. If it's nil, you should install curl and ensure it's on your PATH. This has two major benefits: fetching runs in external curl processes, so Emacs itself isn't tied up downloading, and it's generally much faster and more reliable than the built-in url-retrieve.
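
For example, something along these lines checks both from a running Emacs session (a rough sketch; elfeed-use-curl and elfeed-curl-program-name are Elfeed's own variables):

(message "elfeed-use-curl: %S" elfeed-use-curl)
(message "curl binary: %S" (executable-find elfeed-curl-program-name))
;; If curl was installed after Elfeed was loaded, enable it manually:
;; (setq elfeed-use-curl t)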

Also make sure Elfeed has been byte-compiled. This should happen automatically if you installed it via package.el.
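
If the .elc files are missing, a sketch like the following recompiles the installed copy in place (assuming locate-library points at the package directory):

;; Compile any of Elfeed's .el files that lack a .elc counterpart.
(byte-recompile-directory
 (file-name-directory (locate-library "elfeed")) 0)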

Since you're seeing such excessive memory use, it sounds like one of the feeds you're fetching may be humongous. A casual experiment suggests that, on 64-bit computers, the s-expression representation of an RSS feed is about four times larger than the XML content. This is due to all the pointer overhead (lots of cons cells for each element). But if you're seeing 4GB of memory use, then that puts the feed at about 800MB (XML buffer plus its s-expression is roughly 5x the XML size, and 5 × 800MB ≈ 4GB), which seems unlikely. Even more so for the 8GB case.

Take a look at each of the feeds in your list and see if any of them are particularly large. Also have Elfeed fetch them one at a time to see which one causes the problem. If you can narrow it down to a particular feed, I'd like to know about it.
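
A single feed can be fetched on its own with elfeed-update-feed (also available interactively as M-x elfeed-update-feed, with completion over your feed list). A minimal sketch, with a placeholder URL:

;; Fetch just one feed so a problematic one can be isolated.
;; Substitute each feed from your list in turn for the placeholder URL.
(elfeed-update-feed "https://example.com/feed.xml")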

testinggithub1222 commented 5 years ago

@skeeto, I have checked the elfeed-use-curl variable and it returns t, and Elfeed is already byte-compiled, as I can see the .elc files in my elfeed package directory.

Last night I tried it again on another, more powerful computer and gave gc-cons-threshold 8GB of RAM, since I have 16GB there, and yes, it took all 8GB with 100% CPU again.
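
(Roughly speaking, that setting looks like the sketch below; the exact value I used may have differed.)

;; Sketch: raise the GC threshold to 8GB, as described above.
;; This only postpones garbage collection; it does not reduce total allocation.
(setq gc-cons-threshold (* 8 1024 1024 1024))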

Also have Elfeed fetch them one at a time to see which one causes the problem

I will try that. I suspect the Reddit feeds may be the cause, as I have many of them.

testinggithub1222 commented 5 years ago

@skeeto, I have set my feed list to empty as below:

(setq elfeed-feeds '())

and then called elfeed-update to check the CPU usage, but it still took 89% of the CPU again. I suspect this might be because I have too many entries (shown when calling elfeed): there are 6004 entries listed there, from 2019-05-02 back to 2018-12-16.
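
(For reference, the total number of stored entries can be counted by walking the database; a sketch using Elfeed's with-elfeed-db-visit macro:)

;; Count every entry currently stored in the Elfeed database.
(let ((count 0))
  (with-elfeed-db-visit (entry feed)
    (setq count (1+ count)))
  (message "Database contains %d entries" count))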

skeeto commented 5 years ago

When you set elfeed-feeds to an empty list, did you evaluate that expression before using elfeed-search-fetch (G) or elfeed-update? That should be a no-op when elfeed-feeds is empty.
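
For example, a quick sanity check along these lines (just a sketch) confirms the new value is actually in effect before updating:

(setq elfeed-feeds '())
(elfeed-feed-list)  ; should now evaluate to nil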

The other thing to check is your search filter. The search buffer listing is updated in full every time a feed completes. If your filter is blank so that every entry in the database is listed, that will waste a lot of time recomputing the listing again and again as feeds complete. Make sure you have a time cutoff (e.g. @1-week-ago), and the more recent the cutoff, the better.
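
For example (a sketch; pick whatever cutoff suits you):

;; Default filter used when the search buffer is created.
(setq elfeed-search-filter "@1-week-ago +unread")
;; Or change the live filter from the *elfeed-search* buffer with
;; `s' (elfeed-search-set-filter).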

testinggithub1222 commented 5 years ago

I tried deleting the ~/.elfeed folder for testing purposes. After deleting the folder, Elfeed was empty, so I called elfeed-update to update my feeds. Before that, I had already set the filter to only show entries from the last 3 months. During the first elfeed-update everything seemed smooth, but after 1000 entries were fetched things started to slow down. It took a while to reach 2000 entries, and it got stuck there for a long time with no more entries added beyond that. I suspect it was still fetching more data but only showing the last 3 months of entries.

If you don't mind, here is my configuration:

(use-package elfeed
  :ensure t
  :config
  (setq elfeed-search-filter "@3-months-ago "
        elfeed-feeds
        '("https://www.reddit.com/r/programming/.rss" "https://www.xda-developers.com/feed/" "https://www.reddit.com/r/science/.rss" "https://www.reddit.com/r/kurzgesagt/.rss" "https://www.reddit.com/r/worldnews/.rss" "https://www.reddit.com/r/todayilearned/.rss" "https://www.reddit.com/r/LifeProTips/.rss" "https://www.reddit.com/r/explainlikeimfive/.rss" "https://www.reddit.com/r/DIY/.rss" "https://www.reddit.com/r/technology/.rss" "https://www.reddit.com/r/Android/.rss" "https://www.reddit.com/r/linux/.rss" "https://www.reddit.com/r/archlinux/.rss" "https://www.reddit.com/r/gadgets/.rss" "https://www.reddit.com/r/IAmA/.rss" "https://www.reddit.com/r/Futurology/.rss" "https://www.reddit.com/r/AskMen/.rss" "https://www.reddit.com/r/AskWomen/.rss" "https://www.reddit.com/r/vim/.rss" "https://www.reddit.com/r/YouShouldKnow/.rss" "https://www.reddit.com/r/learnprogramming/.rss" "https://www.reddit.com/r/investing/.rss" "https://www.reddit.com/r/hacking/.rss" "https://www.reddit.com/r/javascript/.rss" "https://www.reddit.com/r/java/.rss" "https://www.reddit.com/r/Piracy/.rss" "https://www.reddit.com/r/homeautomation/.rss" "https://www.reddit.com/r/quotes/.rss" "https://www.reddit.com/r/androidapps/.rss" "https://www.reddit.com/r/HowToHack/.rss" "https://www.reddit.com/r/arduino/.rss" "https://www.reddit.com/r/startup/.rss" "https://www.reddit.com/r/Entrepreneur/.rss" "https://lifehacker.com/tag/linux/rss")))

skeeto commented 5 years ago

Everything seems to work fine when I use this configuration directly, but I can see how it would slow down over time. Several of these feeds are very, very busy with several new entries per minute. After a few days of regular pulling, you're going to end up with tons of entries.

Elfeed can handle this, but only if you don't always ask to see so much of it all at once. Your broad default filter is constantly refilling and redrawing the search listing, which puts a huge load on Emacs. That's the real issue here. Change the default filter to something like "@3-days-ago +unread" to significantly constrain what's being shown.
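
In the configuration quoted above that means replacing the "@3-months-ago " value, e.g. (a sketch):

(setq elfeed-search-filter "@3-days-ago +unread")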

FYI, I tested your configuration by putting it (minus the use-package part) in a file named tmp.el in the repository, then:

$ make clean
$ HOME=. make virtual ARGS='-l tmp.el'

That provides a clean, empty, isolated, temporary test environment. The special "virtual" target in the Makefile imports a copy of your real database by default, so the HOME=. part stops this from happening.
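
A rough equivalent without the Makefile is to point Elfeed at a throwaway database from a bare Emacs session, something like this sketch (elfeed-db-directory is Elfeed's variable for the database location):

;; Use a temporary database so the real ~/.elfeed is never touched.
(setq elfeed-db-directory "/tmp/elfeed-test")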

testinggithub1222 commented 5 years ago

Sorry for the late reply, I have been very busy this month. By the way, I am not sure how to do what you suggested. I have read it many times but am still not sure about it. Should I download the Makefile from the elfeed repo and then execute the commands you gave:

$ make clean
$ HOME=. make virtual ARGS='-l tmp.el'

And what does this mean? How does it really work?

That provides a clean, empty, isolated, temporary test environment. The special "virtual" target in the Makefile imports a copy of your real database by default.

And especially this last part, how does it work and what should I do with it?:

so the HOME=. part stops this from happening

skeeto commented 5 years ago

I was just describing how I was testing in isolation so you, or anyone else following along, could reproduce my tests if needed. You can just ignore that if it doesn't make sense.

testinggithub1222 commented 5 years ago

I have tried reducing the filter to "@15-days-ago +unread" and it shows around a thousand entries. At that number, it seems to work well.

mssdvd commented 2 years ago

@testinggithub1222 are you using Flycheck? If so, #448 could fix your problem.