passiomatic / coldsweat

Web RSS aggregator and reader compatible with the Fever API
MIT License

Faster feed refresh? #35

Closed Gui13 closed 10 years ago

Gui13 commented 11 years ago

I have 60 feeds in my reading list (I know, my life is cluttered...), and the refresh takes quite a bit of time on a Raspberry Pi (edit: was RasPi).

real 0m4.135s user 0m3.720s sys 0m0.260s

I'm wondering if the code could somehow be made faster?

passiomatic commented 11 years ago

I have no idea what a RasPi is, so I don't know the machine specs, and it's hard to figure out what is causing the slowdown.

As a reference, on my late-2007 MacBook I have a testbed of 68 feeds:

localhost - - [10/Nov/2013:14:44:23 +0100] 36351 INFO 68 feeds checked in 46.282415s

Gui13 commented 11 years ago

Yes, excuse my wording: RasPi is Raspberry Pi, basically an ARM CPU at 700 MHz. Right now the refresh eats 100% of the CPU for 8 minutes each time, which makes even the console sluggish. I just checked, and multiprocessing is enabled in the config. Also, my backend is SQLite, in case that influences multi-process performance.

I tried other RSS alternatives and generally saw better performance, especially regarding CPU load. I don't mind the process running for 8 minutes, but taking 100% CPU the whole time is a bit too much.

I might have some non-responding feeds in the bunch, if that's of help (I could give you the OPML if you feel like investigating...).

passiomatic commented 11 years ago

Sure, copy-paste the OPML here and I'll give it a try.

Gui13 commented 10 years ago

OK, here you go:

<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
  <head>
    <dateCreated>Sun, 10 Nov 2013 09:04:07 +0000</dateCreated>
    <title>Tiny Tiny RSS Feed Export</title>
  </head>
  <body>
    <outline text="Programmation">
      <outline text="C">
        <outline text="0xjfdube" xmlUrl="http://jfdube.wordpress.com/feed/" htmlUrl="http://jfdube.wordpress.com"/>
        <outline text="The Old New Thing" xmlUrl="http://blogs.msdn.com/b/oldnewthing/rss.aspx" htmlUrl="http://blogs.msdn.com/b/oldnewthing/"/>
        <outline text="Preshing on Programming" xmlUrl="http://preshing.com/feed" htmlUrl="http://preshing.com/"/>
        <outline text="Software Architecture" xmlUrl="http://tombarta.wordpress.com/feed/" htmlUrl="http://tombarta.wordpress.com"/>
        <outline text="Systemfault's Weblog" xmlUrl="http://debugfailure.wordpress.com/feed/" htmlUrl="http://debugfailure.wordpress.com"/>
      </outline>
      <outline text="Javascript &amp; Web">
        <outline text="Badass JavaScript" xmlUrl="http://rss.badassjs.com/" htmlUrl="http://badassjs.com/"/>
        <outline text="Codeflow" xmlUrl="http://codeflow.org/feed.rss" htmlUrl="http://codeflow.org/"/>
        <outline text="Le Petit Codeur" xmlUrl="http://petitcodeur.fr/feed.xml" htmlUrl="http://petitcodeur.fr/"/>
        <outline text="Breaking the Mobile Web" xmlUrl="http://www.mobilexweb.com/feed" htmlUrl="http://www.mobilexweb.com"/>
        <outline text="Eli Bendersky's website" xmlUrl="http://eli.thegreenplace.net/feed/" htmlUrl="http://eli.thegreenplace.net"/>
        <outline text="null program" xmlUrl="http://nullprogram.com/blog/index.rss" htmlUrl="http://nullprogram.com"/>
      </outline>
      <outline text="Python">
        <outline text="Jacob Kaplan-Moss - Writing" xmlUrl="http://jacobian.org/feed/" htmlUrl="http://jacobian.org/feed/"/>
        <outline text="PyPy Status Blog" xmlUrl="http://feeds.feedburner.com/PyPyStatusBlog" htmlUrl="http://feeds.feedburner.com/"/>
      </outline>
      <outline text="DamienG" xmlUrl="http://feed.damieng.com/DamienG" htmlUrl="http://damieng.com"/>
      <outline text="chris-granger.com" xmlUrl="http://feeds.feedburner.com/ChrisGranger" htmlUrl="http://chris-granger.com/"/>
      <outline text="ridiculous_fish" xmlUrl="http://ridiculousfish.com/blog/atom.xml" htmlUrl="http://ridiculousfish.com/blog/"/>
      <outline text="Ken Shirriff's blog" xmlUrl="http://www.arcfn.com/feeds/posts/default" htmlUrl="http://www.arcfn.com/feeds/posts/"/>
      <outline text="The Lonely Coder" xmlUrl="http://www.lonelycoder.com/blog/?feed=rss2" htmlUrl="http://www.lonelycoder.com/blog"/>
      <outline text="Literate Programming" xmlUrl="http://feeds.feedburner.com/triflingwhims" htmlUrl="http://feeds.feedburner.com/"/>
    </outline>
    <outline text="Divers">
      <outline text="Carnet maritime" xmlUrl="http://carnet-maritime.com/atom.xml" htmlUrl="http://carnet-maritime.com/"/>
      <outline text="Neolyse" xmlUrl="http://neolyse.info/blog/feeds/all.atom.xml" htmlUrl="http://neolyse.info/blog/"/>
      <outline text="Peter Watts" xmlUrl="http://www.rifters.com/crawl/?feed=rss2" htmlUrl="http://www.rifters.com/crawl"/>
      <outline text="L&quot;ouvreuse - Entrez dans le cinéma" xmlUrl="http://louvreuse.net/?format=feed" htmlUrl="http://louvreuse.net/component/content/?view=featured"/>
      <outline text="Uploads by SadaPlays" xmlUrl="http://gdata.youtube.com/feeds/base/users/SadaPlays/uploads?client=ytapi-youtube-rss-redirect&amp;orderby=updated&amp;alt=rss&amp;v=2" htmlUrl="http://www.youtube.com/channel/UCmkp0UUdTOqvS5355ql3NRg/videos"/>
    </outline>
    <outline text="Technologie">
      <outline text="BrainOverflow" xmlUrl="https://medium.com/feed/@FredericJacobs" htmlUrl="https://medium.com/@FredericJacobs"/>
      <outline text="Coding Horror" xmlUrl="http://feeds.feedburner.com/codinghorror" htmlUrl="http://www.codinghorror.com/blog/"/>
      <outline text="Da Scott Chacon Blog" xmlUrl="http://feeds.feedburner.com/ScottChacon" htmlUrl="http://scottchacon.com/"/>
      <outline text="FatBits: John Siracusa" xmlUrl="http://feeds.arstechnica.com/arstechnica/staff/fatbits?format=xml" htmlUrl="http://arstechnica.com"/>
      <outline text="Gustavo Duarte" xmlUrl="http://duartes.org/gustavo/blog/feed" htmlUrl="http://duartes.org/gustavo/blog"/>
      <outline text="Harder, Better, Faster, Stronger" xmlUrl="http://hbfs.wordpress.com/feed/" htmlUrl="http://hbfs.wordpress.com"/>
      <outline text="ici &amp; ailleurs" xmlUrl="http://neokraft.net/feed/atom" htmlUrl="http://neokraft.net/feed/"/>
      <outline text="JoPa.Fr" xmlUrl="http://www.jopa.fr/index.php/feed/" htmlUrl="http://www.jopa.fr"/>
      <outline text="Ivan Zuzak" xmlUrl="http://ivanzuzak.info/atom.xml" htmlUrl="http://ivanzuzak.info"/>
      <outline text="KeyJ's Blog" xmlUrl="http://keyj.s2000.ws/?feed=rss2" htmlUrl="http://keyj.emphy.de"/>
      <outline text="Korben" xmlUrl="http://feeds.feedburner.com/KorbensBlog-UpgradeYourMind" htmlUrl="http://korben.info"/>
      <outline text="Matthew Gregan" xmlUrl="http://blog.mjg.im/atom.xml" htmlUrl="http://blog.mjg.im/"/>
      <outline text="LinuxFr.org : les dépêches" xmlUrl="http://linuxfr.org/news.atom" htmlUrl="http://linuxfr.org/"/>
      <outline text="Ma petite parcelle d'Internet..." xmlUrl="http://sid.rstack.org/blog/rss.php" htmlUrl="http://sid.rstack.org/blog/index.php/"/>
      <outline text="Rands In Repose" xmlUrl="http://www.randsinrepose.com/index.xml" htmlUrl="http://randsinrepose.com"/>
      <outline text="NASA Image of the Day (Large)" xmlUrl="http://www.nasa.gov/rss/lg_image_of_the_day.rss" htmlUrl="http://www.nasa.gov/"/>
      <outline text="Russell Beattie" xmlUrl="http://feeds.russellbeattie.com/russellbeattieweblog" htmlUrl="http://feeds.russellbeattie.com/"/>
      <outline text="Terminally Incoherent" xmlUrl="http://feeds.feedburner.com/TerminallyIncoherent" htmlUrl="http://www.terminally-incoherent.com/blog"/>
      <outline text="Diary Of An x264 Developer" xmlUrl="http://x264dev.multimedia.cx/?feed=rss2" htmlUrl="http://x264dev.multimedia.cx"/>
      <outline text="ponnuki - electronic media art and yoga" xmlUrl="http://www.ponnuki.net/feed/atom/" htmlUrl="http://www.ponnuki.net/feed/atom/"/>
      <outline text="Vivek Haldar" xmlUrl="http://blog.vivekhaldar.com/rss" htmlUrl="http://blog.vivekhaldar.com/"/>
      <outline text="fabiensanglard.net" xmlUrl="http://fabiensanglard.net/rss.xml" htmlUrl="http://fabiensanglard.net"/>
      <outline text="Yield Thought" xmlUrl="http://yieldthought.com/rss" htmlUrl="http://yieldthought.com/"/>
      <outline text="Qt Blog" xmlUrl="http://blog.qt.digia.com/feed/" htmlUrl="http://blog.qt.digia.com"/>
      <outline text="Woboq" xmlUrl="http://feeds.woboq.com/woboq" htmlUrl="http://woboq.com"/>
      <outline text="MacBidouille.com" xmlUrl="http://feeds.macbidouille.com/macbidouille/" htmlUrl="http://www.macbidouille.com"/>
      <outline text="Standblog" xmlUrl="http://standblog.org/blog/feed/atom" htmlUrl="http://standblog.org/blog/feed/"/>
      <outline text="Blog de Stéphane Bortzmeyer" xmlUrl="http://www.bortzmeyer.org/feed-full.atom" htmlUrl="http://www.bortzmeyer.org/"/>
      <outline text="Tom Preston-Werner" xmlUrl="http://feeds.feedburner.com/tom-preston-werner" htmlUrl="http://tom.preston-werner.com/"/>
      <outline text="The Tao of Mac" xmlUrl="http://the.taoofmac.com/rss" htmlUrl="http://the.taoofmac.com"/>
      <outline text="yeKblog" xmlUrl="http://yeknan.free.fr/dc2/index.php?feed/atom" htmlUrl="http://yeknan.free.fr/dc2/"/>
    </outline>
    <outline text="Humour">
      <outline text="xkcd.com" xmlUrl="https://xkcd.com/atom.xml" htmlUrl="https://xkcd.com/"/>
      <outline text="Ceacy." xmlUrl="http://blogs.lasile.fr/ceacy/11.rss" htmlUrl="http://blogs.lasile.fr/ceacy"/>
      <outline text="Croustination" xmlUrl="http://feeds.feedburner.com/Croustination" htmlUrl="http://www.facebook.com/"/>
      <outline text="Macadam Valley" xmlUrl="http://macadamvalley.com/feed/" htmlUrl="http://macadamvalley.com"/>
    </outline>
    <outline text="Jeux Vidéos">
      <outline text="Borderline Lunatic" xmlUrl="http://blogs.wefrag.com/ubn22/feed/" htmlUrl="http://blogs.wefrag.com/ubn22"/>
      <outline text="Code Of Honor" xmlUrl="http://www.codeofhonor.com/blog/feed" htmlUrl="http://www.codeofhonor.com/blog"/>
      <outline text="Stevey's Blog Rants" xmlUrl="http://steve-yegge.blogspot.fr/atom.xml" htmlUrl="http://steve-yegge.blogspot.fr/"/>
      <outline text="Loser avec mention TB" xmlUrl="http://blogs.wefrag.com/drloser/feed/" htmlUrl="http://blogs.wefrag.com/drloser"/>
    </outline>
  </body>
</opml>
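
For anyone who wants to reproduce the run, here is a minimal sketch of how the feed URLs can be pulled out of an export like the one above. It uses only the standard library. `OPML_SAMPLE` is a two-feed excerpt of the file, and `feed_urls` is a helper name made up for this example, not part of Coldsweat.

```python
import xml.etree.ElementTree as ET

# Two-feed excerpt of the export above; the real file would be read from disk.
OPML_SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
  <body>
    <outline text="Programmation">
      <outline text="C">
        <outline text="Preshing on Programming" xmlUrl="http://preshing.com/feed" htmlUrl="http://preshing.com/"/>
      </outline>
      <outline text="DamienG" xmlUrl="http://feed.damieng.com/DamienG" htmlUrl="http://damieng.com"/>
    </outline>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return (title, xmlUrl) pairs for every outline that points at a feed."""
    # Encode first so the parser honours the encoding declaration.
    root = ET.fromstring(opml_text.encode("utf-8"))
    # Category outlines carry no xmlUrl attribute, so they are skipped.
    return [
        (node.get("text"), node.get("xmlUrl"))
        for node in root.iter("outline")
        if node.get("xmlUrl")
    ]


if __name__ == "__main__":
    for title, url in feed_urls(OPML_SAMPLE):
        print(title, url)
```

Counting the returned pairs is a quick way to compare the export against the "feeds checked" figure in the log.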

FYI, the last refresh times are consistently above 660s (11 minutes), and the refresh fails quite often on existing feeds (neolyse is down but the others are not):

localhost - - [26/Nov/2013:16:11:51 +0000] 14785 INFO 63 feeds checked in 706.783905s
localhost - - [26/Nov/2013:17:03:52 +0000] 14898 WARNING neolyse.info replied with status 404, aborted
localhost - - [26/Nov/2013:17:04:36 +0000] 14898 INFO keyj.s2000.ws caused a parser error (<unknown>:8:363: not well-formed (invalid token)), tried to parse it anyway
localhost - - [26/Nov/2013:17:11:06 +0000] 14893 INFO 63 feeds checked in 660.942276s
localhost - - [26/Nov/2013:18:00:50 +0000] 14994 WARNING a network error occured while fetching eli.thegreenplace.net, skipped
localhost - - [26/Nov/2013:18:04:10 +0000] 14994 WARNING neolyse.info replied with status 404, aborted
localhost - - [26/Nov/2013:18:11:20 +0000] 14989 INFO 63 feeds checked in 674.963409s
localhost - - [26/Nov/2013:19:03:54 +0000] 15122 WARNING neolyse.info replied with status 404, aborted
localhost - - [26/Nov/2013:19:11:07 +0000] 15117 INFO 63 feeds checked in 661.704504s
localhost - - [26/Nov/2013:20:04:04 +0000] 15252 WARNING neolyse.info replied with status 404, aborted

Gui13 commented 10 years ago

Here's a gist of a run under cProfile on the Pi:

https://gist.github.com/Gui13/7667132

I ran it like this: python -m cProfile -s time sweat.py refresh
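
The same kind of report can also be produced from inside Python, which is handy when only one function is of interest. This is a rough sketch using the standard `cProfile` and `pstats` modules. `profile_call` and `busy_work` are names invented for the example; `busy_work` only stands in for the real refresh.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "time" sorts by internal time, like "-s time" on the command line.
    stats.sort_stats("time").print_stats(10)
    return result, buffer.getvalue()


def busy_work(n):
    # Illustrative stand-in for the refresh; not Coldsweat code.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(busy_work, 100000)
    print(report)
```

Limiting `print_stats` to the top entries keeps the report short enough to paste into a comment.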

passiomatic commented 10 years ago

Cool, I didn't know about cProfile. I ran the refresh locally using SQLite and the settings below, with multiprocessing both off and on. Your feeds look OK here too: they don't stall the fetcher (no timeouts), so everything works as expected.

As expected, the results on my MacBook are pretty different from yours. If I read your profiling report correctly, it seems that on the RasPi the fetcher spends an inordinate amount of time in the thread.lock object. This must have something to do with the multiprocessing module, since it is the only place where threads are used (apart from web sessions, but they are inactive here).

Quite ironically in your case turning off multiprocessing should help to make the fetcher faster.
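
One caveat when reading these reports: cProfile only instruments the process it runs in, so with multiprocessing on, the parent's profile mostly shows time spent waiting on a lock while the workers do the actual fetching and parsing. A possible way to see inside the workers is to profile each call in the worker itself and merge the dumps afterwards. The sketch below assumes a pool of worker processes. `fetch_feed` here is a dummy stand-in, and `profiled_fetch`, `merge_profiles` and `profile_pool` are hypothetical helpers, not Coldsweat functions.

```python
import cProfile
import multiprocessing
import os
import pstats
import tempfile


def fetch_feed(url):
    # Dummy stand-in for the real per-feed work (hypothetical).
    return len(url)


def profiled_fetch(job):
    """Run fetch_feed under cProfile and dump the stats to a unique file."""
    url, out_dir = job
    profiler = cProfile.Profile()
    try:
        return profiler.runcall(fetch_feed, url)
    finally:
        # One file per call, so a worker handling several feeds never overwrites a dump.
        fd, path = tempfile.mkstemp(prefix="worker-%d-" % os.getpid(),
                                    suffix=".prof", dir=out_dir)
        os.close(fd)
        profiler.dump_stats(path)


def merge_profiles(out_dir):
    """Combine every .prof dump in out_dir into a single pstats.Stats object."""
    paths = [os.path.join(out_dir, name)
             for name in sorted(os.listdir(out_dir))
             if name.endswith(".prof")]
    return pstats.Stats(*paths)


def profile_pool(urls, processes=2):
    """Fetch urls in a process pool and return the merged worker profile."""
    out_dir = tempfile.mkdtemp()
    with multiprocessing.Pool(processes) as pool:
        pool.map(profiled_fetch, [(url, out_dir) for url in urls])
    return merge_profiles(out_dir)
```

Calling `profile_pool(urls).sort_stats("time").print_stats(20)` would then give a report comparable to the single-process one.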

I'll paste my results below for further reference.


Without multiprocessing

localhost - - [27/Nov/2013:10:39:41 +0100] 46235 WARNING neolyse.info replied with status 404, aborted
localhost - - [27/Nov/2013:10:40:33 +0100] 46235 INFO 63 feeds checked in 94.616577s

scrub: off  
multiprocessing: off  

         41723755 function calls (41435442 primitive calls) in 95.211 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    19681   28.384    0.001   28.384    0.001 {method 'recv' of '_socket.socket' objects}
  6478465    6.421    0.000    8.586    0.000 compat.py:30(wrap_ord)
       30    5.979    0.199   15.111    0.504 sbcharsetprober.py:70(feed)
       64    5.352    0.084    5.352    0.084 <string>:1(connect)
       64    3.205    0.050    3.205    0.050 {_socket.getaddrinfo}
  9928605    2.712    0.000    2.720    0.000 {isinstance}
   139238    2.695    0.000    2.801    0.000 {built-in method sub}
   750844    2.540    0.000    4.545    0.000 BeautifulSoup.py:977(_matches)
  2161790    2.460    0.000    3.307    0.000 {hasattr}
    51955    2.243    0.000    2.243    0.000 {method 'translate' of 'unicode' objects}
   107996    1.761    0.000   10.522    0.000 sgmllib.py:238(parse_starttag)
   670482    1.529    0.000    2.495    0.000 codingstatemachine.py:42(next_state)
38845/2865    1.488    0.000   13.409    0.005 BeautifulSoup.py:348(_findAll)
232091/179900    1.402    0.000    7.431    0.000 BeautifulSoup.py:913(searchTag)
566754/419725    1.392    0.000   12.180    0.000 BeautifulSoup.py:950(search)
     3241    1.390    0.000   16.467    0.005 sgmllib.py:116(goahead)
        2    1.125    0.562    3.799    1.899 utf8prober.py:50(feed)
   609271    1.018    0.000    1.018    0.000 {built-in method match}
   107996    0.985    0.000    7.367    0.000 sgmllib.py:331(finish_starttag)
  6481470    0.887    0.000    0.887    0.000 {ord}
   363579    0.847    0.000    0.847    0.000 BeautifulSoup.py:466(__getattr__)
   289315    0.818    0.000    1.059    0.000 feedparser.py:330(__getitem__)
   500809    0.786    0.000    5.240    0.000 {getattr}
   357295    0.770    0.000    0.770    0.000 {built-in method search}
    63300    0.708    0.000    2.134    0.000 BeautifulSoup.py:1239(endData)
    71580    0.542    0.000    1.027    0.000 feedparser.py:1937(unknown_starttag)
    35980    0.536    0.000    3.583    0.000 BeautifulSoup.py:1330(unknown_starttag)
        2    0.520    0.260    0.520    0.260 {built-in method do_handshake}
   256238    0.507    0.000    1.682    0.000 feedparser.py:761(handle_data)
    80059    0.476    0.000    2.992    0.000 sgmllib.py:311(parse_endtag)
    80059    0.453    0.000    2.348    0.000 sgmllib.py:349(finish_endtag)
   605599    0.416    0.000    0.673    0.000 BeautifulSoup.py:878(recursiveChildGenerator)
    37126    0.385    0.000    1.065    0.000 BeautifulSoup.py:535(__init__)
    85091    0.320    0.000    0.347    0.000 BeautifulSoup.py:132(setup)
     5280    0.284    0.000   33.137    0.006 feedparser.py:840(pop)
   188607    0.278    0.000    4.542    0.000 BeautifulSoup.py:864(_getAttrMap)
   274600    0.276    0.000    1.268    0.000 feedparser.py:392(get)
      838    0.267    0.000    0.267    0.000 {built-in method read}
   692590    0.258    0.000    0.258    0.000 {method 'get' of 'dict' objects}
   188607    0.258    0.000    4.864    0.000 BeautifulSoup.py:590(get)
    36008    0.258    0.000    1.272    0.000 feedparser.py:2726(unknown_starttag)
      115    0.257    0.002   36.390    0.316 {built-in method Parse}
    28338    0.256    0.000    0.398    0.000 BeautifulSoup.py:1284(_smartPop)
    34526    0.251    0.000    0.438    0.000 BeautifulSoup.py:1262(_popToTag)
    71580    0.243    0.000    0.291    0.000 feedparser.py:1928(normalize_attrs)
       63    0.243    0.004    0.243    0.004 {method 'commit' of 'sqlite3.Connection' objects}
   256238    0.235    0.000    1.917    0.000 feedparser.py:1835(characters)
   905590    0.231    0.000    0.231    0.000 {method 'append' of 'list' objects}
    98646    0.230    0.000    0.319    0.000 BeautifulSoup.py:1195(__getattr__)
   109086    0.206    0.000    0.266    0.000 BeautifulSoup.py:1208(isSelfClosingTag)
   721957    0.202    0.000    0.202    0.000 BeautifulSoup.py:626(__nonzero__)
   256238    0.193    0.000    2.110    0.000 pyexpat.c:479(CharacterData)
    37126    0.192    0.000    0.223    0.000 BeautifulSoup.py:1232(pushTag)
    42938    0.191    0.000    0.462    0.000 {map}
    36008    0.181    0.000    0.847    0.000 feedparser.py:2559(unknown_starttag)
   667762    0.179    0.000    0.179    0.000 codingstatemachine.py:57(get_current_charlen)
    72016    0.165    0.000    0.447    0.000 feedparser.py:250(search)
    72016    0.165    0.000    5.276    0.000 feedparser.py:1905(parse_starttag)
   983124    0.155    0.000    0.155    0.000 {callable}
    35980    0.147    0.000    4.122    0.000 BeautifulSoup.py:661(__getattr__)
   327920    0.145    0.000    0.145    0.000 {built-in method start}
    55118    0.143    0.000    0.208    0.000 feedparser.py:2810(handle_data)
   283462    0.139    0.000    0.139    0.000 {method 'startswith' of 'str' objects}
    45253    0.137    0.000    0.137    0.000 {_codecs.utf_8_decode}
    35980    0.136    0.000    0.151    0.000 BeautifulSoup.py:1224(popTag)
38845/2865    0.129    0.000   13.422    0.005 BeautifulSoup.py:835(findAll)
    53324    0.120    0.000    0.133    0.000 feedparser.py:1966(unknown_endtag)
    26686    0.117    0.000    1.419    0.000 BeautifulSoup.py:1360(unknown_endtag)
   265187    0.113    0.000    0.113    0.000 {built-in method end}
   232493    0.109    0.000    0.109    0.000 {method 'items' of 'dict' objects}
32741/32570    0.103    0.000    0.124    0.000 {method 'encode' of 'unicode' objects}
   110151    0.101    0.000    0.125    0.000 feedparser.py:1994(handle_data)
    45240    0.096    0.000    0.268    0.000 {method 'decode' of 'str' objects}
     9330    0.088    0.000    0.223    0.000 urlparse.py:133(urlsplit)
    62820    0.088    0.000    0.113    0.000 BeautifulSoup.py:1373(handle_data)
     5311    0.087    0.000    0.349    0.000 feedparser.py:595(unknown_starttag)
   100490    0.086    0.000    0.086    0.000 {range}
   124953    0.081    0.000    0.081    0.000 {method 'replace' of 'str' objects}
    26686    0.080    0.000    0.147    0.000 feedparser.py:2789(unknown_endtag)
    41653    0.079    0.000    0.345    0.000 re.py:227(_compile)
   277184    0.079    0.000    0.079    0.000 {method 'lower' of 'str' objects}
     1334    0.077    0.000    0.092    0.000 feedparser.py:2041(output)
    47965    0.077    0.000    0.161    0.000 BeautifulSoup.py:451(__new__)
    65428    0.076    0.000    0.076    0.000 {built-in method __new__ of type object at 0x100126080}
    35980    0.076    0.000    3.920    0.000 BeautifulSoup.py:824(find)
     5311    0.073    0.000    0.459    0.000 feedparser.py:1788(startElementNS)
   225242    0.072    0.000    0.072    0.000 {method 'has_key' of 'dict' objects}
302785/301614    0.071    0.000    0.071    0.000 {len}
   124588    0.070    0.000    0.070    0.000 {built-in method group}
    72016    0.066    0.000    0.103    0.000 feedparser.py:260(start)
    38845    0.066    0.000    0.066    0.000 BeautifulSoup.py:1012(__init__)
        2    0.065    0.033    0.133    0.066 latin1prober.py:110(feed)
    30359    0.059    0.000    0.247    0.000 BeautifulSoup.py:197(_lastRecursiveChild)
    81625    0.057    0.000    0.057    0.000 {method 'join' of 'unicode' objects}
    38845    0.056    0.000    0.085    0.000 BeautifulSoup.py:893(__init__)
     2657    0.055    0.000    0.214    0.000 feedparser.py:2814(sanitize_style)
    96289    0.048    0.000    0.048    0.000 {method 'lower' of 'unicode' objects}
    45253    0.046    0.000    0.183    0.000 utf_8.py:15(decode)
       64    0.046    0.001    0.047    0.001 {method 'execute' of 'sqlite3.Cursor' objects}
10618/1151    0.046    0.000    0.191    0.000 copy.py:144(deepcopy)
     5311    0.046    0.000    0.522    0.000 expatreader.py:306(start_element_ns)
    25190    0.045    0.000    2.054    0.000 re.py:144(sub)
      745    0.045    0.000   17.462    0.023 socket.py:373(readline)
     7487    0.044    0.000    0.062    0.000 urlparse.py:125(_splitnetloc)
      130    0.042    0.000    0.042    0.000 {built-in method decompress}
     5311    0.041    0.000   33.404    0.006 feedparser.py:683(unknown_endtag)
    14767    0.040    0.000    0.054    0.000 feedparser.py:398(__setitem__)
    72016    0.038    0.000    0.038    0.000 feedparser.py:258(__init__)
 1198/257    0.038    0.000    0.104    0.000 sre_parse.py:385(_parse)
        2    0.038    0.019    0.038    0.019 {_ssl.sslwrap}
     9266    0.038    0.000    0.276    0.000 urlparse.py:102(urlparse)
     5311    0.036    0.000   33.451    0.006 feedparser.py:1838(endElementNS)
     7806    0.035    0.000    0.044    0.000 feedparser.py:1972(handle_charref)
    20462    0.035    0.000    0.216    0.000 BeautifulSoup.py:558(<lambda>)
      517    0.035    0.000   11.315    0.022 socket.py:313(read)
    68465    0.034    0.000    0.034    0.000 {method 'find' of 'str' objects}
    32805    0.031    0.000    0.031    0.000 {method 'startswith' of 'unicode' objects}
    16386    0.030    0.000    0.030    0.000 {method 'split' of 'unicode' objects}
     4190    0.029    0.000    0.156    0.000 feedparser.py:2451(isProbablyDownloadable)
    36532    0.026    0.000    0.026    0.000 {method 'rfind' of 'str' objects}
      910    0.026    0.000    0.058    0.000 sre_compile.py:213(_optimize_charset)
7239/6646    0.026    0.000   11.620    0.002 {method 'join' of 'str' objects}
 2235/237    0.025    0.000    0.094    0.000 sre_compile.py:38(_compile)
     1334    0.024    0.000    8.359    0.006 feedparser.py:1912(feed)
     6865    0.024    0.000    0.138    0.000 feedparser.py:452(_urljoin)
     1151    0.023    0.000    0.043    0.000 {method '__reduce_ex__' of 'object' objects}
      191    0.023    0.000    0.023    0.000 {open}
    27923    0.023    0.000    0.023    0.000 {method 'sort' of 'list' objects}
     5311    0.022    0.000   33.483    0.006 expatreader.py:340(end_element_ns)
    52315    0.022    0.000    0.022    0.000 {method 'pop' of 'list' objects}
    13171    0.022    0.000    0.026    0.000 sre_parse.py:188(__next)
       66    0.021    0.000    0.028    0.000 {__import__}
     3793    0.021    0.000    0.027    0.000 BeautifulSoup.py:1403(handle_entityref)
    13189    0.021    0.000    0.024    0.000 feedparser.py:1885(_shorttag_replace)
    71960    0.020    0.000    0.020    0.000 feedparser.py:2137(<lambda>)
     1151    0.020    0.000    0.114    0.000 copy.py:300(_reconstruct)
     1172    0.020    0.000   32.805    0.028 feedparser.py:1020(popContent)
     2354    0.019    0.000    0.019    0.000 {method 'write' of 'cStringIO.StringO' objects}
        1    0.019    0.019    0.117    0.117 feedparser.py:10(<module>)
     8520    0.018    0.000    0.034    0.000 re.py:269(_subx)
    68377    0.018    0.000    0.018    0.000 {method 'strip' of 'str' objects}
    10519    0.018    0.000    0.024    0.000 copy.py:261(_keep_alive)
      844    0.017    0.000    0.018    0.000 BeautifulSoup.py:1842(_toUnicode)
        6    0.017    0.003    0.023    0.004 sre_compile.py:307(_optimize_unicode)
     3637    0.017    0.000    0.034    0.000 feedparser.py:410(__getattr__)
     9735    0.017    0.000    0.134    0.000 feedparser.py:2572(_makeSafeAbsoluteURI)
      573    0.016    0.000    2.017    0.004 feedparser.py:2490(findEnclosures)
     2701    0.016    0.000    0.016    0.000 {built-in method findall}
     4113    0.016    0.000    0.062    0.000 feedparser.py:404(setdefault)
     3850    0.015    0.000    0.021    0.000 BeautifulSoup.py:1395(handle_charref)
     4949    0.015    0.000    0.039    0.000 feedparser.py:377(__contains__)
3104/1146    0.015    0.000    0.017    0.000 sre_parse.py:146(getwidth)
        1    0.015    0.015   94.617   94.617 fetcher.py:365(fetch_feeds)
     5600    0.014    0.000    0.019    0.000 feedparser.py:799(mapContentType)
      573    0.014    0.000    8.871    0.015 BeautifulSoup.py:1162(_feed)
     7592    0.014    0.000    0.016    0.000 feedparser.py:1986(handle_entityref)
     2480    0.014    0.000    0.016    0.000 sgmllib.py:71(reset)
      372    0.013    0.000   11.590    0.031 response.py:129(read)
     1146    0.013    0.000    0.051    0.000 BeautifulSoup.py:1214(reset)
    19945    0.012    0.000    0.012    0.000 {method 'find' of 'unicode' objects}
      573    0.012    0.000    8.917    0.016 BeautifulSoup.py:1083(__init__)
       63    0.012    0.000   94.573    1.501 fetcher.py:207(fetch_feed)
     4712    0.012    0.000    0.023    0.000 sgmllib.py:300(_convert_ref)
     1582    0.012    0.000    0.012    0.000 {method 'replace' of 'unicode' objects}
        1    0.012    0.012   95.230   95.230 sweat.py:5(<module>)
        ...

With multiprocessing

localhost - - [27/Nov/2013:10:53:12 +0100] 46353 WARNING neolyse.info replied with status 404, aborted
localhost - - [27/Nov/2013:10:53:45 +0100] 46351 INFO 63 feeds checked in 50.895383s

scrub: off  
multiprocessing: on  

         236114 function calls (229597 primitive calls) in 53.379 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       40   50.708    1.268   50.708    1.268 {built-in method acquire}
      537    0.705    0.001    0.705    0.001 {posix.stat}
       96    0.092    0.001    0.092    0.001 {open}
        1    0.089    0.089    0.247    0.247 feedparser.py:10(<module>)
       93    0.088    0.001    0.088    0.001 {method 'read' of 'file' objects}
        1    0.067    0.067    1.682    1.682 fetcher.py:8(<module>)
        2    0.066    0.033    0.426    0.213 __init__.py:8(<module>)
        1    0.058    0.058    1.994    1.994 commands.py:4(<module>)
        1    0.049    0.049    0.099    0.099 request.py:1(<module>)
        1    0.043    0.043    0.901    0.901 _mysql.py:1(__bootstrap__)
        1    0.038    0.038    0.054    0.054 opml.py:8(<module>)
        1    0.038    0.038    0.038    0.038 expat.py:1(<module>)
        2    0.037    0.018    1.016    0.508 models.py:8(<module>)
        1    0.036    0.036    0.123    0.123 utils.py:10(<module>)
 1065/225    0.036    0.000    0.097    0.000 sre_parse.py:385(_parse)
        1    0.035    0.035   53.379   53.379 sweat.py:5(<module>)  
        ....

Gui13 commented 10 years ago

I'm running a profiled refresh without multiprocessing right now; I'll post the results when done. Strangely, the refresh time has gone down to ~450s five times in a row since yesterday.

passiomatic commented 10 years ago

Also, I noticed another thing looking at my reports. In the first one, BeautifulSoup is used instead of the regular XML processing library. I believe BeautifulSoup kicks in when Feedparser encounters ill-formed XML. You should see lines like "XXX caused a parser error (XXX), tried to parse it anyway" in your log file (at DEBUG level). Judging by the profile times and call counts, BeautifulSoup is quite slow (version 3.x at least). So if you have a few feeds that are not well-formed, they could cause a severe slowdown and 100% CPU usage.
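
To find out which feeds are affected without running a full refresh, one option is to check the raw feed bytes for well-formedness with the standard library's expat bindings. This is only a rough sketch of such a check, not how Feedparser decides internally. `xml_error` and the two sample documents are made up for the example.

```python
from xml.parsers import expat


def xml_error(data):
    """Return None if data is well-formed XML, else the expat error message."""
    parser = expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of the document.
        parser.Parse(data, True)
    except expat.ExpatError as exc:
        return str(exc)
    return None


# A well-formed document, and one with a bare ampersand (a common feed defect).
GOOD = b"<rss><channel><title>ok</title></channel></rss>"
BAD = b"<rss><channel><title>Tom & Jerry</title></channel></rss>"

if __name__ == "__main__":
    for name, doc in (("good", GOOD), ("bad", BAD)):
        print(name, xml_error(doc))
```

Feeds for which this returns a message are the likely candidates for the slower fallback path.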

Gui13 commented 10 years ago

Yep I got this for 2 feeds. Here's the profile for non-multiprocess refresh: https://gist.github.com/7675786

You'll notice Sqlite seems quite slow, which is probably because the SD card is a bit slow on the machine (embedded ARM).

passiomatic commented 10 years ago

"You'll notice Sqlite seems quite slow, which is probably because the SD card is a bit slow on the machine (embedded ARM)."

I think there's little I can do on the performance side when dealing with slower hardware. However, Rui Carmo's original Bottle Fever can switch the feed parser and use speedparser instead, but then the fetcher code must deal with the semantic differences between the two parsers. In other words, the code gets more complicated because speedparser is hardly a drop-in replacement for the Universal Feed Parser.

If someone forked Coldsweat and managed to use speedparser instead of the default parser, it would be interesting to see the performance difference.