lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Improve startup speed #297

Closed: lemon24 closed this issue 1 year ago

lemon24 commented 1 year ago

(Library only; the CLI and app can be left alone for now.)

Some ways to get measurements (we might also want to include all stable plugins):

python -m cProfile -o program.prof <( echo 'import reader; reader.make_reader(":memory:")' )
python -s -X importtime -c 'import reader; reader.make_reader(":memory:")' 2> import.log

Both files can be visualized with tuna.
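
For a quick text-only look at `program.prof` without tuna, the standard-library `pstats` module can read it directly. A minimal sketch; `top_cumulative` is a hypothetical helper name, and sorting by cumulative time is just one reasonable choice:

```python
import pstats


def top_cumulative(path: str, limit: int = 20) -> pstats.Stats:
    """Print the `limit` entries with the highest cumulative time from a cProfile file."""
    stats = pstats.Stats(path)
    # Drop leading directories so module and function names are easier to scan.
    stats.strip_dirs()
    stats.sort_stats(pstats.SortKey.CUMULATIVE)
    stats.print_stats(limit)
    return stats
```

Calling `top_cumulative("program.prof")` after the `cProfile` command prints the slowest call paths, which is usually enough to spot which import dominates.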

lemon24 commented 1 year ago

Top talkers for import time (default plugins only):
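
A ranking like this can also be produced straight from `import.log` (the `-X importtime` output from the first comment). Each line there has the form `import time: <self> | <cumulative> | <indented module name>`, with times in microseconds and two spaces of indentation per nesting level. A sketch with hypothetical names (`parse_importtime`, `top_talkers`), ranking by self time:

```python
from typing import Iterable, List, NamedTuple


class ImportTime(NamedTuple):
    name: str
    self_us: int        # time spent in the module itself, microseconds
    cumulative_us: int  # self time plus everything it imported
    depth: int          # nesting level, from the indentation of the name


PREFIX = "import time:"


def parse_importtime(lines: Iterable[str]) -> List[ImportTime]:
    """Parse the stderr of `python -X importtime`, skipping the header row."""
    records = []
    for line in lines:
        if not line.startswith(PREFIX):
            continue  # unrelated stderr output
        fields = line[len(PREFIX):].split("|")
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue  # header row: "self [us] | cumulative | imported package"
        self_field, cumulative_field, name_field = fields
        name = name_field.rstrip()[1:]  # drop the single space after "|"
        stripped = name.lstrip(" ")
        depth = (len(name) - len(stripped)) // 2  # two spaces per level
        records.append(ImportTime(stripped, int(self_field), int(cumulative_field), depth))
    return records


def top_talkers(records: Iterable[ImportTime], limit: int = 10) -> List[ImportTime]:
    """The `limit` modules with the largest self time, largest first."""
    return sorted(records, key=lambda r: r.self_us, reverse=True)[:limit]
```

For example, `top_talkers(parse_importtime(open("import.log")))`; sorting by `cumulative_us` instead surfaces whole subtrees (such as a dependency and everything it drags in) rather than single modules.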

lemon24 commented 1 year ago

After making requests lazy:
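
The comment doesn't say how requests was made lazy. One standard-library option is the `importlib.util.LazyLoader` recipe from the `importlib` documentation: it returns a module object whose code only runs on first attribute access. A sketch; `lazy_import` is a hypothetical helper, not something reader is known to define:

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return module `name`, deferring its actual execution until first attribute access."""
    if name in sys.modules:
        # Already imported (eagerly or lazily); nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module only arms the deferred load.
    loader.exec_module(module)
    return module
```

The trade-off is that import errors and the import cost itself surface at first use (for example, the first HTTP request) rather than at `import reader` time.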

lemon24 commented 1 year ago

Here's a summary of the improvements up to now, obtained with https://github.com/asottile/importtime-waterfall/pull/93:

pip install git+https://github.com/asottile/importtime-waterfall.git@0a11e9cdf9ea33f5008bbfd294f47e543c93794d
importtime-waterfall importreader --max-depth 8 --hide-under 10000 \
| grep -v -F -e requests. -e bs4.

importreader.py:

import reader
reader.make_reader(":memory:")
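
`importtime` only accounts for imports. To check the overall wall-clock cost of the same code (imports plus `make_reader()` plus interpreter startup), one option is to time fresh interpreters directly. A sketch; `best_startup_time` is a hypothetical helper, and subtracting a bare-interpreter baseline to isolate the library's share is an assumption about what you want to measure:

```python
import subprocess
import sys
import time


def best_startup_time(code: str, runs: int = 5) -> float:
    """Run `code` in `runs` fresh interpreters; return the fastest wall time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        timings.append(time.perf_counter() - start)
    # The minimum is the least noisy estimate of the intrinsic cost.
    return min(timings)
```

For example, `best_startup_time('import reader; reader.make_reader(":memory:")') - best_startup_time("pass")` approximates the time reader adds on top of a bare interpreter.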

initial (006a8197c72cfad5c9cd36e213793a8508b37f04)

importreader (319624, 2175)
  reader (288755, 388)
    reader.core (288367, 1112)
      reader._parser (148664, 2367)
        reader._requests_utils (92812, 1687)
            requests (91106, 458)
              urllib3 (53114, 493)
        reader._types (33868, 4897)
          reader.types (28783, 11226)
            reader._utils (16901, 1244)
              multiprocessing.dummy (15657, 1391)
      reader._search (119990, 698)
        reader._html_utils (111905, 351)
          bs4 (111554, 591)
  reader._feedparser (24056, 352)
    reader._vendor.feedparser (23704, 200)
      reader._vendor.feedparser.api (23504, 597)

after bs4 (e90c8a322876a56d8cf376f64534d1d8886b8e1a)

importreader (215090, 2097)
  reader (172077, 391)
    reader.core (171686, 1089)
      reader._parser (142430, 2377)
        reader._requests_utils (89199, 1635)
            requests (87543, 449)
              urllib3 (50969, 476)
        reader._types (32109, 4982)
          reader.types (26940, 11219)
            reader._utils (15080, 862)
              multiprocessing.dummy (14218, 1268)
  reader._feedparser (36215, 350)
    reader._vendor.feedparser (35865, 199)
      reader._vendor.feedparser.api (35666, 577)

after requests (895df858a45386ceb5c49cd5ea22ef9c56adbe1f)

importreader (165551, 2158)
  reader (120585, 399)
    reader.core (120186, 1173)
      reader._parser (90280, 2598)
        reader._types (44579, 5830)
          reader.types (33746, 11378)
            reader._utils (21689, 798)
              multiprocessing.dummy (20891, 1429)
        reader._url_utils (22512, 280)
          urllib.request (22232, 1662)
            http.client (15397, 1160)
  reader._feedparser (36087, 364)
    reader._vendor.feedparser (35723, 200)
      reader._vendor.feedparser.api (35523, 586)

lemon24 commented 1 year ago

after feedparser (9852ba3)

importreader (128693, 2115)
  reader (119843, 401)
    reader.core (119442, 1088)
      reader._parser (90286, 2596)
        reader._types (44575, 6076)
          reader.types (33564, 11334)
            reader._utils (21520, 819)
              multiprocessing.dummy (20701, 1345)
        reader._url_utils (22633, 301)
          urllib.request (22332, 1673)
            http.client (15495, 1206)

after urllib.request (5522347)

importreader (105540, 2176)
  reader (96611, 405)
    reader.core (96206, 1161)
      reader._parser (67736, 2449)
        reader._types (44439, 5850)
          reader.types (33615, 11420)
            reader._utils (21500, 806)
              multiprocessing.dummy (20694, 1326)
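
Between the feedparser and urllib.request snapshots, `reader._url_utils` (and `http.client` under it) drops out of the listing. The change itself isn't shown here; a common way to get this effect is to move a module-level import into the function that needs it, so the cost is paid on first call instead of at `import reader` time. A sketch with a hypothetical helper, `read_url`:

```python
def read_url(url: str) -> bytes:
    """Hypothetical helper: fetch `url`, importing urllib.request only when called."""
    # Deferred import: importing this module no longer pulls in urllib.request
    # (and http.client); after the first call, this is just a sys.modules lookup.
    import urllib.request

    with urllib.request.urlopen(url) as response:
        return response.read()
```

This works for `file:` URLs as well as `http://` ones, so code paths that only read local feeds never pay for the HTTP stack until they actually fetch something.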

after multiprocessing.dummy (9d01fa4)

importreader (85745, 2117)
  reader (76939, 400)
    reader.core (76539, 1099)
      reader._parser (47720, 2457)
        reader._types (24645, 4999)
          reader.types (14680, 11590)
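
Taking the top-level `importreader` cumulative figure from each snapshot above, the per-step savings add up to 233879 µs, roughly 3.7x less import time overall (single runs, so expect some noise). The arithmetic, using only the numbers posted above:

```python
# Top-level cumulative import time (microseconds) for importreader, per snapshot above.
SNAPSHOTS = [
    ("initial", 319624),
    ("after bs4", 215090),
    ("after requests", 165551),
    ("after feedparser", 128693),
    ("after urllib.request", 105540),
    ("after multiprocessing.dummy", 85745),
]


def savings(snapshots):
    """Yield (label, microseconds saved versus the previous snapshot)."""
    for (_, before), (label, after) in zip(snapshots, snapshots[1:]):
        yield label, before - after


for label, saved in savings(SNAPSHOTS):
    print(f"{label:<28} -{saved:>7} us")

first, last = SNAPSHOTS[0][1], SNAPSHOTS[-1][1]
print(f"total: {first} -> {last} us ({first / last:.2f}x less import time)")
```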

lemon24 commented 1 year ago

Remaining work before release:

Possible cleanup/refactoring work:

lemon24 commented 1 year ago

Measured on Linux.

importreader.py:

from reader import make_reader

# "maximal" reader
reader = make_reader(
    ':memory:', 
    feed_root='', 
    search_enabled=True,
)

reader.add_feed('file:feed.rss')  # but not http://
list(reader.get_entries())
list(reader.search_entries('entry'))
reader._parser.session_factory.response_hooks.append('unused')
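
To keep gains like these from regressing, one option is a test that runs the code in a fresh interpreter and checks which heavy modules ended up in `sys.modules`. A sketch; `unexpectedly_imported` is a hypothetical helper, and the set of modules to forbid (for example `requests` or `bs4` for code that never fetches over HTTP) is an assumption to adjust per use case:

```python
import ast
import subprocess
import sys
from typing import Set


def unexpectedly_imported(code: str, forbidden: Set[str]) -> Set[str]:
    """Run `code` in a fresh interpreter; return the forbidden modules it loaded."""
    # The child prints its sys.modules keys as a Python list literal on the last line;
    # only `sys` is imported for this, and it is always already loaded.
    script = code + "\nimport sys\nprint(sorted(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", script],
        check=True, capture_output=True, text=True,
    )
    loaded = set(ast.literal_eval(result.stdout.splitlines()[-1]))
    return forbidden & loaded
```

For example, `assert not unexpectedly_imported("import reader", {"requests", "bs4"})` would fail as soon as a change reintroduces an eager import of either.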

Before:

importreader (255524, 4939)
  reader (229202, 218)
    reader.core (228984, 975)
      reader._parser (121043, 2109)
        reader._requests_utils (85179, 1647)
          requests.adapters (83532, 20)
            requests (83512, 362)
              urllib3 (43584, 330)
              requests.exceptions (23064, 692)
        reader._types (20245, 4085)
          reader.types (16022, 9280)
      reader._search (91539, 702)
        reader._html_utils (83311, 344)
          bs4 (82967, 447)
            bs4.builder (82520, 693)
              bs4.element (57275, 1117)
              bs4.builder._html5lib (13841, 420)
  reader._feedparser (17141, 320)
    reader._vendor.feedparser (16821, 146)
      reader._vendor.feedparser.api (16675, 534)

After:

importreader (71308, 4878)
  reader (60777, 207)
    reader.core (60570, 952)
      reader._parser (36389, 2091)
        reader._types (20010, 3970)
          reader.types (11312, 9302)