scrapinghub / scrapy-poet

Page Object pattern for Scrapy
BSD 3-Clause "New" or "Revised" License

Caching system #50

Closed (kmike closed 2 years ago)

kmike commented 3 years ago

This is for discussion.

It'd be nice to have a caching system for the output of providers; this could speed up re-running a spider after changes to page objects or callbacks.

This seems to be the place where caching could happen: https://github.com/scrapinghub/scrapy-poet/blob/f7ad036f62699513e10d5749910042a5268153fc/scrapy_poet/injection.py#L152

The main issue is how to compute the cache key, since kwargs might have different semantics for different providers. One option is to add an interface that lets providers customize how their output is cached. Another option is to do nothing in scrapy-poet itself and handle caching at the provider level, or at lower levels.
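To make the first option more concrete, here is a rough sketch of what a provider-controlled cache key could look like. It is not the scrapy-poet API: `CachingProviderMixin`, `cache_key`, `call_with_cache`, and both toy providers are hypothetical names invented for illustration, and the plain dict stands in for whatever storage a real implementation might use.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple


class CachingProviderMixin:
    """Hypothetical interface: a provider opts in to caching by returning a key."""

    def cache_key(self, url: str, **kwargs: Any) -> Optional[str]:
        # Default: returning None means "do not cache my output".
        return None


class HtmlProvider(CachingProviderMixin):
    """Toy provider whose output depends only on the URL."""

    def __init__(self) -> None:
        self.calls = 0

    def cache_key(self, url: str, **kwargs: Any) -> Optional[str]:
        # kwargs such as timeouts do not change the output, so they are ignored.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def __call__(self, url: str, **kwargs: Any) -> Dict[str, str]:
        self.calls += 1
        return {"url": url, "html": f"<html>{url}</html>"}


class ApiProvider(CachingProviderMixin):
    """Toy provider where one kwarg (page_type) does change the output."""

    def __init__(self) -> None:
        self.calls = 0

    def cache_key(self, url: str, **kwargs: Any) -> Optional[str]:
        # Only the kwargs that affect the result go into the key.
        relevant = {"url": url, "page_type": kwargs.get("page_type")}
        payload = json.dumps(relevant, sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()

    def __call__(self, url: str, **kwargs: Any) -> Dict[str, Any]:
        self.calls += 1
        return {"url": url, "page_type": kwargs.get("page_type")}


def call_with_cache(provider: Any, cache: Dict[Tuple[str, str], Any],
                    url: str, **kwargs: Any) -> Any:
    """Call the provider, reusing a stored result when its key matches."""
    key = provider.cache_key(url, **kwargs)
    if key is None:
        return provider(url, **kwargs)
    # Namespace by provider class so different providers never share entries.
    full_key = (type(provider).__name__, key)
    if full_key not in cache:
        cache[full_key] = provider(url, **kwargs)
    return cache[full_key]
```

With something along these lines, each provider decides which arguments matter for its key, and the injection code would only need to ask for the key before calling the provider.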

kmike commented 3 years ago

Example of cache implemented in a provider: https://github.com/scrapinghub/scrapy-autoextract/pull/24

BurnzZ commented 2 years ago

The implementation of this feature is almost finished in https://github.com/scrapinghub/scrapy-poet/pull/55.

kmike commented 2 years ago

Fixed by https://github.com/scrapinghub/scrapy-poet/pull/55.