suspectinside opened this issue 2 years ago (status: Open)
Hi @suspectinside, I'm not able to reproduce this locally, as the following minimal code derived from your example runs okay on my end.
I suspect that something else outside of your code example causes this issue. Unfortunately, the logs you've noted don't exactly pinpoint the problem.
Could you try copying the code below into 3 different modules in your project to see if it works?
```python
# providers.py
import logging
from typing import Set
from collections.abc import Callable

from scrapy_poet.page_input_providers import PageObjectInputProvider

logger = logging.getLogger()


class Arq:
    async def enqueue_task(self, task: dict):
        logger.info('Arq.enqueue_task() enqueueing new task: %r', task)


class ArqProvider(PageObjectInputProvider):
    provided_classes = {Arq}
    name = 'ARQ_PROVIDER'

    async def __call__(self, to_provide: Set[Callable]):
        return [Arq()]
```
```python
# pageobjects.py
import attr

from web_poet.pages import Injectable, WebPage, ItemWebPage

from .providers import Arq


@attr.define
class IndexPage(WebPage):
    arq: Arq

    async def page_titles(self):
        await self.arq.enqueue_task({'bla': 'bla!'})
        return [
            (el.attrib['href'], el.css('::text').get())
            for el in self.css('.selected a.reference.external')
        ]
```
```python
# spiders/title_spider.py
import scrapy

from ..pageobjects import IndexPage
from ..providers import ArqProvider


class TitlesLocalSpider(scrapy.Spider):
    name = 'titles.local'
    start_urls = ["https://books.toscrape.com"]
    custom_settings = {
        "SCRAPY_POET_PROVIDERS": {
            ArqProvider: 600,  # MY PROVIDER FOR INJECTABLE arq: Arq
        },
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_poet.InjectionMiddleware": 543,
        },
    }

    async def parse(self, response, index_page: IndexPage):
        self.logger.info(await index_page.page_titles())
```
```
# ... omitted log lines
2022-09-05 11:57:31 [scrapy.core.engine] INFO: Spider opened
2022-09-05 11:57:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-09-05 11:57:31 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-09-05 11:57:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://books.toscrape.com/robots.txt> (referer: None)
2022-09-05 11:57:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com> (referer: None)
2022-09-05 11:57:35 [root] INFO: Arq.enqueue_task() enqueueing new task: {'bla': 'bla!'}
2022-09-05 11:57:35 [titles.local] INFO: []
2022-09-05 11:57:35 [scrapy.core.engine] INFO: Closing spider (finished)
# ... omitted log lines
```
Could `**kwargs` in `parse` be the cause?
I've tried adding the `**kwargs`, but it wasn't enough to cause the same issue.
Yep! Thanks a lot, I found the source of the problem: it happens if I use the new `builtins.set` (with generics support) instead of `typing.Set`, which has been deprecated since 3.9.

So, if I change `__call__`'s declaration from this:

```python
async def __call__(self, to_provide: set[Callable], settings: Settings) -> Sequence[Callable]:
```

into something like this:

```python
from typing import Set
# ...
async def __call__(self, to_provide: Set[Callable], settings: Settings) -> Sequence[Callable]:
```

everything works correctly.
By the way, `collections.abc.Set` doesn't work either. On the other hand, the Python team has deprecated all of the `typing.{Set, Dict, List, ...}` aliases in favor of the builtins or `collections.abc.*`, so maybe it would be correct to add support for them in the IoC engine too?
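For illustration (this is not scrapy-poet's actual matching logic, which I haven't checked): the standard `typing.get_origin()` already normalizes the deprecated `typing.Set[...]` alias and the builtin `set[...]` to the same origin type, which suggests one way an injection engine could accept both spellings, while `collections.abc.Set` really is a different type:

```python
import collections.abc
import typing

# The deprecated typing.Set alias and the builtin generic set[...] both
# normalize to the concrete builtin `set`, so matching on get_origin()
# would treat them uniformly:
assert typing.get_origin(typing.Set[int]) is set
assert typing.get_origin(set[int]) is set

# collections.abc.Set is an abstract base class, not the builtin, so it
# normalizes to itself -- consistent with it not matching either form:
assert typing.get_origin(collections.abc.Set[int]) is collections.abc.Set
assert collections.abc.Set is not set
```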
In any case, scrapy-poet (web-poet) is one of the best approaches I've ever seen, and the combination of IoC and the Page Object Model pattern for scraping really shines! Thanks a lot for it ;)
...and just one more quick question: what's the best (most correct) way to provide a singleton object instance using the scrapy-poet IoC infrastructure?
Let's say the abovementioned `Arq` should be a singleton service. What is the best way to return it from the `__call__` method in this case (can I configure the IoC container somewhere, or something like that)?
I see, great catch! I believe we can use the `typing` module as a short-term workaround, since PEP 585 mentions:

> The deprecated functionality will be removed from the typing module in the first Python version released 5 years after the release of Python 3.9.0.
I'm not quite sure how large of an undertaking it would be to completely move to the builtins, since `web-poet` and `scrapy-poet` still support 3.7 and 3.8. I'm guessing that if we drop support for them when they lose Python support, the switch would be much easier.
> In any case, scrapy-poet (web-poet) is one of the best approaches I've ever seen, and the combination of IoC and the Page Object Model pattern for scraping really shines! Thanks a lot for it ;)
💖 That'd be @kmike's work for you :)
> what's the best (most correct) way to provide a singleton object instance using the scrapy-poet IoC infrastructure?
Lots of approaches to this one, but I think the most convenient is to assign it as a class variable in the provider itself. Technically, it's not a true singleton in this case, since `Arq` could still be instantiated outside of the provider. However, that should still be okay, since the provider would ensure that the `Arq` it provides is the same instance for every `__call__()` invocation.
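A minimal sketch of that class-variable approach (simplified so it runs standalone: a plain synchronous class stands in for the real provider, which would subclass `PageObjectInputProvider` and have an async `__call__`):

```python
class Arq:
    """Stand-in for the Arq service from the snippets above."""


class ArqProvider:
    # In the real project this would subclass PageObjectInputProvider;
    # a plain class keeps the sketch self-contained.
    provided_classes = {Arq}
    _arq = None  # class-level cache, shared by every provider instance

    def __call__(self, to_provide):
        # Lazily create the shared instance on first use.
        if ArqProvider._arq is None:
            ArqProvider._arq = Arq()
        return [ArqProvider._arq]


# Every call, even across provider instances, yields the same Arq object.
p1, p2 = ArqProvider(), ArqProvider()
assert p1({Arq})[0] is p2({Arq})[0]
```

As noted above, nothing stops code elsewhere from constructing its own `Arq()`, but everything injected through the provider shares one instance.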
Hi, just a sample setup:
Injectable entity: `arq: Arq`. So, I'd like to work with an `arq` instance here.
And I got an error like this:
So, could you please explain why this error happens and how to fix it?