scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

Integration with DI frameworks #5938

Closed stevefan1999-personal closed 10 months ago

stevefan1999-personal commented 11 months ago

Summary

Let us use a DI framework in pipelines and spiders. My personal favourite: https://python-dependency-injector.ets-labs.org/introduction/di_in_python.html

Motivation

We have a lot of common code we want to reuse in our scraping framework. Using DI to keep it tidy would be nice to have.

Describe alternatives you've considered

Duplicate all the code and constructions. But nobody likes duplicated code or global variables.

wRAR commented 11 months ago

Can you provide a more detailed example of how you would use a DI framework in your "pipelines and spiders", what kind of common code you would want to inject, and in which parts of the spider code?

stevefan1999-personal commented 11 months ago

Dependency Injector — Dependency injection framework for Python — Dependency Injector 4.41.0 documentation (ets-labs.org)

I took the code from the above link as an example:

from unittest import mock

from dependency_injector import containers, providers
from dependency_injector.wiring import Provide, inject

# ApiClient and Service as defined in the linked docs example.
class ApiClient:
    def __init__(self, api_key: str, timeout: int) -> None:
        self.api_key = api_key
        self.timeout = timeout

class Service:
    def __init__(self, api_client: ApiClient) -> None:
        self.api_client = api_client

class Container(containers.DeclarativeContainer):

    config = providers.Configuration()

    api_client = providers.Singleton(
        ApiClient,
        api_key=config.api_key,
        timeout=config.timeout,
    )

    service = providers.Factory(
        Service,
        api_client=api_client,
    )

@inject
def main(service: Service = Provide[Container.service]) -> None:
    ...

if __name__ == "__main__":
    container = Container()
    container.config.api_key.from_env("API_KEY", required=True)
    container.config.timeout.from_env("TIMEOUT", as_=int, default=5)
    container.wire(modules=[__name__])

    main()  # <-- dependency is injected automatically

    with container.api_client.override(mock.Mock()):
        main()  # <-- overridden dependency is injected automatically

I will make an example for Scrapy later.
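As a rough idea, here is a hypothetical sketch of how a container could be wired from a spider's from_crawler hook. The spider, the API_KEY/TIMEOUT settings, and the ApiClient class are illustrative only, not an existing Scrapy API:

import scrapy
from dependency_injector import containers, providers
from dependency_injector.wiring import Provide, inject

class ApiClient:
    # Illustrative shared dependency; in practice this is the common code to reuse.
    def __init__(self, api_key: str, timeout: int) -> None:
        self.api_key = api_key
        self.timeout = timeout

class Container(containers.DeclarativeContainer):
    config = providers.Configuration()
    api_client = providers.Singleton(
        ApiClient,
        api_key=config.api_key,
        timeout=config.timeout,
    )

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Build and wire the container from Scrapy settings so that
        # @inject can resolve Provide[Container.api_client] below.
        container = Container()
        container.config.api_key.from_value(crawler.settings.get("API_KEY"))
        container.config.timeout.from_value(crawler.settings.getint("TIMEOUT", 5))
        container.wire(modules=[__name__])
        return super().from_crawler(crawler, *args, **kwargs)

    @inject
    def parse(self, response, api_client: ApiClient = Provide[Container.api_client]):
        # The same singleton would also be injectable into pipelines or other spiders.
        self.logger.info("crawling with api_key=%s", api_client.api_key)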

wRAR commented 10 months ago

Not sure if this should stay open, as it's too vague and not actionable without specific workflow examples. If spider code reuse is the only goal, we already have simpler ways to achieve it, from subclassing to web-poet.
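As a rough illustration of the subclassing route, the shared code can simply live in a common base spider (class and helper names below are illustrative):

import scrapy

class BaseApiSpider(scrapy.Spider):
    # Shared helpers live once in a common base class.
    def build_headers(self):
        return {"Authorization": f"Bearer {self.settings.get('API_KEY')}"}

class ProductsSpider(BaseApiSpider):
    name = "products"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        yield scrapy.Request(
            response.urljoin("/api/items"),
            headers=self.build_headers(),
            callback=self.parse_items,
        )

    def parse_items(self, response):
        yield {"body_length": len(response.body)}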