Closed: luisferreira93 closed this issue 1 week ago
Sample code:
from crawl4ai import AsyncWebCrawler
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class ExampleSpider(CrawlSpider):
    name = "scrapy_integration"
    start_urls = ["https://crawler-test.com/links/page_with_external_links"]
    allowed_domains = ["crawler-test.com"]
    rules = (
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    # Coroutine callbacks require Scrapy's asyncio reactor, e.g.
    # TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
    async def parse_item(self, response):
        # Re-fetch the page with crawl4ai to get a markdown rendering of it.
        async with AsyncWebCrawler(verbose=False) as crawler:
            result = await crawler.arun(url=response.url)
            print(result.markdown)
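For completeness, here is a minimal sketch of how such a spider could be launched from a script. The TWISTED_REACTOR setting and the runner layout are assumptions about a typical Scrapy setup, not part of the original sample:

from scrapy.crawler import CrawlerProcess

# ExampleSpider is assumed to be the spider defined above, importable
# from the same module.
process = CrawlerProcess(settings={
    # Scrapy needs the asyncio reactor to await coroutine callbacks,
    # which also lets crawl4ai share the same event loop.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
})
process.crawl(ExampleSpider)
process.start()  # blocks until the crawl finishes

Note that starting crawl4ai's browser-based crawler inside every callback is fairly heavyweight; collecting the discovered URLs and crawling them in a batch after the spider finishes is another option.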
@luisferreira93 Thanks for using the library, and thanks for sharing this sample for other people who may need this kind of help. FYI, we are releasing two important components later this year. The first is an executor pipeline, which is very efficient and adaptively uses whatever resources are available to crawl multiple URLs at the same time. The second is a scraper that takes a website and uses graph search algorithms to extract content from every layer of the site. With those, I believe you will be able to wrap the whole process inside the crawl4ai library. Please stay tuned; I will post an update soon.
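Until those components land, here is a rough sketch of crawling several URLs concurrently with the current API. Using plain asyncio.gather over a shared AsyncWebCrawler instance is an assumption about how one might batch URLs today, not the upcoming executor pipeline:

import asyncio

from crawl4ai import AsyncWebCrawler

async def crawl_all(urls):
    # Naive stand-in for the upcoming executor pipeline: one shared crawler,
    # all URLs fetched concurrently, results returned in input order.
    async with AsyncWebCrawler(verbose=False) as crawler:
        return await asyncio.gather(*(crawler.arun(url=u) for u in urls))

if __name__ == "__main__":
    results = asyncio.run(crawl_all([
        "https://crawler-test.com/",
        "https://crawler-test.com/links/page_with_external_links",
    ]))
    for result in results:
        print(result.markdown[:200])  # preview each page's markdown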
Hello! Great work with crawl4ai 👍🏻
Is it possible to integrate crawl4ai with Scrapy? Do you have any code samples?
Thank you in advance