Open runa opened 1 year ago
@runa, can you add some sample code to reproduce this and provide more details? I tested with this simple spider:
```python
import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]
    custom_settings = {
        'FEEDS': {
            'items.json': {
                'format': 'json'
            }
        }
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))
```
and scheduled it with ScrapyRT:
```shell
curl --location 'http://localhost:9080/crawl.json' \
  --header 'Content-Type: application/json' \
  --data '{
    "request": {
        "url": "https://quotes.toscrape.com/"
    },
    "spider_name": "toscrape-css"
}'
```
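For reference, the same request can be made from Python. This is a minimal sketch using only the standard library to build the payload; it assumes ScrapyRT is running locally on port 9080, as in the curl call above (the actual HTTP call is shown commented out, since it needs a running server):

```python
import json

# Same payload as the curl call above.
payload = {
    "request": {"url": "https://quotes.toscrape.com/"},
    "spider_name": "toscrape-css",
}

body = json.dumps(payload)
print(body)

# To actually send it (requires the `requests` package and a running ScrapyRT):
# import requests
# resp = requests.post(
#     "http://localhost:9080/crawl.json",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(resp.json())  # the scraped items are returned in the JSON response
```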
I see the items.json file generated in the spider project's filesystem. Is there a specific feed that is failing for you?
Hi! thanks for your work on Scrapyrt!
I've discovered that spiders served by Scrapyrt don't save their output to the feeds configured in the spider's custom_settings FEEDS setting. Is it possible to change this behavior and make spiders served by Scrapyrt respect this setting?
Thanks!