BurnzZ opened this issue 3 months ago
It's basically https://github.com/scrapy/scrapy/issues/6047
The exception is bubbled up to the deferred created with `self.crawler_process.crawl()` in the `crawl` or `runspider` command, but that deferred has no errback.
(No idea why this situation is handled differently on 3.11 and 3.12 :shrug:)
So ideally we just shouldn't rely on unhandled exceptions, unless we fix Scrapy.
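For illustration, a minimal standalone sketch (the spider name and settings setup are placeholders) of attaching an explicit errback to the deferred returned by `crawler_process.crawl()`, so the failure is reported directly instead of depending on the unhandled-deferred warning:

```python
# Hypothetical sketch: report spider-construction failures explicitly instead of
# relying on Twisted's garbage-collection-time "Unhandled error in Deferred" message.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from twisted.python.failure import Failure


def report_crawl_error(failure: Failure) -> None:
    # Print the traceback of the wrapped exception (e.g. the ValueError raised
    # while instantiating the spider).
    failure.printTraceback()


process = CrawlerProcess(get_project_settings())
d = process.crawl("spider")  # returns a Deferred that captures the crawl failure
d.addErrback(report_crawl_error)
process.start()
```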
So in Python 3.12+ Twisted no longer reports (exceptions in) unhandled deferreds?
Not sure what could have changed.
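As a point of reference, a minimal sketch of the reporting mechanism in question: a Deferred that fails and is garbage-collected without any errback attached makes Twisted log an "Unhandled error in Deferred" message, and whether or when that surfaces is what appears to differ here between interpreter versions:

```python
# Sketch of Twisted's unhandled-deferred report: a failed Deferred that is
# collected without an errback triggers a critical log entry from its cleanup code.
import gc

from twisted.internet import defer


def make_unhandled_failure() -> None:
    d = defer.Deferred()
    d.errback(ZeroDivisionError("boom"))
    # No errback is attached; the reference is dropped when the function returns.


make_unhandled_failure()
gc.collect()  # the report is typically written to stderr once the Deferred is collected
```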
```python
from scrapy import Spider


class MySpider(Spider):
    name = "spider"

    def __init__(self, *args, **kwargs):
        1/0
```
This shows an unhandled exception on both Python versions.
So far I was able to minimize it to this:
```python
import scrapy
from pydantic import BaseModel, model_validator


class Model(BaseModel):
    @model_validator(mode="after")
    def foo(self):
        raise ValueError()


class Spider(scrapy.Spider):
    name = "spider"

    def __init__(self, *args, **kwargs) -> None:
        Model()
        super().__init__(*args, **kwargs)
```
Just having e.g. a required field is not enough to trigger this.
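For contrast, a sketch of the "required field only" variant mentioned above (the field name is made up): `Model()` still raises, just a `ValidationError` rather than the validator's `ValueError`, and it does not reproduce the problem:

```python
from pydantic import BaseModel


class Model(BaseModel):
    # Hypothetical required field with no default: Model() raises
    # pydantic.ValidationError, but per the comment above this alone
    # is not enough to trigger the silent behavior on Python 3.12.
    url: str
```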
Overview
From the following PRs:
We have respectively introduced `urls_file` and `urls` as new parameters for indicating the input URLs of a crawl, alongside the existing `url` parameter. Should none of these 3 parameters be supplied to a crawl, the expected behavior would be to show the following error message:
However, it would seem that when using Python 3.12, the error is not shown at all.
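For context, a sketch of the kind of `model_validator`-based check described above; the parameter names come from the PRs mentioned, while the model name and error text here are only illustrative:

```python
# Illustrative only: the real spider parameters and error message live in the project's own code.
from pydantic import BaseModel, model_validator


class UrlParams(BaseModel):
    url: str | None = None
    urls: list[str] | None = None
    urls_file: str | None = None

    @model_validator(mode="after")
    def require_some_input(self):
        if not (self.url or self.urls or self.urls_file):
            # Placeholder message, not the actual text produced by the project.
            raise ValueError("Provide one of: url, urls, urls_file.")
        return self
```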
Code to Reproduce
Python 3.11

```shell
python file.py
scrapy crawl spider
```

Python 3.12

```shell
python file.py
scrapy crawl spider
```

(no error at all)