Open kmike opened 9 years ago
It is a good idea, do you plan to implement the default .name
as a python property ?
do you plan to implement the default .name as a python property
i like this idea, haven't thought about it
I like it too. :)
So, would be something like:
@property
def name(self):
return getattr(self.__class__, 'name', self.__class__.__name__)
And without the check for the name attribute on init?
Uh, I've just realized the above code would fail for the case when name is set via __init__
.
Actually, would fail for other cases as well, I clearly didn't think this through. :D
Hi, Dear all, How could I get rid of the subscribe of this email?
Best regards Fu Qi
在 2015-11-02 20:48:50,"Mikhail Korobov" notifications@github.com 写道:
do you plan to implement the default .name as a python property
i like this idea
— Reply to this email directly or view it on GitHub.
@qinfuqi I don't know; it depends on how did you get subscribed to it. Maybe you need to click "Unwatch" button somewhere. There is also 'Unsubscribe' button in the left column here. We haven't subscribed you to these emails; I think you're using some Github feature, and it is a general Github usage question. Sorry - we can do nothing; we don't have powers to unsubscribe you.
Hi @kmike I'd like to start to contribute with scrapy and I'm trying to work on this issue. I've some questions about it:
Spider
, should the default name contains the Spider? eg. class StackOverflowSpider(scrapy.Spider):
-> should I name it as StackOverflowSpider
or StackOverflow
?__init__
to create a default name, we need to instantiate this object to get the default value. Some classes (https://github.com/scrapy/scrapy/blob/master/scrapy/spiderloader.py#L25, https://github.com/scrapy/scrapy/blob/master/scrapy%2Fcommands%2Fcheck.py#L78) checks the spider name. So to get the default value working, it's necessary to instantiate the class in these files. What do you think about it? Can I create a spcls_instance = spcls()
to get the default name?Hey @aron-bordin, thanks for offering help!
in class names that ends with Spider, should the default name contains the Spider? eg. class StackOverflowSpider(scrapy.Spider): -> should I name it as StackOverflowSpider or StackOverflow?
I see what you're heading to, but I think we should preserve the name as-is.
if we use a property or add some code on init to create a default name, we need to instantiate this object to get the default value. Some classes (https://github.com/scrapy/scrapy/blob/master/scrapy/spiderloader.py#L25, https://github.com/scrapy/scrapy/blob/master/scrapy%2Fcommands%2Fcheck.py#L78) checks the spider name. So to get the default value working, it's necessary to instantiate the class in these files. What do you think about it? Can I create a spcls_instance = spcls() to get the default name?
We shouldn't call __init__
and create Spider instances just to get their names; there are many problems with calling __init__
- some spiders may have required attributes, other spiders may do heavy work in __init__
or create/delete some resources. __init__
is called after Crawler instance is created, and spiders may rely on it.
This means @property
on object methods won't work; to make magic name
work on class you can define a metaclass. Another option (which I prefer more - it is less magical and easier to support) is to create and use a Spider.get_spider_name
classmethod which returns spider name either from .name attribute or from class name.
I opened a pr. Please, when possible, check if my approach is good and if there is something that I can change to improve it. Thx
Is there actually any greater value in this Spider-class name
attribute?
Does it make sense to put this emphasis on it, instead of simply using the Spider-class name itself?
If, as @kmike says, there is only SpiderLoader
using it, that code looks straightforward enough to substitute looking up the Spider-class name instead (or simply pulling in all [Base]Spider
subclasses).
The only interesting case here seems to be having different classes with the same name attribute? What's the defined behavior there, and could it be replicated by a user using python class inheritance rules instead? In which case I would downgrade this into an extension/addon feature, for whom may care, instead of trading in more magic for less importance ;)
If and when this is done, please take into account that Spidermon currently uses spider names, for example to generate unique, spider-specific filenames for storing data in disk. You can search for spider.name
there to find some of those usages.
Any change in this direction in Scrapy should probably be accompanied by the corresponding change in Spidermon.
List of usages of spider.name
attribute in scrapy source code:
name
attribute (I skipped it here)spider.name
. Here mentioned spidermonother 3rd party libs that also can use spider.name. Here mentioned spidermon
If Scrapy adds a default property name
for the scenario where the user doesn't define a name
attribute, probably it won't require any changes in Spidermon. Is there a PR for this change so I can run it and test in Spidermon?
I created https://github.com/scrapy/scrapy/pull/4327 some time ago, but I was not planning on resuming that work myself any time soon. In any case, having name
return some name by default (the spider class import path?) is something to consider for sure.
My preferred approach to solve this is https://github.com/scrapy/scrapy/pull/1649 at least on my attempts to apply it I didn't have issue with base class scrapy.Spider
in scrapy list
output mentioned on https://github.com/scrapy/scrapy/pull/1649#issuecomment-178697597 (but it still require to resolve merge conflict to macth current scrapy version) probably because spiderloader itself also significantly changed from 2016 - simply because with this approach only change for base spider class required (and do not required to update each point from https://github.com/scrapy/scrapy/issues/1157#issuecomment-1816782204)
I think we should make Spider.name attribute optional. The name is used by SpiderManager to find spiders, but Spider can be used without a Scrapy project. It is unnecessary boilerplate for users of runspider command or for CrawlerRunner / CrawlerProcess users.
We can also provide a default value, e.g.
self.__class__.__name__
, to help with discovery; this have an advantage of 1-to-1 mapping between spider class names and names printed to users - spiders can become easier to find.Opinions?