The scrapy parse command is hiding the original callback from us: it wraps the original callback in the Command.prepare_request method. The solution would be to leave clues about the original spider and callback on the request's meta, and to query that information when getting the callback in scrapy-poet's utils.get_callback function.
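(For context, a minimal sketch of why the wrapping breaks scrapy-poet; the names below are illustrative, not the actual Scrapy code. scrapy-poet reads the callback's annotations to decide what to inject, and the generic wrapper that scrapy parse installs no longer carries them:)

from inspect import signature

def parse_book(response, book_page: "BookPage"):
    # Original spider callback; scrapy-poet reads the BookPage
    # annotation to know what to inject.
    ...

def _wrapper(response, **cb_kwargs):
    # Stand-in for the callback that Command.prepare_request swaps in.
    return parse_book(response, **cb_kwargs)

print(signature(parse_book))  # (response, book_page: 'BookPage')
print(signature(_wrapper))    # (response, **cb_kwargs) -- annotation is gone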
diff --git a/scrapy/commands/parse.py b/scrapy/commands/parse.py
index 83ee074d..21dba3e8 100644
--- a/scrapy/commands/parse.py
+++ b/scrapy/commands/parse.py
@@ -215,7 +215,7 @@ class Command(BaseRunSpiderCommand):
         request.cb_kwargs.update(opts.cbkwargs)
         request.meta['_depth'] = 1
-        request.meta['_callback'] = request.callback
+        request.meta['_callback'] = request.callback or opts.callback
         request.callback = callback
         return request
diff --git a/scrapy_poet/utils.py b/scrapy_poet/utils.py
index cab27c2..0ecc6e7 100644
--- a/scrapy_poet/utils.py
+++ b/scrapy_poet/utils.py
@@ -31,9 +31,12 @@ _SCRAPY_PROVIDED_CLASSES = {
 def get_callback(request, spider):
     """Get request.callback of a scrapy.Request, as a callable."""
-    if request.callback is None:
-        return getattr(spider, 'parse')
-    return request.callback
+    callback = request.meta.get('_callback', request.callback)
+    if not callable(callback):
+        callback = getattr(spider, callback or 'parse')
+
+    return callback

 class DummyResponse(Response):
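(A self-contained sketch of what the patched get_callback resolves in each case; MySpider and the stripped-down FakeRequest below are illustrative stand-ins, not real scrapy classes:)

class FakeRequest:
    """Stripped-down stand-in for scrapy.Request, for illustration."""
    def __init__(self, callback=None, meta=None):
        self.callback = callback
        self.meta = meta or {}

class MySpider:
    def parse(self, response): ...
    def parse_book(self, response): ...

def get_callback(request, spider):
    callback = request.meta.get('_callback', request.callback)
    if not callable(callback):
        callback = getattr(spider, callback or 'parse')
    return callback

spider = MySpider()
# No callback anywhere: falls back to spider.parse.
assert get_callback(FakeRequest(), spider) == spider.parse
# `scrapy parse -c parse_book` stores the callback *name* in meta;
# the string is resolved against the spider.
assert get_callback(FakeRequest(meta={'_callback': 'parse_book'}), spider) == spider.parse_book
# A normal callable callback is returned unchanged.
assert get_callback(FakeRequest(callback=spider.parse_book), spider) == spider.parse_book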
Let me know what you guys think about it and I'll create the pull requests if there's a green light for it. @ivanprado @kmike @ejulio
It actually looks like a Scrapy issue if a new callback doesn't have the same signature; it seems this should ideally be fixed on the Scrapy side, not worked around on the scrapy-poet side.
Indeed, @kmike. Do you have any strategy in mind? I'm not sure it would be that easy to keep the original signature without investing too much time in it.
At the moment I don't know how it would be possible to copy the signature from one function to another. Let's say you have:
def original_fn(v1: int, v2: str):
pass
And you have a function that does something else apart from calling the original one:
def modified_fn(*args, **kwargs):
do_something()
original_fn(*args, **kwargs)
And the question is: would it be possible to "copy" the signature from original_fn to modified_fn? I don't have an answer. A look at the inspect.signature function (see here: https://github.com/python/cpython/blob/master/Lib/inspect.py#L2232) reveals that the answer might not be simple.
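(One relevant detail: inspect.signature follows a function's __wrapped__ attribute by default, so anything that sets __wrapped__ on modified_fn makes signature() report original_fn's parameters. A quick sketch:)

from inspect import signature

def original_fn(v1: int, v2: str):
    pass

def modified_fn(*args, **kwargs):
    original_fn(*args, **kwargs)

modified_fn.__wrapped__ = original_fn  # this is what functools.update_wrapper sets
print(signature(modified_fn))  # (v1: int, v2: str)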
+1
As far as I could see, there are a couple of options:
In both cases, I think it's an overly complicated solution that would be difficult to get through Scrapy's code review. How would we justify such a change?
I think that leaving a clue about the original callback in the Scrapy parse command, and consulting it when getting a callback in our library, is a feasible and succinct solution if well documented.
I think I may have found the solution: the update_wrapper function (https://docs.python.org/2/library/functools.html#functools.update_wrapper).
See the following:
from functools import update_wrapper
from inspect import signature

def original_fn(v1: int, v2: str):
    print(f"{v1} {v2}")

def do_something():
    print("something")

def modified_fn(*args, **kwargs):
    do_something()
    original_fn(*args, **kwargs)

signature(modified_fn)
# <Signature (*args, **kwargs)>

update_wrapper(modified_fn, original_fn)
# <function __main__.original_fn(v1: int, v2: str)>

signature(modified_fn)
# <Signature (v1: int, v2: str)>

modified_fn(1, 2)
# something
# 1 2
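(The decorator form of the same mechanism is functools.wraps, which simply applies update_wrapper at definition time:)

import functools

def original_fn(v1: int, v2: str):
    print(f"{v1} {v2}")

def do_something():
    print("something")

@functools.wraps(original_fn)
def modified_fn(*args, **kwargs):
    do_something()
    original_fn(*args, **kwargs)

# modified_fn now reports original_fn's name and signature.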
So maybe it is a matter of calling update_wrapper somewhere in the Scrapy code to update the signature. @victor-torres, what do you think?
Sounds like an option! Let me test here with my snippet.
Initial tests show it works, @ivanprado! 🚀
# Match the new callback signature with the original one
import functools
original_callback = getattr(spider, opts.callback)
functools.update_wrapper(callback, original_callback)
The thing is that the current code uses the response to determine the actual callback, and the response is received by the callback itself... I'm not sure it's completely safe to stick to the request, since we're using the parse command and not a "real world" spider. Would you mind double-checking it? https://github.com/scrapy/scrapy/blob/master/scrapy/commands/parse.py#L159
Another alternative would be to check whether opts.callback is defined before trying to update the wrapper, but I don't think that would be a nice solution, since the default callback (parse) could also run into the same problem... (a sketch of such a guard follows below).
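(For illustration, a guard along those lines; the surrounding names follow the snippet above, but this exact check is hypothetical:)

import functools

# Only copy the signature when an original callback can actually be
# resolved; otherwise leave the wrapping callback untouched.
original_name = opts.callback or 'parse'
original_callback = getattr(spider, original_name, None)
if original_callback is not None:
    functools.update_wrapper(callback, original_callback)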
@victor-torres I'm not entirely sure about what I'm going to say, but this is how I see it:
1) I would move part of the callback discovery to this line: https://github.com/scrapy/scrapy/blob/master/scrapy/commands/parse.py#L218. Specifically, I would check there whether the callback comes from the options; otherwise, check whether it is callable, and if it is a string, try to obtain it from the spider. At that point you have a callback you can use with update_wrapper to copy the signature. If you don't have anything, you simply don't run update_wrapper. (A rough sketch follows this comment.)
2) I would update the callback so that it no longer handles most of the callback discovery, only the part for the rules, which seems to come from the response. Note: I have no idea what this rules thing is. Do you know what it is? Do you think it could be used in combination with poet callbacks?
That's all I can say. I don't have a deep understanding of this part of the code, so this is as far as I can go for now. How do you see it, @victor-torres?
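(A rough sketch of step 1 above; _discover_callback is a hypothetical helper, not existing Scrapy code:)

import functools

def _discover_callback(opts, spider, request):
    """Hypothetical helper: resolve the original callback, if any."""
    cb = opts.callback or request.callback
    if callable(cb):
        return cb
    if isinstance(cb, str):
        return getattr(spider, cb, None)  # resolve the name on the spider
    return None

original_callback = _discover_callback(opts, spider, request)
if original_callback is not None:
    # Copy name, annotations, and signature onto the wrapping callback.
    functools.update_wrapper(callback, original_callback)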
I'm not sure either. Maybe @kmike has a better understanding of this piece of code. I suspect we would need the response anyways.
Spider code is taken from https://scrapy-poet.readthedocs.io/en/latest/intro/tutorial.html:
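(Roughly, the spider and page object from that tutorial look like the following; this is a paraphrase, so details may differ from the exact code used:)

import scrapy
from web_poet import WebPage

class BookPage(WebPage):
    def to_item(self):
        return {
            'url': self.url,
            'name': self.css('title::text').get(),
        }

class BooksSpider(scrapy.Spider):
    name = 'books'
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        links = response.css('.image_container a')
        yield from response.follow_all(links, callback=self.parse_book)

    def parse_book(self, response, book_page: BookPage):
        yield book_page.to_item()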
Passing
Failing