Closed marinelay closed 4 weeks ago
Hi @marinelay and @wRAR, while looking through this good first issue, I found the line in Scrapy that seems to be the root cause of this error when a Request is wrapped in WrappedRequest, as shown below --
=====================================================================
from urllib.request import Request as _Request
from scrapy.http.request import Request
from scrapy.http.cookies import WrappedRequest

a = _Request(url="https://a.example")
print(a.get_header('xxxx'))

b = WrappedRequest(Request(url="https://a.example"))
print("WrappedRequest get-header result:", b.get_header('xxxx'))  # <-- this one
=====================================================================
None
TypeError: to_unicode must receive a bytes or str object, got NoneType -- Result
=====================================================================
As per the expected behavior, get_header should return None for both requests.
This is line 173 in "scrapy/http/cookies.py" --
return to_unicode(self.request.headers.get(name, default), errors="replace")
This happens because "to_unicode" is used, whose docstring says --
"""Return the unicode representation of a bytes object text. If text is already an unicode object, return it as-is."""
Here it checks the value of 'self.request.headers.get(name, default)', which must be "bytes" or "str" to be a valid argument. In this case the header is missing and the value is None, hence the error - "TypeError: to_unicode must receive a bytes or str object, got NoneType".
Hence a viable solution could be "return self.request.headers.get(name, default)", which returns None for a missing header.
Do let me know if this is the correct solution, or whether it might interfere with some other functionality.
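The failure described in this comment can be reproduced without Scrapy installed. The sketch below is only an illustration: to_unicode_sketch is a hypothetical stand-in modelled on the docstring and error message quoted above, not Scrapy's actual code, and a plain dict stands in for the request headers.

```python
def to_unicode_sketch(text, encoding="utf-8", errors="strict"):
    """Hypothetical stand-in for to_unicode; NOT the library implementation."""
    if isinstance(text, str):
        # Already text: return it as-is, as the quoted docstring describes.
        return text
    if not isinstance(text, bytes):
        # Anything that is neither bytes nor str (including None) is rejected.
        raise TypeError(
            "to_unicode must receive a bytes or str object, "
            f"got {type(text).__name__}"
        )
    return text.decode(encoding, errors)


# Assumption: a plain dict plays the role of the request headers.
headers = {b"Accept": b"text/html"}

value = headers.get(b"xxxx", None)  # header is absent, so value is None
error_message = None
try:
    to_unicode_sketch(value, errors="replace")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The point is that the missing header turns into None before the conversion function ever sees it, so the conversion step is where the TypeError surfaces.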
Removing the to_unicode() call is obviously incorrect.
Then how about exception handling code like this?
def get_header(self, name, default=None):
    try:
        return to_unicode(self.request.headers.get(name, default), errors="replace")
    except TypeError:
        return default
Hi @marinelay, I guess that gives the required output. Since it usually throws a 'TypeError', handling this part using an exception block is neat 👍.
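Another option, offered only as a sketch, is to check for None explicitly instead of catching TypeError. WrappedRequestSketch below is a hypothetical stand-in that keeps its headers in a plain dict; it is not Scrapy's WrappedRequest, and the decoding is done with bytes.decode rather than to_unicode so the example runs on its own.

```python
class WrappedRequestSketch:
    """Hypothetical stand-in for a request wrapper; not Scrapy's class."""

    def __init__(self, headers):
        # Assumption: headers is a plain dict mapping str names to bytes values.
        self.headers = headers

    def get_header(self, name, default=None):
        value = self.headers.get(name, default)
        if value is None:
            # Missing header and no default: mirror urllib's get_header.
            return None
        if isinstance(value, bytes):
            return value.decode("utf-8", errors="replace")
        return value  # a str default is returned unchanged


wrapped = WrappedRequestSketch({"Accept": b"text/html"})
print(wrapped.get_header("xxxx"))
print(wrapped.get_header("Accept"))
print(wrapped.get_header("xxxx", "fallback"))
```

The explicit check only handles the missing-header case, whereas a broad except TypeError would also hide a TypeError raised for any other reason.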
Description
I believe WrappedRequest is a class for supporting the methods of urllib.request.Request, but I found that the behavior of the get_header method in WrappedRequest differs from that of urllib.request.Request. When trying to get a header that is not present, without a default value, it should return None (https://docs.python.org/3/library/urllib.request.html#urllib.request.Request.get_header), but get_header in WrappedRequest raises TypeError: to_unicode must receive a bytes or str object, got NoneType.
Steps to Reproduce
Expected behavior:
Actual behavior:
Additional context
This issue is currently not a problem when interacting with the CookieJar class. Thus, it is not an urgent matter unless one is importing and using WrappedRequest directly. However, I think fixing it would improve the reliability of this project. Thank you!