Open fkztw opened 7 years ago
hmm... looks like #28 and #31 are also dealing with this problem. I should add a patch which can solve it once for all and this is related to the annoyed problem of unicode and bytes of Python 2 and Python 3.
@quique0194, @pjrobertson: Could both of you help me testing if this patch works for both of you in your scenario? I've tested this patch in both Python 2 and Python 3 with this simple script and it worked for me, I would be appreciate if you can help testing this patch, since one of you is using Python 2 and another is using Python 3. Thank you. (Click Details to see the script)
I came across the same problem, and found a very similar solution. I can confirm that the fundamental problem is the computed length of each argument as len(arg)
in line 517 of the changed file.
I have reviewed the solution presented in this pull request. I find the updated code well written, and have successfully tested the updated version with both Python2 and Python3 using the provided example script from Dryscrape, as well as a script that I had written myself for a different purpose. In response to @M157q's comment on the use of the unicode
identifier after checking for Python2, I believe that this should not be a problem (see my reply to the comment). I have added one comment to the changed file requesting a fix for the indentation.
I suggest to the maintainers of this repository to merge this pull request #33, to reject pull requests #28 and #31, and to mark issues #27 and #30 as fixed (or duplicates where appropriate).
@M157q thank you for the patch and @PhilippVeerport thank you for your detailed analysis. I will try to look at this ASAP.
@M157q @PhilippVerpoort Thank you for your contribution. Unfortunately I have decided to no longer maintain dryscrape or webkit-server, since properly doing so would require rewriting webkit-server using QtWebEngine. Consider using a different package for your scraping if you do not want to be vulnerable to old WebKit bugs.
@niklasb Thanks for your reply. This is completely fair, but would you consider implementing this change and filing a new release with this bug fix anyway? At least people who do consider for some reason to use this piece of software will not have to go through the same process of manually applying this patch.
When
arg
is not a ascii char,len(arg)
may be differrent fromlen(arg.encode("utf-8"))
. For example, a Chinese character'哈'
,len('哈')
is 1, butlen('哈'.encode("utf-8"))
is 3, which will raise anInvalidResponseError
because of the wrong length.This PR should fix this kind of problem.