niklasb / webkit-server

[not actively maintained] The C++ webkit-server from capybara-webkit with useful extensions and Python bindings
MIT License
48 stars 38 forks source link

Fix string length computation for issue_command function #33

Open fkztw opened 7 years ago

fkztw commented 7 years ago

When arg is not a ascii char, len(arg) may be differrent from len(arg.encode("utf-8")). For example, a Chinese character '哈', len('哈') is 1, but len('哈'.encode("utf-8")) is 3, which will raise an InvalidResponseError because of the wrong length.

This PR should fix this kind of problem.

fkztw commented 7 years ago

hmm... looks like #28 and #31 are also dealing with this problem. I should add a patch which can solve it once for all and this is related to the annoyed problem of unicode and bytes of Python 2 and Python 3.

fkztw commented 7 years ago

@quique0194, @pjrobertson: Could both of you help me testing if this patch works for both of you in your scenario? I've tested this patch in both Python 2 and Python 3 with this simple script and it worked for me, I would be appreciate if you can help testing this patch, since one of you is using Python 2 and another is using Python 3. Thank you. (Click Details to see the script)

```python # -*- coding: utf-8 -*- import sys import dryscrape dryscrape.start_xvfb() br = dryscrape.Session() br.visit("https://google.com") if sys.version_info[0] == 2: PY2 = True PY3 = False if sys.version_info[0] == 3: PY2 = False PY3 = True def test_py2(): q = br.at_xpath('//*[@name="q"]') q.set(u"canción") q.form().submit() br.render("py2-unicode.png") q = br.at_xpath('//*[@name="q"]') q.set("北京") q.form().submit() br.render("py2-str.png") q = br.at_xpath('//*[@name="q"]') q.set(b"123") q.form().submit() br.render("py2-bytes.png") def test_py3(): q = br.at_xpath('//*[@name="q"]') q.set("北京") q.form().submit() br.render("py3-unicode.png") q = br.at_xpath('//*[@name="q"]') q.set("canción".encode("utf-8")) q.form().submit() br.render("py3-bytes.png") if __name__ == "__main__": if PY2: test_py2() if PY3: test_py3() print("Done") ```
PhilippVerpoort commented 7 years ago

I came across the same problem, and found a very similar solution. I can confirm that the fundamental problem is the computed length of each argument as len(arg) in line 517 of the changed file.

I have reviewed the solution presented in this pull request. I find the updated code well written, and have successfully tested the updated version with both Python2 and Python3 using the provided example script from Dryscrape, as well as a script that I had written myself for a different purpose. In response to @M157q's comment on the use of the unicode identifier after checking for Python2, I believe that this should not be a problem (see my reply to the comment). I have added one comment to the changed file requesting a fix for the indentation.

I suggest to the maintainers of this repository to merge this pull request #33, to reject pull requests #28 and #31, and to mark issues #27 and #30 as fixed (or duplicates where appropriate).

niklasb commented 7 years ago

@M157q thank you for the patch and @PhilippVeerport thank you for your detailed analysis. I will try to look at this ASAP.

niklasb commented 7 years ago

@M157q @PhilippVerpoort Thank you for your contribution. Unfortunately I have decided to no longer maintain dryscrape or webkit-server, since properly doing so would require rewriting webkit-server using QtWebEngine. Consider using a different package for your scraping if you do not want to be vulnerable to old WebKit bugs.

PhilippVerpoort commented 7 years ago

@niklasb Thanks for your reply. This is completely fair, but would you consider implementing this change and filing a new release with this bug fix anyway? At least people who do consider for some reason to use this piece of software will not have to go through the same process of manually applying this patch.