Closed. maramsumanth closed this 2 years ago.
Merging #3579 (66e2004) into master (9f81de2) will not change coverage. The diff coverage is 100.00%.
```
@@           Coverage Diff           @@
##           master    #3579   +/-  ##
=======================================
  Coverage   88.18%   88.18%
=======================================
  Files         162      162
  Lines       10490    10490
  Branches     1516     1516
=======================================
  Hits         9251     9251
  Misses        965      965
  Partials      274      274
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| scrapy/http/request/form.py | 95.48% <100.00%> (ø) | |
@lopuhin, what could be the reason for the codecov/patch failure here?
I couldn't take a deep look into the algorithm itself; these are just a few Python suggestions to improve readability:
1. There are some duplicated occurrences of `urlsplit(self.url)`; you could assign the result to a variable instead.
2. It's better to iterate over sequences directly instead of by index, i.e.
```python
In [1]: keys = ['a', 'b', 'c']

In [2]: for k in keys:  # good
   ...:     print(k)
a
b
c

In [3]: for i in range(len(keys)):  # not so good
   ...:     print(keys[i])
a
b
c
```
3. If you need to iterate over more than one iterable at the same time, you could use [`zip`](https://docs.python.org/3.6/library/functions.html#zip)
```python
In [4]: values = [1, 2, 3]

In [5]: for k, v in zip(keys, values):
   ...:     print('{}={}'.format(k, v))
a=1
b=2
c=3
```
Finally, if I'm not mistaken the Codecov failure is because some of the new lines are not hit by any of the existing tests, that's why it says "will decrease coverage by 0.04%"
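The first suggestion above (hoisting the repeated `urlsplit(self.url)` call into a variable) can be sketched with the stdlib alone; the URL below is just an illustrative value from this thread, not the actual code in the PR:

```python
from urllib.parse import urlsplit

url = 'http://www.example.com/?a=0&a=2&b=1'  # illustrative URL

# Instead of calling urlsplit(url) once per attribute access,
# split the URL once and reuse the result:
parts = urlsplit(url)
print(parts.scheme)  # http
print(parts.query)   # a=0&a=2&b=1
```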
@elacuesta, thanks for the suggestion :) :+1:. I have modified the code to improve readability.
Done @amarynets. Thanks for your suggestions! :) :+1:
@lopuhin , can you please look into this?
```
http://www.example.com/?a=0&a=2&b=1&c=3#fragment
data = {'a': 1}
```
Looks like it's a wrong request. I tested it with Postman and got a 400 status code. As far as I know, all data from a GET form should be sent via the query string, and duplicate keys can exist because on the server they will be transformed into a list. More detailed info on SO.
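The point about duplicate keys being collected into a list on the server can be illustrated with the stdlib query-string parsers (a sketch using the example query above):

```python
from urllib.parse import parse_qsl, parse_qs

query = 'a=0&a=2&b=1&c=3'  # query string from the example URL above

# parse_qsl preserves duplicates as ordered (key, value) pairs
print(parse_qsl(query))  # [('a', '0'), ('a', '2'), ('b', '1'), ('c', '3')]

# parse_qs groups values per key, so duplicates become a list
print(parse_qs(query))   # {'a': ['0', '2'], 'b': ['1'], 'c': ['3']}
```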
@amarynets thanks for the review! Could you please clarify what you mean here? I didn't quite understand what `data = {'a': 1}` means here. I agree that duplicate GET arguments are possible and we should preserve them, but not add new duplicates. Do you mean this is not handled correctly in this PR? If yes, could you please clarify which case is not handled correctly?
@maramsumanth I see that some duplicate cases are checked in the tests, but could you please add an explicit test for the case when there is a duplicate GET argument, checking that it's preserved when we add another unrelated argument? (It seems this is not tested directly.)
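A stdlib-only sketch of the behaviour such a test would check (this models the merging with `urllib.parse`; the real test would construct a `FormRequest` instead, and the helper name `merge_get_formdata` is hypothetical):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

def merge_get_formdata(url, formdata):
    """Model of the branch's behaviour: formdata values replace values
    for matching keys in the URL's query string, while duplicates of
    unrelated keys are preserved."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query) if k not in formdata]
    pairs += list(formdata.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))

# Duplicate 'a' values survive when an unrelated key 'd' is added:
print(merge_get_formdata('http://example.com/?a=1&a=2&b=3', {'d': '4'}))
# http://example.com/?a=1&a=2&b=3&d=4
```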
@lopuhin Thanks for your review. I have made the changes :) :+1:
@whalebot-helmsman @lopuhin. I have made the necessary changes, can you review?
I think the code will be much easier to read if you rename `querystr` to `form_query_str` and `query_str` to `url_query_str`.
Done @whalebot-helmsman :+1:
@amarynets @lopuhin @maramsumanth I think `FormRequest` behaviour should mirror browser behaviour. To test browser behaviour I created a simple HTML file: https://gist.githubusercontent.com/whalebot-helmsman/a6c3d2eb78ae57a072af11f8b726ff83/raw/3f281be8ce0f08ea5d81deb443c816b6fbe846f3/forms.html. You can download a local copy and test. In the case of a GET form request, both Chrome and Firefox drop `url_query_str` from the URL and add `form_query_str`.
@maramsumanth Based on browser behaviour, I proposed changing the implementation in this PR. You upvoted that proposal but didn't apply the proposed changes. Is there something preventing you from applying them?
Thanks for reminding me @whalebot-helmsman. I was working on the Redirect Middleware part of Scrapy.
Coming to this PR, I was a little bit confused. In the following example:
```python
FormRequest('http://example.com/?id=9', method='GET', formdata={'id': '8'}).url
```
should give
```python
'http://example.com/?id=8'
```
i.e. it should replace the value if the same key is found. It is working as it should. Can you give me a detailed explanation of what the problem is? :)
Thanks
@whalebot-helmsman , can you please look into this so that I can work on it ASAP. :+1: :)
I referenced the HTML file as an example. It has two forms. The first one uses a GET request, has `a` and `b` as keys, and targets the URL http://example.com/f?a=1&d=2. After you click the `Submit` button you get to http://example.com/f?a=3&b=2. The browser not only replaces the value for the `a` key, but also drops the value for the `d` key. Reformulating in terms of Python structures, we get:
```python
FormRequest('http://example.com/?id=9&smth=a', method='GET', formdata={'id': '8'}).url == 'http://example.com/?id=8'
```
I have Ubuntu, on which I am using the Chromium browser. But for
```python
FormRequest('http://example.com/?id=9&smth=a', method='GET', formdata={'id': '8'}).url
```
my output is `'http://example.com/?smth=a&id=8'`, which is != `'http://example.com/?id=8'`. The above case is handled in the last part of the tests that I have written in this PR. Do the Chromium and Chrome browsers behave differently?
@whalebot-helmsman, when I tested the HTML file you provided, I can see that in the case of a GET form request both Chrome and Firefox drop `url_query_str` from the URL and add `form_query_str`. But running through the command line as described in the above comment, I cannot see any problem.
> I can see that in case of GET form request both Chrome and Firefox drop `url_query_str` from URL and add `form_query_str`.
Scrapy and related libraries should mirror browser behaviour. At this moment, in both master and your branch:
```python
>>> from scrapy.http.request.form import FormRequest
>>> FormRequest('http://example.com/?id=9&smth=a', method='GET', formdata={'id': '8'}).url == 'http://example.com/?id=8'
False
```
This is an incorrect representation of browser behaviour. I understand such behaviour was implemented before your PR, but since we have started to work on this part, we should also fix it.
Now I understand the problem. Is there any Python code to get the browser that the user is using, @whalebot-helmsman?
> @whalebot-helmsman, when I tested the HTML file you provided, I can see that in case of GET form request both Chrome and Firefox drop `url_query_str` from URL and add `form_query_str`.
It is a browser behaviour. We agreed on that.
Scrapy's implementation is separate from the installed browser; Scrapy's behaviour doesn't depend on which browser is installed. In the master branch we preserve both `url_query_str` and `form_query_str`. In the branch for this PR we replace values for common keys in `url_query_str` with values from `form_query_str`.
Scrapy's behaviour in both branches doesn't mirror browser behaviour. That is a problem, and we should fix it.
I propose to fix it in this PR.
Current behaviour:
```python
>>> from scrapy.http.request.form import FormRequest
>>> FormRequest('http://example.com/?id=9&smth=a', method='GET', formdata={'id': '8'}).url
'http://example.com/?smth=a&id=8'
```
Behaviour after the fix:
```python
>>> from scrapy.http.request.form import FormRequest
>>> FormRequest('http://example.com/?id=9&smth=a', method='GET', formdata={'id': '8'}).url
'http://example.com/?id=8'
```
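The proposed browser-mirroring behaviour can be modelled with the stdlib alone (the helper below is an illustrative sketch, not the actual scrapy implementation; the name `get_form_url` is hypothetical):

```python
from urllib.parse import urlsplit, urlencode, urlunsplit

def get_form_url(url, formdata):
    """For a GET form submission, browsers drop the URL's own query
    string and replace it entirely with the encoded form data."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=urlencode(formdata)))

print(get_form_url('http://example.com/?id=9&smth=a', {'id': '8'}))
# http://example.com/?id=8
```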
@whalebot-helmsman Thanks for guiding me. Till now I was thinking that we had to somehow account for the browser name and proceed accordingly. Now I have updated the PR and changed the tests as you suggested :)
@whalebot-helmsman, should I change anything, or is it safe to merge? :+1: Thanks :)
It is safe to merge now. But merging takes more time and is not required by GSoC rules. We can keep it as it is for now.
Now it's fine :) :+1:
Solved https://github.com/scrapy/scrapy/issues/2919.

Combining `querystr` and `formdata` into a dictionary, thereby eliminating the duplicates. @kmike, @lopuhin, please review. I have also corrected the code by creating a list where we want to pass `duplicate keys`, as described in comment https://github.com/scrapy/scrapy/pull/3269#issuecomment-412260614.

I am currently preparing for GSoC 2019, please guide me if there are any mistakes/suggestions. I would be glad if you give me suggestions for getting selected in Scrapy. Thanks in advance :) :+1:
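For illustration, the dictionary-combining idea from the description above can be sketched with the stdlib (the values here are hypothetical examples, not the PR's actual code):

```python
from urllib.parse import parse_qsl, urlencode

# Combining the URL's query pairs and the formdata into one dict:
# later values win, so duplicates collapse and formdata overrides.
query_pairs = parse_qsl('id=9&id=7&smth=a')  # example query string
formdata = {'id': '8'}

combined = dict(query_pairs)  # {'id': '7', 'smth': 'a'} - duplicates collapse
combined.update(formdata)     # formdata value replaces the URL's value
print(urlencode(combined))    # id=8&smth=a
```

Keys that should keep duplicates would instead be carried as a list of pairs, as the description notes, since a dict can hold only one value per key.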