Gallaecio closed this 5 months ago.
All modified and coverable lines are covered by tests :white_check_mark: Project coverage is 98.69%. Comparing base (6bfb58f) to head (94edaed). Report is 16 commits behind head on master.
Hey! I think that's definitely a feature we should have in web-poet 👍
How hard would it be to move the form submission code to a separate library, and then use it both in Scrapy and in web-poet, to avoid double-maintenance? Is it the "click" support which makes it hard?
I did not even implement the click support here, but I don’t imagine it would make it hard.
The main reason why I did not move this to w3lib is that the function builds a request. If we move it to w3lib, what should we return? A named tuple? What should be the format of headers, a dict of values, a dict of lists, or a tuple of tuples?
> The main reason why I did not move this to w3lib
I was thinking about a separate library like itemloaders or itemadapter, though w3lib also works.
> If we move it to w3lib, what should we return? A named tuple? What should be the format of headers, a dict of values, a dict of lists, or a tuple of tuples?
That's a good point. It should probably return a dataclass in this case. The headers should be either a list of tuples or a dict of lists; it probably doesn't matter much (as dicts preserve insertion order now). It doesn't have to be fancy like in web-poet: no need for special case-sensitivity handling, etc.
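As a rough illustration of the return type being discussed, here is a minimal sketch assuming a plain dataclass with headers stored as a list of `(name, value)` tuples. The names `FormRequestData` and `build_request` are hypothetical, not an existing w3lib or web-poet API, and the sketch assumes the form fields have already been collected.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
from urllib.parse import urlencode, urlsplit, urlunsplit


# Hypothetical return type for a form-to-request helper; not an existing API.
@dataclass
class FormRequestData:
    url: str
    method: str = "GET"
    # Headers as a plain list of (name, value) pairs: keeps order and allows
    # repeated names, with no case-insensitive lookup.
    headers: List[Tuple[str, str]] = field(default_factory=list)
    body: bytes = b""


def build_request(action: str, method: str, fields: List[Tuple[str, str]]) -> FormRequestData:
    """Turn already-collected form fields into a request description."""
    method = method.upper()
    encoded = urlencode(fields)
    if method == "POST":
        return FormRequestData(
            url=action,
            method="POST",
            headers=[("Content-Type", "application/x-www-form-urlencoded")],
            body=encoded.encode("ascii"),
        )
    # For GET, the encoded form data replaces the query string of the action URL.
    scheme, netloc, path, _query, fragment = urlsplit(action)
    return FormRequestData(url=urlunsplit((scheme, netloc, path, encoded, fragment)))
```

With headers kept as plain tuples, Scrapy and web-poet could each convert the result into their own request class, which is what would let a single implementation serve both.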
About click: as I understand it, it's needed when a form has multiple submit (or other?) buttons and the right one is not the first one (which is what Scrapy picks by default). For example, if you use Formasaurus to submit a search form, you may want to use the detected "submit button", not the "reset/clear button" (see https://formasaurus.readthedocs.io/en/latest/usage.html#field-types), though that would need some experimentation.
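To make the "click" idea concrete, here is a minimal sketch, using only the standard library's `html.parser`, of deciding which submit control's `name=value` pair gets added to the form data. `pick_clicked_button` is a hypothetical helper, not Scrapy's implementation. It ignores `<input type="image">` (which submits click coordinates instead) and assumes the caller, e.g. code driven by Formasaurus, passes the name of the button it wants.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class _SubmitButtonCollector(HTMLParser):
    """Collect submit controls, in document order, from a form's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.buttons: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. `disabled`) are reported as None.
        attributes = {name: (value or "") for name, value in attrs}
        if "disabled" in attributes:
            return
        kind = attributes.get("type", "").lower()
        if tag == "input" and kind == "submit":
            self.buttons.append(attributes)
        # A <button> with no (or an empty) type attribute acts as a submit button.
        elif tag == "button" and kind in ("", "submit"):
            self.buttons.append(attributes)


def pick_clicked_button(form_html: str, name: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return the (name, value) pair added by the "clicked" button, if any.

    Hypothetical helper: with no `name`, the first submit control is used,
    similar to a default click; otherwise the first one with that name.
    """
    collector = _SubmitButtonCollector()
    collector.feed(form_html)
    collector.close()
    candidates = collector.buttons
    if name is not None:
        candidates = [b for b in candidates if b.get("name") == name]
    if not candidates:
        return None
    button = candidates[0]
    # A button without a name is still clicked but adds no field.
    if not button.get("name"):
        return None
    return button["name"], button.get("value", "")
```

Reset buttons are never candidates, so asking for a "clear" button yields `None` rather than a wrong submission.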
It's too short for a separate new library IMO.
> It's too short for a separate new library IMO.
My 2c: it's not big here, but
So, overall, it seems it'd get us to 500+ lines of non-obvious code, which should be enough for a library with a well-defined purpose: given an HTML form, return a request to submit it.
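The "given an HTML form, return a request" contract can be sketched with the standard library alone. This is a deliberately incomplete sketch: it assumes only the first `<form>` matters and handles just `<input>` and `<textarea>` (no `<select>`, file uploads, `enctype`, or button clicks). `form_to_request` and its `(method, url, body)` return value are hypothetical. The missing cases are part of why the full logic is non-obvious.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


class _FormParser(HTMLParser):
    """Collect action, method and submittable fields of the first <form>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_form = False
        self.done = False
        self.action = ""
        self.method = "GET"
        self.fields: List[Tuple[str, str]] = []
        self._textarea: Optional[str] = None  # name of the currently open <textarea>
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "form" and not self.done and not self.in_form:
            self.in_form = True
            self.action = a.get("action", "")
            self.method = (a.get("method") or "GET").upper()
            return
        if not self.in_form or "disabled" in a or not a.get("name"):
            return
        if tag == "input":
            kind = a.get("type", "text").lower()
            if kind in ("submit", "reset", "button", "image", "file"):
                return  # buttons belong to the "click" logic; files are out of scope
            if kind in ("checkbox", "radio") and "checked" not in a:
                return  # unchecked boxes are not submitted
            default = "on" if kind in ("checkbox", "radio") else ""
            self.fields.append((a["name"], a.get("value", default)))
        elif tag == "textarea":
            self._textarea, self._text = a["name"], []

    def handle_data(self, data):
        if self._textarea is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "textarea" and self._textarea is not None:
            self.fields.append((self._textarea, "".join(self._text)))
            self._textarea = None
        elif tag == "form" and self.in_form:
            self.in_form, self.done = False, True


def form_to_request(html: str, base_url: str) -> Tuple[str, str, bytes]:
    """Return (method, url, body) for submitting the first form in `html`."""
    parser = _FormParser()
    parser.feed(html)
    parser.close()
    if not parser.done and not parser.in_form:
        raise ValueError("no <form> found")
    url = urljoin(base_url, parser.action)
    data = urlencode(parser.fields)
    if parser.method == "POST":
        return "POST", url, data.encode("ascii")
    # GET: the encoded fields replace the query string of the action URL.
    scheme, netloc, path, _query, fragment = urlsplit(url)
    return "GET", urlunsplit((scheme, netloc, path, data, fragment)), b""
```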
So, between a new `w3lib.form` module and a new (`form2request`?) package, any preference?
My preference (+0.5) is a new library in the Scrapy org.
`HttpRequest.from_form` is based on Scrapy’s `FormRequest.from_response()`, but with a minimal API and without support for the `click` thing, which I don’t fully understand.