scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

scrapy contracts issues #1918

Open pawelmhm opened 8 years ago

pawelmhm commented 8 years ago

I tried using Scrapy contracts today and found the following problems. I'm creating this issue for future reference in case others find it helpful (perhaps it could also serve as a guide for future improvements).

It's very difficult (or even impossible) to customize a test case to suit your needs. In my project, one class of spiders always expects a response with certain meta keys. This seems like a common use case, but there is no easy way to pass meta to a spider contract. The only way I found is to create a custom contract that updates the args for the request, but this custom contract also has to be added to the callback docstring, which is not documented (I thought adding the custom contract to settings was enough, but no, you also have to add @custom_contract_name to the docstring).

There is no easy way to customize the request made from a contract. I need to test a callback that handles the response to a POST with some formdata, headers, cookies and some meta keys. Passing all those values to Request init should do the trick, but there is no simple API to do that.

There are no timeouts for some tests. I noticed that one of the spiders was hanging for a long time; it would be better to close it down rather than wait for it to end.

There is no way to pass command-line arguments to a spider test. It would be very useful, but the way contracts are currently designed you cannot do this at all (you can create a custom contract that passes arguments to Request(), but there's no contract for spider init).
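The custom-contract workaround from the first point can be sketched roughly as follows. To keep the sketch runnable without Scrapy installed, `WithMetaSketch` is a plain class standing in for a subclass of `scrapy.contracts.Contract`; the class name, the `with_meta` contract name and the sample values are made up for illustration. It assumes the hook to override is `adjust_request_args`, which receives the default request kwargs as a dict, and that the docstring argument is whitespace-free JSON.

```python
import json


class WithMetaSketch:
    """Stand-in for a scrapy.contracts.Contract subclass (hypothetical name)."""

    name = "with_meta"  # would appear in the callback docstring as @with_meta

    def __init__(self, method, *args):
        # Contract arguments arrive as strings taken from the docstring line.
        self.method = method
        self.args = args

    def adjust_request_args(self, args):
        # Merge the JSON given in the docstring into the request's meta.
        extra_meta = json.loads(self.args[0]) if self.args else {}
        merged = dict(args)
        merged["meta"] = {**merged.get("meta", {}), **extra_meta}
        return merged


# Example: one whitespace-free JSON argument, as the docstring would supply it.
contract = WithMetaSketch(None, '{"variant_request":true}')
request_kwargs = contract.adjust_request_args({"url": "http://example.com"})
```

In a real project the class would subclass `Contract`, be listed in the `SPIDER_CONTRACTS` setting and, as noted above, also be named in the callback docstring before it has any effect.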

I think the problems are mostly a result of the docstring test format. Ideally you should be able to specify meta in the docstring, e.g.

@url http://example.com
@meta {"a": "b"}
@return Item

but with the current implementation, args for meta are parsed as strings and split on whitespace, so to parse this you'd have to actually write some JSON, and this JSON must contain no whitespace. The same goes for specifying request init args. Ideally I would do something like this

@request {"method": "POST", "headers": {"foo":"bar"}}

but this is not possible with the current implementation.
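A minimal sketch of the parsing problem, and of one way a parser could avoid it. `split_on_whitespace` only imitates the behaviour described above (it is not copied from Scrapy), and `parse_json_contract` is a hypothetical alternative that splits off just the contract name and hands the rest of the line to `json.loads`.

```python
import json
import re


def split_on_whitespace(line):
    """Imitate the described behaviour: '@name' followed by whitespace-separated args."""
    name, raw_args = re.match(r"@(\w+)\s*(.*)", line).groups()
    return name, re.split(r"\s+", raw_args)


def parse_json_contract(line):
    """Hypothetical alternative: keep everything after the name as one JSON payload."""
    name, _, payload = line.strip().partition(" ")
    if not name.startswith("@"):
        raise ValueError(f"not a contract line: {line!r}")
    return name[1:], json.loads(payload) if payload.strip() else {}


# Whitespace inside the JSON cuts it into fragments that no longer parse on their own.
_, broken_args = split_on_whitespace('@meta {"a": "b"}')

# Compact JSON survives as a single argument.
_, compact_args = split_on_whitespace('@meta {"a":"b"}')

# Splitting off only the name keeps the whole payload intact.
contract_name, request_spec = parse_json_contract(
    '@request {"method": "POST", "headers": {"foo":"bar"}}'
)
```

Here `broken_args` holds two fragments, neither of which is valid JSON, while `request_spec` is an ordinary dict that could be passed on as request keyword arguments.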

How about switching to YAML? This would allow us to have a test description like this


request:
    url: http://example.com
    method: POST
    meta:
        variant_request: True
        item:
            name: foobar
    headers:
        header-one: header-value
        header-two: another-value
spider_init_args:
    zipcode: 14001
returns:
    item:
        name: "bar"

This should allow detailed specification of a test case. The YAML would be processed into a Python dict, and from this dict we could create spider test cases that control spider init and request init and add proper tests on the output. The above would translate to the following description: "create a POST Request with the following meta and headers; initialize the spider with the following argument; return an item with name bar".
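The dict-to-test-case step could look roughly like this. The `SpiderTestCase` container and the `build_test_case` and `item_matches` helpers are hypothetical names, not part of Scrapy. The `spec` literal is what a YAML loader such as PyYAML's `yaml.safe_load` would be expected to produce for the example above (reading `method` as `POST`), written out by hand so the sketch needs only the standard library.

```python
from dataclasses import dataclass


@dataclass
class SpiderTestCase:
    request_kwargs: dict  # would be passed to the Request constructor
    spider_kwargs: dict   # would be passed to the spider's __init__
    expected_item: dict   # fields the returned item must contain


def build_test_case(spec):
    """Turn a parsed spec dict into the three parts of a test case."""
    request_kwargs = dict(spec.get("request", {}))
    if "url" not in request_kwargs:
        raise ValueError("request.url is required")
    return SpiderTestCase(
        request_kwargs=request_kwargs,
        spider_kwargs=dict(spec.get("spider_init_args", {})),
        expected_item=dict(spec.get("returns", {}).get("item", {})),
    )


def item_matches(case, item):
    """True if every expected field is present in the item with the expected value."""
    return all(
        key in item and item[key] == value
        for key, value in case.expected_item.items()
    )


spec = {
    "request": {
        "url": "http://example.com",
        "method": "POST",
        "meta": {"variant_request": True, "item": {"name": "foobar"}},
        "headers": {"header-one": "header-value", "header-two": "another-value"},
    },
    "spider_init_args": {"zipcode": 14001},
    "returns": {"item": {"name": "bar"}},
}
case = build_test_case(spec)
```

A runner would then create the spider with `case.spider_kwargs`, build the request from `case.request_kwargs`, call the callback and check each yielded item with `item_matches`.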

Aside from switching to YAML, we could move contracts out of spider docstrings. Maybe this YAML file with test specs could be stored outside the spider code, e.g. you could have a folder spider_tests full of YAML files describing each test case. If your tests grow large you can still manage them easily, and since they are YAML you get nice syntax highlighting and everything is more readable.
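Collecting the spec files from such a folder needs very little code. The sketch below assumes a `spider_tests` directory with one subfolder per spider; that layout, the file names and the `discover_specs` helper are illustrative only. It builds a throwaway directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def discover_specs(root):
    """Return all YAML spec files under root, sorted for a stable test order."""
    root = Path(root)
    return sorted(
        path for pattern in ("*.yaml", "*.yml") for path in root.rglob(pattern)
    )


# Build a temporary spider_tests folder to show what gets picked up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "spider_tests"
    (root / "products").mkdir(parents=True)
    (root / "products" / "variant_request.yaml").write_text("request: {}\n")
    (root / "products" / "notes.txt").write_text("not a spec\n")
    (root / "listing.yml").write_text("request: {}\n")
    found = [p.relative_to(root).as_posix() for p in discover_specs(root)]
```

Only the two YAML files end up in `found`; the relative path of each file could double as the test case's name in reports.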

kmike commented 8 years ago

Hi @pawelmhm,

Thanks for writing this; it looks like a good reference for anyone who wants to give an alternative contracts library a try or who wants to fix Scrapy contracts.

The general sentiment is that we should extract contracts into a separate library and move it to the scrapy-plugins organization. YAML-based contracts can either be added to this library, or one can create a separate library for YAML-based contracts.

pbronez commented 8 years ago

It would be great to have contracts be more accessible. The existing setup isn't powerful enough for my purposes, and third-party test tools are more difficult to integrate. Just today I was wishing for a straightforward way to test items against JSON Schema. A contracts module would be a clear place to add this.
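To make the idea concrete, here is a deliberately tiny stand-in for such a check. It is not a JSON Schema implementation: it handles only the `required` keyword and a flat `type` check on `properties`, and the `check_item` helper and sample schema are hypothetical. A real contract would more likely delegate to a dedicated validation library.

```python
# Mapping from a few JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_item(item, schema):
    """Return a list of problems; an empty list means the item passed."""
    errors = []
    for key in schema.get("required", []):
        if key not in item:
            errors.append(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        expected = _TYPES.get(rules.get("type"))
        if key not in item or expected is None:
            continue  # absent field or unsupported keyword: nothing to check here
        value = item[key]
        # bool is a subclass of int in Python, so exclude it unless asked for.
        ok = isinstance(value, expected) and (
            rules["type"] == "boolean" or not isinstance(value, bool)
        )
        if not ok:
            errors.append(
                f"{key}: expected {rules['type']}, got {type(value).__name__}"
            )
    return errors


schema = {
    "required": ["name", "price"],
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
}
ok_errors = check_item({"name": "bar", "price": 9.5}, schema)
bad_errors = check_item({"name": 42}, schema)
```

A contract's post-processing step could run a check like this on every item the callback yields and fail the contract when the list is non-empty.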

ElToro1966 commented 6 years ago

It would also be great to have better error reporting in contracts. More often than not, I just get "Unhandled error in Deferred" and nothing more when introducing errors in contracts, see also this. Moreover, improved documentation - including more examples - would go a long way in improving the ease of use of contracts.

tianhuil commented 4 years ago

It would be great to have contracts be more accessible. The existing setup isn't powerful enough for my purposes, and third-party test tools are more difficult to integrate. Just today I was wishing for a straightforward way to test items against JSON Schema. A contracts module would be a clear place to add this.

+1