docsearch: add Spider Tests

jchristgit commented 7 years ago

Spider Contracts let us easily validate the individual parsing functions of our spiders. Adding them would help ensure that the parsers work correctly and reduce the amount of manual testing we have to do.

jchristgit commented 7 years ago

I'm running into several issues with this.

Apparently, Scrapy only runs the contracts in the docstring of parse and ignores those on the other callbacks. Since the cppreference spider uses parse as an entry point that dispatches each type of page to its own parser, this makes contracts largely ineffective here.

I currently have the following (method bodies cut out for readability):

def parse(self, response):
    """
    @url http://en.cppreference.com/w/cpp/symbol_index
    @returns requests 1
    """
    ...

def parse_symbol_index(self, response):
    """
    @url http://en.cppreference.com/w/cpp/symbol_index
    @returns requests 700
    """
    ...

def parse_std_symbol(self, response):
    """
    @url http://en.cppreference.com/w/cpp/symbol_index
    @returns items 740
    @scrapes names defined_in_header sigs desc return params example link
    """
    ...

All that Scrapy reports when I run scrapy check cppreference -v is:

[cppreference] parse (@returns post-hook) ... ok

----------------------------------------------------------------------
Ran 1 contract in 4.883s

OK

...which is not what I intended: only the contracts on parse ran, while those on parse_symbol_index and parse_std_symbol were skipped.
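One way to rule out a docstring problem on our side is to list which callbacks actually carry contract lines. The sketch below is a hypothetical, stdlib-only helper (find_contracts is not part of Scrapy and does not reproduce Scrapy's own discovery logic): it simply treats every docstring line starting with @ as a contract and groups them by method. FakeSpider is a stand-in that copies the docstrings above with the bodies omitted.

```python
import inspect
import re

# A contract line is any docstring line starting with "@", e.g. "@returns requests 1".
CONTRACT_LINE = re.compile(r"^\s*@(\w+)\s*(.*)$")


def find_contracts(cls):
    """Hypothetical helper: map each method name to the (name, args) pairs
    of the @-prefixed contract lines found in its docstring."""
    found = {}
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        doc = inspect.getdoc(member) or ""
        contracts = []
        for line in doc.splitlines():
            match = CONTRACT_LINE.match(line)
            if match:
                contracts.append((match.group(1), match.group(2).split()))
        if contracts:
            found[name] = contracts
    return found


class FakeSpider:
    """Stand-in for the cppreference spider; method bodies are omitted."""

    def parse(self, response):
        """
        @url http://en.cppreference.com/w/cpp/symbol_index
        @returns requests 1
        """

    def parse_symbol_index(self, response):
        """
        @url http://en.cppreference.com/w/cpp/symbol_index
        @returns requests 700
        """

    def parse_std_symbol(self, response):
        """
        @url http://en.cppreference.com/w/cpp/symbol_index
        @returns items 740
        @scrapes names defined_in_header sigs desc return params example link
        """
```

If find_contracts(FakeSpider) lists all three callbacks, the contract annotations themselves are in place, which points at how scrapy check selects methods rather than at the docstrings.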

jchristgit commented 7 years ago

Closing this, since spider contracts don't work for how this spider is structured. Instead, once the branch is merged, we will add a test suite covering the parsing functions built for the scrapers as well as the various utility functions.
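A minimal sketch of what such a test suite could look like with the standard unittest module. The absolute_link function is a hypothetical utility standing in for the scrapers' real helpers (it just resolves a relative link against the page URL with urllib.parse.urljoin); the real suite would test the actual functions instead.

```python
import unittest
from urllib.parse import urljoin


def absolute_link(base_url, href):
    """Hypothetical scraper utility: resolve a (possibly relative) href
    against the URL of the page it was found on."""
    return urljoin(base_url, href.strip())


class AbsoluteLinkTests(unittest.TestCase):
    BASE = "http://en.cppreference.com/w/cpp/symbol_index"

    def test_root_relative_path(self):
        # A path starting with "/" replaces the whole path of the base URL.
        self.assertEqual(
            absolute_link(self.BASE, "/w/cpp/algorithm/sort"),
            "http://en.cppreference.com/w/cpp/algorithm/sort",
        )

    def test_sibling_path(self):
        # A bare name replaces only the last path segment of the base URL.
        self.assertEqual(
            absolute_link(self.BASE, " chrono "),
            "http://en.cppreference.com/w/cpp/chrono",
        )

    def test_absolute_url_unchanged(self):
        url = "https://example.com/page"
        self.assertEqual(absolute_link(self.BASE, url), url)
```

Placed in a tests/ module, a suite like this runs with python -m unittest and needs no network access, which avoids the scrapy check limitation entirely.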

strinking / docflow

docsearch: add Spider Tests #13