Open jamesturk opened 1 year ago
It'd be nice to be able to pass a string (a response, a byte-string, or whatever you think is best) into our Page class via ExamplePage(response="..."), where "..." is the content we'd like it to parse. If we need to wrap the string in a Response object or something similar, that's fine too. Then we can test the scraper against it.
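To make the request concrete, here is a minimal sketch of what passing a string might look like. Everything in it is an assumption for illustration: FakeResponse and this ExamplePage are hypothetical stand-ins, not spatula's actual classes or constructor signature.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response; not part of spatula."""

    text: str
    status_code: int = 200

    @property
    def content(self) -> bytes:
        return self.text.encode("utf-8")


class ExamplePage:
    """Hypothetical page class that accepts a ready-made response."""

    def __init__(self, response):
        # Accept a raw str/bytes and wrap it, or take a response-like object as-is.
        if isinstance(response, bytes):
            response = response.decode("utf-8")
        if isinstance(response, str):
            response = FakeResponse(text=response)
        self.response = response

    def process_page(self):
        # Toy "scraper": pull out the text between the <title> tags.
        html = self.response.text
        start = html.index("<title>") + len("<title>")
        end = html.index("</title>", start)
        return {"title": html[start:end].strip()}


def test_example_page():
    page = ExamplePage(response="<html><title> Hello </title></html>")
    assert page.process_page() == {"title": "Hello"}
```

With something like this, a broken selector shows up as a failing assertion against a fixed input rather than against the live site.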
I'm happy to expand this more if it'd be helpful.
This looks handy: https://github.com/jamesturk/spatula/blob/2bf8f378c8a83d36fb50b362a5895181794ee733/tests/test_pages.py#L18-L37
I'll try again to test a few of my scrapers this week. I think the main pain point is having a way to test selectors for a given page type and quickly see what broke.
I want to think through this a bit & welcome feedback from anyone that'd like better ways to test their scrapers written using spatula.
The problem this is attempting to solve is that when writing scrapers, you might want to test against a cached page, and you would also want an easy way to update your cached copy. This feels like it falls well within spatula's domain, and spatula could offer a solution that works for common cases.
I've considered a few approaches and am currently leaning towards the following:
Idea: Provide helper to turn page into a TestablePage
Sources are responsible for fetching themselves in Source.get_response. By replacing a page's sources with special caching versions, an existing Page can be tested against a cached response.
def test_example_page():
This would replace all of a page's sources with a new TestCacheURL; other parameters would stay the same.
TestCacheURL would do the following:
This would be pretty simple for 80% of cases, but it might get complicated for pages that yield other pages, since presumably you'd want their sources replaced too.
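One possible shape for such a caching source is sketched below. It is an assumption, not spatula's TestCacheURL: the class name CachedSource, the injected fetch callable, the refresh parameter, and the cache file naming are all hypothetical, and get_response here does not follow spatula's real signature.

```python
import hashlib
from pathlib import Path
from typing import Callable


class CachedSource:
    """Hypothetical caching source; a sketch, not spatula's TestCacheURL."""

    def __init__(self, url: str, cache_dir: Path, fetch: Callable[[str], str]):
        self.url = url
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch  # injected so tests never need the network

    @property
    def cache_path(self) -> Path:
        # Derive a stable file name from the URL.
        digest = hashlib.sha256(self.url.encode("utf-8")).hexdigest()[:16]
        return self.cache_dir / f"{digest}.html"

    def get_response(self, refresh: bool = False) -> str:
        # Serve the cached copy unless a refresh is requested or none exists.
        if self.cache_path.exists() and not refresh:
            return self.cache_path.read_text(encoding="utf-8")
        body = self.fetch(self.url)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.cache_path.write_text(body, encoding="utf-8")
        return body
```

Passing refresh=True is one way to cover the "update your cached copy easily" requirement: the same test can be rerun once against the live page to regenerate the fixture.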
I'd also considered just having a global flag that alters how URL sources work (SPATULA_TEST_MODE), but I'm not sure I like that approach yet.
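For comparison, a rough sketch of the global-flag idea. Only the name SPATULA_TEST_MODE comes from the comment above; reading it from the environment, the accepted values, and the in_test_mode and load helpers are assumptions made for illustration.

```python
import os


def in_test_mode(environ=os.environ) -> bool:
    """Return True when the SPATULA_TEST_MODE flag is set (assumed env var)."""
    return environ.get("SPATULA_TEST_MODE", "").lower() in {"1", "true", "yes"}


def load(url, fetch, cache, environ=os.environ):
    """Hypothetical loader: cached copies in test mode, live fetch otherwise."""
    if in_test_mode(environ):
        # In test mode, never touch the network; fail loudly on a cache miss.
        if url not in cache:
            raise LookupError(f"no cached copy of {url}")
        return cache[url]
    return fetch(url)
```

The trade-off with a flag like this is that behaviour changes implicitly for every source at once, whereas replacing sources on a specific page keeps the change local to the test.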