jamesturk / spatula

A modern Python library for writing maintainable web scrapers.
https://jamesturk.github.io/spatula/
MIT License

WIP: add some test_utils to experiment with #38

Open jamesturk opened 1 year ago

jamesturk commented 1 year ago

Aims to address #37. CachedTestURL is the workhorse here: it replaces URL (and can be hot-swapped in for it), and instead of always making a request, it favors a locally cached copy. These copies could be generated manually, but there is also an environment variable that tells spatula to fetch them if they are missing.

Right now it takes the same properties as URL, but it could be extended to take response text, as suggested in #37. (It also needs to use all properties of the request/response in caching.)
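For anyone skimming, the cache-first idea described above can be sketched roughly as follows. This is a standalone illustration, not the code in this branch: `cached_get`, `cache_path`, the injected `fetch` callable, and the `EXAMPLE_FETCH_MISSING` variable name are all hypothetical, and the cache key deliberately uses only the method and URL to show where the remaining request/response properties would need to go.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable

# Hypothetical variable name for this sketch; not necessarily what spatula uses.
FETCH_ENV_VAR = "EXAMPLE_FETCH_MISSING"


def cache_path(cache_dir: Path, url: str, method: str = "GET") -> Path:
    """Map a request to a file name inside cache_dir.

    Only the method and URL feed the key here; a complete version would
    also hash headers, body, and any other request properties.
    """
    key = hashlib.sha256(f"{method} {url}".encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.html"


def cached_get(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], str],
    method: str = "GET",
) -> str:
    """Return the cached response text, fetching only when explicitly allowed."""
    path = cache_path(cache_dir, url, method)
    if path.exists():
        # Favor the local copy so tests never touch the network.
        return path.read_text(encoding="utf-8")
    if os.environ.get(FETCH_ENV_VAR) != "1":
        raise FileNotFoundError(
            f"no cached copy of {url}; set {FETCH_ENV_VAR}=1 to fetch it"
        )
    # Cache miss with fetching enabled: fetch once and store the result.
    text = fetch(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

Passing the fetcher in as a callable keeps the sketch independent of any particular HTTP client, and raising on a cache miss (rather than fetching silently) means a test run fails loudly when a fixture is missing.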

This branch also adds two helper methods so people don't have to work with this directly if they don't want to: