Previously we ran the scraper and the api in the same process, with the
scraper running to completion before the api started listening. This
meant a long startup delay before the api became available, which made
pushing out updates painful. Worse, if a scrape failed for any reason,
such as an external service being unavailable, it took the whole
process down with it, knocking our api docs reference site offline.
Splitting the two also lets us scale the api without allocating the
large amount of resources the scraping process needs.
This patch replaces the default in-memory storage with a new local file
storage implementation, which fits this new model: the scraper writes
its results to disk, and the api reads them from there.
Note that scraping no longer runs unless the separate scraper binary is
explicitly started. The two binaries are packaged together for
convenience.