Cancensus caches census data and geographies across sessions if a permanent cache directory is specified. This saves time and bandwidth, makes offline processing possible, and conserves API points and server resources.
But the cache is not transparent: the user can’t reasonably inspect it, and can’t clear it manually or selectively. Moreover, it is difficult to implement a mechanism to refresh the cache when upstream census data changes. During the 2016 census release, StatCan recalled and replaced data from several releases due to errors. Hopefully this won’t happen again with the 2021 release, but if it does, a more transparent cache will make it easier to notify users when they are using outdated data and to refresh it.
As a first step, we should create a specification for how to store local metadata for cached data. The metadata should record the regions and census vectors of the data, the time cached, the time last accessed, and the size on disk. It could be stored either in a separate metadata file for each cache file or in a single database.
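To make the per-file option concrete, here is a minimal sketch in Python. It is not the cancensus implementation: the `.meta.json` suffix, the field names, and the helpers `write_sidecar` and `read_sidecar` are all hypothetical, and the region and vector values passed in are only examples.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path

SIDECAR_SUFFIX = ".meta.json"  # hypothetical naming convention


@dataclass
class CacheMetadata:
    """One record per cache file; the fields mirror the list in the text."""
    regions: dict        # e.g. {"CMA": ["59933"]}
    vectors: list        # e.g. ["v_CA16_408"]
    cached_at: str       # ISO 8601 timestamp, UTC
    last_accessed: str   # ISO 8601 timestamp, UTC
    size_bytes: int      # size of the cache file on disk


def sidecar_path(cache_file: Path) -> Path:
    """Path of the metadata file that sits next to a cache file."""
    return cache_file.with_name(cache_file.name + SIDECAR_SUFFIX)


def write_sidecar(cache_file: Path, regions: dict, vectors: list) -> Path:
    """Write metadata for a cache file that has just been created."""
    now = datetime.now(timezone.utc).isoformat()
    meta = CacheMetadata(
        regions=regions,
        vectors=vectors,
        cached_at=now,
        last_accessed=now,
        size_bytes=cache_file.stat().st_size,
    )
    path = sidecar_path(cache_file)
    path.write_text(json.dumps(asdict(meta), indent=2))
    return path


def read_sidecar(cache_file: Path) -> CacheMetadata:
    """Load the metadata record stored next to a cache file."""
    return CacheMetadata(**json.loads(sidecar_path(cache_file).read_text()))
```

A plain-text sidecar keeps each record inspectable with any editor and lets a cache file and its metadata be deleted together. A single database would make queries across the whole cache cheaper, at the cost of one more file that has to stay in sync with the cache.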
Building on that metadata, we should then add cache management functions for selectively deleting or refreshing parts of the cache.
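As a rough illustration of selective deletion, the Python sketch below assumes that every cache file has a JSON metadata file next to it, named with a `.meta.json` suffix and containing at least a `vectors` list and an ISO 8601 `cached_at` timestamp. That layout and the function names `list_cache` and `remove_matching` are assumptions made for this example, not part of cancensus.

```python
import json
from datetime import datetime
from pathlib import Path

SIDECAR_SUFFIX = ".meta.json"  # hypothetical naming convention


def list_cache(cache_dir):
    """Return (cache_file, metadata) pairs for every metadata file found."""
    entries = []
    for sidecar in sorted(Path(cache_dir).glob("*" + SIDECAR_SUFFIX)):
        # Strip the suffix to recover the name of the cache file itself.
        cache_file = sidecar.with_name(sidecar.name[: -len(SIDECAR_SUFFIX)])
        entries.append((cache_file, json.loads(sidecar.read_text())))
    return entries


def remove_matching(cache_dir, vector=None, cached_before=None):
    """Delete entries that match every filter given; return their file names.

    vector:        only entries whose data includes this census vector
    cached_before: only entries cached before this timezone-aware datetime
    With no filter at all nothing is deleted, so a bare call cannot wipe
    the whole cache by accident.
    """
    if vector is None and cached_before is None:
        return []
    removed = []
    for cache_file, meta in list_cache(cache_dir):
        if vector is not None and vector not in meta["vectors"]:
            continue
        if cached_before is not None:
            if datetime.fromisoformat(meta["cached_at"]) >= cached_before:
                continue
        cache_file.unlink(missing_ok=True)  # data may already be gone
        cache_file.with_name(cache_file.name + SIDECAR_SUFFIX).unlink()
        removed.append(cache_file.name)
    return removed
```

The `cached_before` filter is the piece that matters for a recall: if upstream data is replaced on a known date, everything cached before that date can be dropped and downloaded again on next use. A refresh function could follow the same pattern, re-requesting each matching entry instead of deleting it.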