Open nyurik opened 4 months ago
Thanks for opening the discussion @nyurik!
I think we definitely need to pass any notification back to the caller. For example, in Stadia Maps' own usage, we'd have a (potentially) a few layers of caches to invalidate if we want to support data updates without reloading. If then, the caller wants to retry, it's easy enough with tooling.
Regarding the HTTP/S3 mechanisms, I'm not sure etag
alone is sufficient (since it's not universal) and there's other headers (e.g., last-modified
) which might be useful. For S3 specifically, x-amz-version-id
could be useful to indicate the version changed (or even restrict to a specific object version).
I would like to see a few concrete use cases before we design an API around it so we actually know what user needs are.
@lseelenbinder I am ok to simply return an error for now. I believe etag is fairly universal, but I agree it might not cover all cases. How about this API, without any other code:
use pmtiles::reqwest::header::HeaderName;
// we already re-export reqwest, so it is easy for users to pass
// `Some(pmtiles::reqwest::header::ETAG)`
// If `None`, no header monitoring
HttpBackend::try_from(..., monitor_header: Option<HeaderName>) {...}
// For more advanced cases, we MAY eventually introduce
// some callback fn, but YAGNI until needed.
So it is totally up to the user to handle or not handle this, and passing None
is identical to current API. File monitoring is also left out because it can also be done by the user without API changes (we may add it later).
P.S. Note that there is no way to reset AsyncPmTilesReader
to accept the new value - which I think is OK as it will force the user to re-instantiate it.
In case of an HTTP or S3 bucket backends, the pmtiles file could get modified while being used, and should be handled properly.
etag
(if returned), and when modified, allread
operations should return a new error codeIn order to make this possible, the backend API would need to change. Currently, we have
read
andread_exact
. Most of the time,read_exact
is used. Theread
is actually only used once - to read the initial block (main directory). I propose we convertThis way each backend can self initialize and read first block of data, and while doing it, may also set some monitors or get etag data - returning both the new backend instance and the first block of data. We may want to make all backend creation non-public, but this is not certain yet.
The next step is how to handle the actual data change. The actual backend will return some new
PmtError::UnderlyingDataModified
(naming?), but we need a new logic for thefn get_tile(...)
. It could simply pass through that error to the caller, and let the caller handle it (e.g. caller may close and re-open the backend, invalidating the cache as well). Or we could do that insideget_tile
, but the logic might get tricky - for example backend requests could start hitting two load-balanced endpoints that just happened to have different versions of the same file, thus producing differentetag
values.One possible solution is to introduce a new
get_tile_with_retry(&mut self, ...)
-- allowingself
to be modified, but I don't think its a workable solution -- most of the users would want to parallelize this call, so&mut self
is not possible. Instead, we may need to introduce a thicker wrapper that stores backend inside anArc<RwLock<Backend>>
- in which case we could still useget_tile(&self,...)
, but inside it will get a read lock, get the tile, and re-create the backend if it is outdated.The problem here is that the caller may still need to know if the backend was modified in order to reset some other internal cache