protomaps / go-pmtiles

Single-file executable tool for working with PMTiles archives
BSD 3-Clause "New" or "Revised" License
350 stars 48 forks source link

Fix 16k file bucket #143

Closed bdon closed 6 months ago

bdon commented 6 months ago

@msbarry

currently archives less than 16K don't work with file buckets (maybe same for HTTP depending on server)

The initial IO is for 16384 bytes to save roundtrips over the network. This is a useful optimization, and it doesn't seem worth special-casing non-network archives to do something different (do one IO for a few bytes to then get the exact length...)

However doing a ReadAt for 16384 bytes will read the entire archive and then raise an EOF error. This PR special cases range requests at offset 0 to allow shorter responses. I don't love this because it adds hidden hardcoded behavior based on offset=0.

Alternative: add another boolean parameter to NewRangeReaderEtag that's like allowShortResponse bool to make this behavior explicit. Thoughts?

msbarry commented 6 months ago

It looks like HTTP/gocloud buckets don't care if your byte range request goes past the end of the file? To match that behavior, maybe we could just remove the && offset == 0 check?

Also make sure you update mockBucket, and you can remove the extra logic I added to fakeArchive to pad if <16kb.

bdon commented 6 months ago

To match that behavior, maybe we could just remove the && offset == 0 check?

It should be OK for files, however on HTTP we need to special-case it - right now it's caught by isRefreshRequiredCode

If we are treating ETags correctly and the server correctly implements conditional requests we should never see a 416, only 412s when etags are actually expired. The only case where we see a 416 is the <16kb case, and we don't even get the N successfully read bytes. So our HTTP bucket should probably detect 416 and retry. A bit ugly...

msbarry commented 6 months ago

I didn't think it was an etag thing, I meant that HTTP server range requests return the overlap of the requested range and the actual data range. I've tried on a few different HTTP servers and when you request from within the file past the end of it, it always returns a 206. 416 is only when the start of the range is > the file length. Are you saying that caddy returns that error when backed by an HTTP bucket or file bucket?

The 416 error code spec indicates that it should only be thrown when none of the ranges "overlap the current extent"

bdon commented 6 months ago

@msbarry ok, updated