oursky / pageship

https://oursky.github.io/pageship/
Apache License 2.0

Serve cached compressed content #13

Open kiootic opened 9 months ago

kiootic commented 9 months ago

Serve gzip/brotli compressed content, with a fixed-size in-memory cache.

toshinari123 commented 9 months ago
func NewIMC(size int) (*IMC, error)
func (cache *IMC) Get(id string) (Value, error)
func (cache *IMC) Add(id string, value Value) error

The value is the gzipped content and the key is the static file path. It would also use cache busting, appending the SHA-256 hash of the file to the path. For the cache algorithm, a simple least-recently-used (LRU) cache would do.

toshinari123 commented 9 months ago

The scope of this cache is per-server, because the fixed size of the cache should depend on the machine (if there are multiple apps and the cache is per-app, the effective total size is multiplied by the number of apps). If implementing from scratch, the cache would consist of a list of keys (ordered by access time) and a key-value map. There are also libraries such as https://github.com/hashicorp/golang-lru (simple) and https://github.com/dgraph-io/ristretto (more complex, but seemingly better performance).
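A from-scratch version could look like the minimal LRU sketch below, matching the `NewIMC`/`Get`/`Add` interface above. The `Value` type, the error-on-miss behavior, and bounding by entry count (rather than total bytes) are assumptions for illustration:

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
)

// Value holds compressed file content; the concrete type is an assumption.
type Value []byte

type entry struct {
	key   string
	value Value
}

// IMC is a fixed-size in-memory LRU cache: a list ordered by access time
// plus a key -> list-element map, as described above.
type IMC struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewIMC(size int) (*IMC, error) {
	if size <= 0 {
		return nil, errors.New("cache size must be positive")
	}
	return &IMC{capacity: size, order: list.New(), items: map[string]*list.Element{}}, nil
}

func (cache *IMC) Get(id string) (Value, error) {
	el, ok := cache.items[id]
	if !ok {
		return nil, errors.New("not found")
	}
	cache.order.MoveToFront(el) // mark as most recently used
	return el.Value.(*entry).value, nil
}

func (cache *IMC) Add(id string, value Value) error {
	if el, ok := cache.items[id]; ok {
		cache.order.MoveToFront(el)
		el.Value.(*entry).value = value
		return nil
	}
	if cache.order.Len() >= cache.capacity { // evict least recently used
		oldest := cache.order.Back()
		cache.order.Remove(oldest)
		delete(cache.items, oldest.Value.(*entry).key)
	}
	cache.items[id] = cache.order.PushFront(&entry{id, value})
	return nil
}

func main() {
	c, _ := NewIMC(2)
	c.Add("a", Value("A"))
	c.Add("b", Value("B"))
	c.Get("a")             // touch "a" so "b" becomes the LRU entry
	c.Add("c", Value("C")) // evicts "b"
	_, err := c.Get("b")
	fmt.Println(err != nil) // true: "b" was evicted
}
```

A real implementation would likely bound total cached bytes rather than entry count, which is part of why a library like ristretto (cost-based eviction) is attractive.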

toshinari123 commented 9 months ago

I think ristretto is a good choice, as it is in-memory and memory-bounded. The cache value type would be bytes.Buffer. The content cache will be added to the Handler struct at line 31 of internal/handler/site/handler.go, and the cache will be created in func NewHandler. Cache access will occur in the if block at line 74 of internal/handler/site/site_handler.go: if the content exists in the cache, lines 86 to 87 will be used to serve the file (the reader is replaced with a bytes.Reader over the cached data); if it does not exist, create a new buffer, use the bytes.Buffer as the writer, and use the lazyReader at lines 79 to 83 as the reader.

kiootic commented 9 months ago

The placement of the cache sounds good to me. I'd like to consider the cache logic in more detail.

toshinari123 commented 9 months ago

*typo: it will be an io.WriteCloser, directly using gzip.Writer or brotli.Writer

kiootic commented 9 months ago

concurrent cache access is handled by ristretto

Ristretto supports concurrent get & set, but how about the initial cache content? Suppose there are 100 requests for the same uncached file, we'd like to compress the file only once, and the 100 requests should wait on the compression to complete before serving the same compressed content.

there will be a check whether the file exceeds the max cache size, and if it does it will just be read from the file every time (it never goes into the cache)

Does that mean it would never be compressed, or compressed from scratch every time the file is requested?

the max cache size config will be added to HandlerConfig

The user should be able to configure it in the usual way (command-line flags or environment variables).

You may want to try setting up the managed-sites mode for a general deployment setup.

toshinari123 commented 9 months ago

What about this: make a map of mutexes with the hash as key; on a cache miss, it will:

  1. lock the corresponding mutex
  2. check if the content exists in the cache
  3. if not, compress it and put it into the cache
  4. unlock the mutex

If there are a lot of requests for the same file, the first request will trigger compression and insertion into the cache; after unlocking, the other requests will see the compressed file in the cache, so they will not need to compress it again.

If the file exceeds the max cache size, it will never be compressed; it is most likely a media file that has already undergone compression as part of its format.

kiootic commented 9 months ago

If the file exceeds the max cache size, it will never be compressed; it is most likely a media file that has already undergone compression as part of its format.

Sounds good to me!

If there are a lot of requests for the same file, the first request will trigger compression and insertion into the cache; after unlocking, the other requests will see the compressed file in the cache, so they will not need to compress it again.

If we have an additional map of mutexes, we need to be careful to ensure each mutex has the same lifetime as the actual value. That is, when the value is evicted from the cache (maybe due to the LRU policy), the mutex is also deleted from the map.

toshinari123 commented 9 months ago

content cache

toshinari123 commented 9 months ago

Should I implement compression and the content cache as middlewares instead?

kiootic commented 9 months ago

That sounds good to me. However, please note that the cache itself and the logic that handles caching should be separated, so the cache middleware would add the compressed content to the cache only when appropriate (e.g. according to the threshold).

toshinari123 commented 9 months ago

Above: done 1, 2, and 4; took 3 hours.

kiootic commented 9 months ago

If the file size exceeds a threshold (roughly 1 MB), we may assume it is an already-compressed asset (e.g. PNG/ZIP). So we just check the raw file size against the threshold before trying to compress and add it to the cache.
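As a sketch of that check, with the ~1 MB cutoff and the names being assumptions for illustration:

```go
package main

import "fmt"

// compressThreshold is an assumed ~1 MB cutoff; files larger than this
// are treated as already-compressed assets (e.g. PNG/ZIP) and served raw.
const compressThreshold = 1 << 20

// shouldCompress checks the raw file size before any compression work,
// so oversized files never enter the compression path or the cache.
func shouldCompress(size int64) bool {
	return size <= compressThreshold
}

func main() {
	fmt.Println(shouldCompress(200_000)) // true: small text asset
	fmt.Println(shouldCompress(5 << 20)) // false: likely media/archive
}
```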

TODO:

  1. Write tests for #15
  2. Refactor into a cache middleware
  3. Add a compression middleware
  4. Test that the middlewares work correctly
  5. Add configuration flags

Expected work order: compress -> cache, so we need to pay attention to the order in which the middlewares are applied.
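One plausible reading of that ordering is that the cache middleware wraps the compression middleware: on a miss the request flows cache -> compress -> site, so the cache stores already-compressed bytes, and on a hit compression never runs. A tiny trace (simplified handler type, illustrative names) makes the wrapping order visible:

```go
package main

import "fmt"

// Handler is a simplified stand-in for http.Handler.
type Handler func() string

// stage wraps a handler and records when it runs, to make the
// middleware application order visible.
func stage(name string, order *[]string, next Handler) Handler {
	return func() string {
		*order = append(*order, name)
		return next()
	}
}

func main() {
	var order []string
	site := func() string { return "content" }

	// Cache is outermost: it sees the request first and can
	// short-circuit before any compression work happens.
	h := stage("cache", &order, stage("compress", &order, site))
	h()
	fmt.Println(order) // [cache compress]
}
```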

kiootic commented 8 months ago
type CacheResponse struct {
    Data []byte
    Headers map[string]string
    // Content-Encoding/Etag/Cache-Control
}

func CacheMiddleware(request, response, next) {
    // 1. Lookup from cache
    // 1.1. Create cache key (file hash, encoding (compression method))
    // ref: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary
    // can lookup libraries providing this cache mechanism?
    key := file.Hash
    if request.Header("Accept-Encoding") == "???" {
        key += "???"
    }

    cachedResponse, ok := cache.Lookup(key)
    if ok {
        response.WriteHeaders(cachedResponse.Headers)
        response.Write(cachedResponse.Data)
        return
    }

    next.ServeHTTP(request, response)

    data := response.Data
    headers := response.Headers
    cache.Set(key, CacheResponse{Data: data, Headers: headers})
}
kiootic commented 8 months ago

TODO:

  1. Look up libraries providing this cache mechanism (4)
  2. Write the cache middleware (maybe using a library) (2)
  3. Write a test for the middleware (2)
toshinari123 commented 8 months ago

https://github.com/bxcodec/httpcache : uses transport and roundtripper
https://github.com/gregjones/httpcache/tree/master : uses transport and roundtripper
https://github.com/kataras/httpcache/blob/v0.0.1/service.go (predecessor of the cache in the Iris web framework) : low number of users
https://github.com/victorspringer/http-cache/blob/747df1b7981c68d7218c4e2c6a1a5cf13fd0cccd/cache.go : similar structure, but builds the cache key differently (hashes the URL)

Conclusion: code the cache while referring to the code of the last two caches.