nginxinc / nginx-s3-gateway

NGINX S3 Caching Gateway
Apache License 2.0

Cache byte range requests #215

Closed 4141done closed 7 months ago

4141done commented 8 months ago

What

A potential fix for #188

When the Range header is supplied:

When the Range header is not supplied:

I think it's good to have the ability to serve and cache byte range requests efficiently by default. Although we could gate this behind a config option, the overhead is low and the feature makes the gateway more flexible.
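For context, byte-range caching like this is typically built on nginx's stock `slice` module. A minimal, hypothetical sketch (zone name, upstream name, and slice size are illustrative, not the PR's actual config):

```nginx
# Hypothetical sketch of slice-based caching, not the PR's exact config.
# Requests carrying a Range header would be routed to a handler like this:
location /slice-cached/ {
    slice              1000k;              # fetch and cache in 1000k chunks
    proxy_cache        s3_slices;          # cache zone for sliced objects
    proxy_cache_key    "$scheme$host$uri$slice_range";
    proxy_set_header   Range $slice_range; # upstream sees aligned sub-ranges
    proxy_cache_valid  200 206 1h;         # 206 partial responses are cached
    proxy_pass         https://s3_backend;
}
```

Because `$slice_range` is part of the cache key, each aligned chunk is cached independently and can satisfy any future range request that overlaps it.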

Implementation Details

Open questions:

Examples

Normal Request

curl -o foo.txt localhost:8989/a/5mb.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5120k  100 5120k    0     0   111M      0 --:--:-- --:--:-- --:--:--  113M

A single cache file is created

root@f339daeb2d44:/var/cache/nginx/s3_proxy# tree .
.
`-- 5
    `-- 9e
        `-- 447b5a707c18a4c0e90344925e6b39e5

The size of the cache file is equal to the size of the requested file:

root@f339daeb2d44:/var/cache/nginx/s3_proxy# du -h .
5.1M    ./5/9e
5.1M    ./5
5.1M    .
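The directory layout above follows from how nginx names cache files: the file name is the MD5 of the cache key, and with `levels=1:2` the subdirectory names are taken from the end of that digest. A small sketch of the scheme (the actual cache key behind `447b...b39e5` isn't shown above, so no key is assumed here):

```python
import hashlib

def cache_file_path(cache_key: str, levels=(1, 2)) -> str:
    """Mimic nginx's proxy_cache_path layout: the file name is the MD5
    of the cache key, and each `levels` component is taken from the
    *end* of that hex digest, working backwards."""
    digest = hashlib.md5(cache_key.encode()).hexdigest()
    parts, pos = [], len(digest)
    for width in levels:
        parts.append(digest[pos - width:pos])
        pos -= width
    return "/".join(parts + [digest])

# The digest seen above, 447b...b39e5, lands in 5/9e/ because its last
# hex character is "5" and the two characters before it are "9e".
```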

Byte Range Request

In this example, I'm requesting a 5 MB file, and the PROXY_CACHE_SLICE_SIZE option has been set to 1000k (1000 kilobytes).

curl -o foo.txt -r 1000000-4000000 localhost:8989/a/5mb.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2929k  100 2929k    0     0  66.8M      0 --:--:-- --:--:-- --:--:-- 68.1M

Cache files are created in chunks:

root@f339daeb2d44:/var/cache/nginx/s3_proxy_slices# tree .
.
|-- 0
|   `-- 5c
|       `-- 18f94c01f7a1beed3afe0aa92baf05c0
|-- 4
|   `-- 30
|       `-- 9fac913edc79622fdcc2975d91e4f304
|-- b
|   `-- 5b
|       `-- 91bfb9ef86136be4b07cdc2eb51025bb
`-- d
    `-- 82
        `-- 339384e3e9840cf7f8fe4e54fdc8182d

The size of each cache file is roughly equal to the chunk size:

root@f339daeb2d44:/var/cache/nginx/s3_proxy_slices# du -h .
1008K   ./d/82
1012K   ./d
1008K   ./0/5c
1012K   ./0
1008K   ./b/5b
1012K   ./b
1008K   ./4/30
1012K   ./4
4.0M    .
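The four cache files line up with the slice arithmetic. Assuming nginx's `k` size suffix means 1024 bytes (so `1000k` is 1,024,000 bytes), the requested range 1000000-4000000 spans four aligned slices:

```python
SLICE_SIZE = 1000 * 1024  # PROXY_CACHE_SLICE_SIZE=1000k -> 1,024,000 bytes

def slice_offsets(start: int, end: int, slice_size: int = SLICE_SIZE):
    """Aligned slice start offsets needed to cover the inclusive byte
    range [start, end], as the slice module would fetch them."""
    first = (start // slice_size) * slice_size
    last = (end // slice_size) * slice_size
    return list(range(first, last + slice_size, slice_size))

# curl -r 1000000-4000000 needs the slices starting at 0, 1024000,
# 2048000, and 3072000 -> four cache files, ~4.0M total on disk.
print(len(slice_offsets(1_000_000, 4_000_000)))  # 4
```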

4141done commented 8 months ago

/dev_build

4141done commented 8 months ago

/build_dev

github-actions[bot] commented 8 months ago

Build and Push Dev Preview Image: the long task is done!

You can find the workflow here: https://github.com/nginxinc/nginx-s3-gateway/actions/runs/8058196483

alessfg commented 8 months ago

The general gist of the PR looks good to me! Some thoughts re: your open-ended questions --

Do we want to be able to control the slice cache separately from the main cache?

I see the sliced cache as an additional feature of the main cache, so as a user I don't think you'd need to control it separately. As far as the NGINX implementation detail goes, a separate cache might help fine-tune the necessary slice cache settings (but I would still consider using the same cache until such a point comes).

Do we need to disable proxy_cache_lock in the slice-cached handler?

My go-to would be to use a default value at which proxy_cache_lock gets enabled for both normal and sliced requests, while allowing the user to tweak that value.
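For reference, the lock behavior under discussion is controlled by nginx's stock directives; a hedged sketch with illustrative values only:

```nginx
# Illustrative values, not a recommendation: collapse concurrent cache
# misses for the same element into one upstream fetch, with caps on
# how long the other requests wait.
proxy_cache_lock         on;
proxy_cache_lock_age     5s;   # after 5s, another request may go upstream
proxy_cache_lock_timeout 5s;   # waiters fall through (uncached) after 5s
```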

Do we necessarily need to separate the cache storage on disk?

I think this should be the default. I can see stitching the sliced data back together on disk being useful for some use cases, but doing so should be an optional toggle.
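Keeping the storage separate on disk, as the `s3_proxy` and `s3_proxy_slices` directories in the examples above already do, would look roughly like this (zone names and sizes are illustrative):

```nginx
# Sketch of two independent caches on separate disk paths; the zone
# names and max_size values here are made up for illustration.
proxy_cache_path /var/cache/nginx/s3_proxy        levels=1:2
                 keys_zone=s3_cache:10m       max_size=10g;
proxy_cache_path /var/cache/nginx/s3_proxy_slices levels=1:2
                 keys_zone=s3_slice_cache:10m max_size=10g;
```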

4141done commented 8 months ago

Sorry @alessfg for the thumbs down. I have a rampant bot that I had to fix. Thank you for your comments; I'll take them into account as we work on this feature more.