
Storj Public Roadmap

Better Performance for "Hot" Files #11

Closed: iglesiasbrandon closed this issue 1 month ago

iglesiasbrandon commented 2 years ago

Summary

We know there are high usage files that may benefit from something like caching of objects in Gateway-MT and Linksharing. The scenario is one where a set of (usually small) files is requested in a short amount of time. Some examples include a linksharing file that was shared via a popular forum, or video files accessed via Gateway-MT after the release of a popular sporting event.

Originally this roadmap item was titled: "Gateway MT/Linksharing Object Caching." While we still think that caching may be all or part of the solution, we want to focus the roadmap item on the customer pain point, rather than the solution.

Pain Point:

The question we are trying to answer is how to scale "hot file" downloads specifically for the Edge services. While a typical "hot file" in the Storj network could be scaled by altering the number of erasure encoded pieces (Whitepaper section 6.2), the Edge services are unfortunately a centralized point of failure.

Currently, the Storj DCS infrastructure is not highly responsive to dynamic load changes in the Edge services. The Edge Services are also the most likely place that hot file load will be seen, due to their public nature. Thus the Edge service may need centralized scaling, likely in the form of a local persistence mechanism (such as files on disk), AKA caching.
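To make the local persistence idea above concrete, here is a minimal sketch of a disk-backed read-through cache in Go. It is not gateway-mt or linksharing code; the `diskCache` type and the `fetch` callback are hypothetical stand-ins for whatever downloads the object from the network.

```go
// Hypothetical sketch, not the actual gateway-mt implementation: object bytes
// for "hot" keys are persisted to local files so repeat downloads can be
// served from disk instead of from the storage nodes.
package hotcache

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"path/filepath"
)

type diskCache struct {
	dir string // local directory holding cached objects
}

// Get returns a reader for key, fetching and persisting the object on a miss.
// fetch stands in for whatever downloads the object from the network.
func (c *diskCache) Get(key string, fetch func() (io.ReadCloser, error)) (io.ReadCloser, error) {
	// Hash the key so arbitrary object paths map to safe file names.
	sum := sha256.Sum256([]byte(key))
	path := filepath.Join(c.dir, hex.EncodeToString(sum[:]))

	// Cache hit: serve straight from local disk.
	if f, err := os.Open(path); err == nil {
		return f, nil
	}

	// Cache miss: download, then persist for the next request.
	src, err := fetch()
	if err != nil {
		return nil, err
	}
	defer src.Close()

	// Write to a temp file and rename so a partially written object is never served.
	tmp, err := os.CreateTemp(c.dir, "partial-*")
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(tmp, src); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return nil, err
	}
	if err := tmp.Close(); err != nil {
		return nil, err
	}
	if err := os.Rename(tmp.Name(), path); err != nil {
		return nil, err
	}
	return os.Open(path)
}
```

The temp-file-then-rename step matters in practice: it keeps a crashed or interrupted download from ever being served as a complete cached object.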

Intended Outcome:

Any outcome which enables the Edge service to gracefully deliver "hot files" beyond their intended network capacity should be considered.

How will it work?

Many things are not yet determined about how this will work. How will we capture billing information? How will we know if/when to invalidate a cache and/or detect changes on the satellite? Will all files be eligible, or will this be a setting? Do we need to actively detect "hot files", or can we cache "everything"? Do we have appropriate hardware for these workflows? Will SNOs be compensated in any way? How will range queries be billed [per byte? per segment]? Can we leverage an off-the-shelf caching layer, such as Squid? Will our current infrastructure support an adequate cache size?
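On the "detect hot files vs. cache everything" question, one lightweight option is a per-key request counter over a sliding window: keys that cross a threshold become caching candidates. The sketch below only illustrates that idea; the `hotDetector` type and its parameters are hypothetical, not anything that exists in the edge services today.

```go
// Hypothetical sketch of "hot file" detection: count requests per key in a
// sliding time window and flag keys that cross a threshold. Names and
// numbers are illustrative only.
package hotcache

import (
	"sync"
	"time"
)

type hotDetector struct {
	mu        sync.Mutex
	window    time.Duration // e.g. one minute
	threshold int           // requests per window that make a key "hot"
	hits      map[string][]time.Time
}

func newHotDetector(window time.Duration, threshold int) *hotDetector {
	return &hotDetector{
		window:    window,
		threshold: threshold,
		hits:      make(map[string][]time.Time),
	}
}

// Record registers one request for key and reports whether the key is
// currently "hot", i.e. a candidate for caching.
func (d *hotDetector) Record(key string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()

	now := time.Now()
	cutoff := now.Add(-d.window)

	// Keep only the timestamps that are still inside the window.
	recent := d.hits[key][:0]
	for _, t := range d.hits[key] {
		if t.After(cutoff) {
			recent = append(recent, t)
		}
	}
	recent = append(recent, now)
	d.hits[key] = recent

	return len(recent) >= d.threshold
}
```

Whether this kind of detection is worth the bookkeeping, versus simply caching everything with a short TTL, is exactly the open question above.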

Links:

storj/gateway-mt#75
MinIO disk caching design: https://github.com/minio/minio/blob/master/docs/disk-caching/DESIGN.md
Whitepaper: https://www.storj.io/storjv3.pdf
Gateway-MT Milestone: https://github.com/storj/gateway-mt/milestone/2

wthorp commented 2 years ago

Summary: We know there are high usage files which may benefit from caching of objects in Gateway-MT and Linksharing. These are typically a few files which are requested in bursty traffic patterns. However, caching files has complications.

Pain Point: In general, object caching doesn't address a known, active pain point, but a notional one. The question it solves is how to scale "hot file" downloads specifically for the Edge services. While a typical "hot file" in the Storj network could be scaled by altering the number of erasure encoded pieces (Whitepaper section 6.2), the Edge services are unfortunately a centralized point of failure. Currently the Storj DCS infrastructure is not highly responsive to dynamic load changes in the Edge services, due to economic factors. The Edge services are also the most likely place that hot file load will be seen, due to their public nature. Thus the Edge services may need centralized scaling, likely in the form of a local persistence mechanism (such as files on disk), AKA caching.

Intended Outcome: Any outcome which enables the Edge service to gracefully deliver "hot files" beyond their intended network capacity should be considered.

How will it work? Many things are not yet determined about how this will work. How will we capture billing information? How will we know if/when to invalidate a cache and/or detect changes on the satellite? Will all files be eligible, or will this be a setting? Do we need to actively detect "hot files", or can we cache "everything"? Do we have appropriate hardware for these workflows? Will SNOs be compensated in any way? How will range queries be billed [per byte? per segment]? Can we leverage an off-the-shelf caching layer, such as Squid? Will our current infrastructure support an adequate cache size?

Links:
https://github.com/storj/gateway-mt/issues/75
https://github.com/minio/minio/blob/master/docs/disk-caching/DESIGN.md
https://www.storj.io/storjv3.pdf

elek commented 1 year ago

We know there are high usage files which may benefit from caching of objects in Gateway-MT and Linksharing.

Just curious: how do you know that we have such files, where we hit the limit of the storage nodes?

Do we have any data? Parallel requests to the same segment per storage node?

I follow the stats of my storage nodes and didn't really see high bursts...

D4rk4 commented 1 year ago

Check Varnish instead of Squid.
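Whichever proxy is chosen, an off-the-shelf HTTP cache (Varnish, Squid, or a CDN) can only help if the edge services emit cache-friendly responses. As a hedged illustration only (this is not linksharing's current behavior, and the wrapper and TTL are made up for the example), a Go handler wrapper could set Cache-Control so a shared cache in front of the service may store hot responses:

```go
// Hypothetical sketch: mark edge responses as storable by an off-the-shelf
// HTTP cache sitting in front of the service. The wrapper and TTL are
// illustrative, not linksharing's actual behavior.
package hotcache

import (
	"fmt"
	"net/http"
	"time"
)

// cacheable wraps a handler and marks its responses as storable by shared
// caches for the given TTL.
func cacheable(ttl time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// "public" plus max-age lets a shared cache in front of the edge
		// service store and re-serve the response for the TTL.
		w.Header().Set("Cache-Control", fmt.Sprintf("public, max-age=%d", int(ttl.Seconds())))
		next.ServeHTTP(w, r)
	})
}
```

Range requests and per-request billing would still need answers before anything like this could be enabled, which is exactly the open question list above.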

iglesiasbrandon commented 1 month ago

Closing this issue for now; we have other solutions for hot files.