annevk opened this issue 4 years ago
This seems desirable and has indeed come up before. Specifically, it has come up in the context of allowing structured-serialization storage of data on things like ServiceWorker registrations and related data (e.g. Notification.data), where it would be desirable to place an upper bound on storage, but doing so is an interop nightmare without this issue addressed.
I believe this would require the serialization steps for [Serializable] to also produce a size/upper-bound value?
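As a rough illustration of that shape, here is a minimal sketch, assuming JSON as a stand-in for structured serialization and a made-up `serializeWithCost` helper (neither is part of any spec):

```ts
// Sketch only: JSON is used as a stand-in for structured serialization so
// the example is self-contained. A real implementation would run the actual
// StructuredSerialize steps and apply the agreed-upon abstract cost model.
function serializeWithCost(value: unknown): { record: string; costBytes: number } {
  const record = JSON.stringify(value);
  // Crude upper bound: charge 2 bytes per UTF-16 code unit of the record.
  return { record, costBytes: record.length * 2 };
}

const { costBytes } = serializeWithCost({ title: "hi", badge: 3 });
console.log(costBytes); // 48
```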
It seems like the most complex issues are:

- Blobs: whether implementations that de-duplicate identical blob contents should charge quota once or per logical copy.
- Compression: whether quota is charged against compressed or uncompressed data.
Thank you very much for opening a specific issue for this topic!
Reiterating here for clarity -- Chrome is supportive of this effort to come up with an abstract cost model for storage. We'd be willing to take on the (quite non-trivial) implementation costs if the model gains cross-browser acceptance.
I also really like that @asutherland brought up some of the complex issues early on. I'd be tempted to follow the solutions of other systems I'm aware of.
Blobs: Charge for a separate copy per item. I claim this approach is more intuitive to users -- you're charged for what you write, with decisions made locally. Implementers still get the benefits of content de-duplication as an operational cost reduction. I think this approach would also make the proposal more palatable, because we'd be avoiding asking browsers to implement content de-duplication to be compliant.
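To illustrate, a minimal sketch of that policy, with a hypothetical `DedupingBlobStore` that stores identical payloads once but charges quota per logical copy (all names and the toy hash are illustrative only):

```ts
// Illustrative sketch: quota is charged per logical write, even though
// identical contents may share one physical entry internally.
function contentHash(bytes: Uint8Array): string {
  // Toy rolling hash, a stand-in for a real content digest.
  let h = 0;
  for (const b of bytes) h = (h * 31 + b) >>> 0;
  return h.toString(16);
}

class DedupingBlobStore {
  private physical = new Map<string, Uint8Array>(); // hash -> payload, stored once
  private logical = new Map<string, string>();      // key -> hash
  chargedBytes = 0;                                 // the quota ledger

  write(key: string, bytes: Uint8Array): void {
    this.chargedBytes += bytes.byteLength;          // charge every copy
    const hash = contentHash(bytes);
    if (!this.physical.has(hash)) this.physical.set(hash, bytes);
    this.logical.set(key, hash);
  }
}

const store = new DedupingBlobStore();
const payload = new TextEncoder().encode("same bytes");
store.write("a", payload);
store.write("b", payload);
console.log(store.chargedBytes); // 20: both copies charged, one payload stored
```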
Compression: Charge for uncompressed data. Same reasoning as above -- it's more intuitive to be charged for what you write. Also, unless we mandate that each object is compressed individually, compression ratios depend on adjacent data, so I think we'd end up with a lot of constraints around physical data layout. I'd strongly prefer that specs don't get into this business :smile:
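Likewise, a small sketch of charging the uncompressed size at write time, with an identity `compress` stand-in for whatever codec an engine actually uses:

```ts
// Illustrative: the quota charge is computed from the uncompressed bytes,
// so it stays stable across engines with different compression strategies.
function compress(bytes: Uint8Array): Uint8Array {
  // Identity "compression" keeps the sketch self-contained; real engines
  // might use zstd, Snappy, or nothing at all.
  return bytes;
}

function storeRecord(ledger: { chargedBytes: number }, bytes: Uint8Array): Uint8Array {
  ledger.chargedBytes += bytes.byteLength; // charge for what was written...
  return compress(bytes);                  // ...not for what lands on disk
}

const ledger = { chargedBytes: 0 };
storeRecord(ledger, new TextEncoder().encode("aaaaaaaa"));
console.log(ledger.chargedBytes); // 8, regardless of the on-disk ratio
```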
On a brighter note, the zstd benchmarks suggest that the algorithms we'd consider have ratios within 2x of each other (and below 3x of uncompressed) for "typical" data. I claim this is well within the precision margin for the cost model we'd be building up here.
Along the same lines, I hope that we can avoid having apps play games (like manual compression) by being reasonably generous with quota. Ideally, apps without bugs should not run into quota problems.
I found some notes from when I tried to sketch a storage cost model for IndexedDB. This was in 2018, and I knew a lot less about the implementation back then. So, the numbers are probably bad, but at least it's a list of things to consider.
Object cost:
I might have missed some other object types. The idea is to assign a cost based on a straightforward representation for each clonable type. The cost doesn't have to be exact, because we expect implementations to have their own overhead.
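To make the shape of such a model concrete, here is a rough sketch of a recursive cost function for a few clonable types; every constant in it is a placeholder for illustration, not a number from my notes:

```ts
// Rough sketch of an abstract cost model for structured-cloneable values.
// Every constant is a placeholder showing the shape of the model, not a
// proposed value.
function estimateCost(value: unknown): number {
  if (value === null || typeof value === "boolean" ||
      typeof value === "number" || typeof value === "undefined") {
    return 8;                                    // primitives: fixed cost
  }
  if (typeof value === "string") {
    return 16 + value.length * 2;                // header + 2 bytes/code unit
  }
  if (value instanceof ArrayBuffer) {
    return 16 + value.byteLength;                // header + raw bytes
  }
  if (Array.isArray(value)) {
    return value.reduce<number>((sum, v) => sum + estimateCost(v), 16);
  }
  if (typeof value === "object") {
    let sum = 16;                                // per-object header
    for (const [k, v] of Object.entries(value)) {
      sum += estimateCost(k) + estimateCost(v);  // charge keys and values
    }
    return sum;
  }
  return 16;                                     // fallback for other clonables
}

console.log(estimateCost({ id: 42, tags: ["a", "b"] }));
```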
IndexedDB transaction costs (get refunded when the transaction completes):
This isn't a complete list. I hope it's a good starting point if someone is itching to start an explainer :smile:
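As a strawman for the refund mechanics, a hypothetical ledger that charges while a transaction is open and refunds the charge on completion (the API names are made up):

```ts
// Hypothetical quota ledger with refundable per-transaction overhead.
class QuotaLedger {
  chargedBytes = 0;
  private pending = new Map<number, number>(); // txId -> refundable charge

  beginTransaction(txId: number, overheadBytes: number): void {
    this.chargedBytes += overheadBytes;        // charged while the tx is open
    this.pending.set(txId, overheadBytes);
  }

  completeTransaction(txId: number): void {
    this.chargedBytes -= this.pending.get(txId) ?? 0; // refund on completion
    this.pending.delete(txId);
  }
}

const ledger = new QuotaLedger();
ledger.beginTransaction(1, 4096); // e.g. per-transaction bookkeeping cost
ledger.completeTransaction(1);    // refunded; only durable data stays charged
```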
@pwnall Your simplifying proposal in https://github.com/whatwg/storage/issues/110#issuecomment-662493325 sounds good to me. It's also very consistent with reality: Mozilla's Servo project is an example of bringing up a browser from scratch-ish, and it has found implementing IndexedDB non-trivial. Further complicating the standard and raising the bar for building a compliant browser engine would not be a win for the web.
See also: https://github.com/whatwg/html/issues/4914.
In order to give developers a more consistent experience across browsers, while allowing browsers to compress, deduplicate, and otherwise optimize the stored data, we should standardize the upper bound for each storage action and have all browsers enforce that.
E.g., the size of `localStorage[key] = value` could be (key's code unit length + value's code unit length) × 2 + 16 bytes of safety padding, or some such. (I did not put a lot of thought into this; if we go down this path we'd need to.) (See 6 in https://github.com/whatwg/storage/issues/95#issuecomment-656555686 and the reply for context.)
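For illustration, that strawman formula as code, with a tiny worked example (the 16-byte padding constant is just the placeholder above):

```ts
// Strawman cost for `localStorage[key] = value`, per the formula above:
// (key code units + value code units) × 2, plus 16 bytes of safety padding.
function localStorageItemCost(key: string, value: string): number {
  return (key.length + value.length) * 2 + 16;
}

// "theme" (5 code units) + "dark" (4 code units): (5 + 4) × 2 + 16 = 34
console.log(localStorageItemCost("theme", "dark")); // 34
```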