w3c / IndexedDB

Indexed Database API
https://w3c.github.io/IndexedDB/

Expose IndexedDB IDBObjectStore.purge() method #420

Closed: WebReflection closed this issue 3 months ago

WebReflection commented 3 months ago

Coming from https://github.com/whatwg/storage/issues/174


What problem are you trying to solve?

There is desktop and there is mobile, and then there is IndexedDB, which works on both. But it keeps adding KB to each store, over and over, and there's no way to clean up space without forcing a version change.

A version change can only move up. The version is persisted in the IndexedDB layer, but the application doesn't otherwise know about it.

This makes it a catch-22: a site needs IndexedDB itself to store the version used to orchestrate a purge (deleteObjectStore, then createObjectStore, then re-add all items). Meanwhile, behind the scenes, there is apparently a LevelDB compact() that would do the job, but it isn't exposed anywhere on the Web.
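The version-change workaround mentioned above (delete the store, recreate it, re-add every record) can be sketched roughly as follows. This is a minimal TypeScript sketch, not a vetted implementation. `rebuildStore` and `schemaOf` are hypothetical helper names. The store is assumed to fit in memory, since everything is loaded with `getAll()`. Whether rewriting a store actually returns disk space is up to each browser's storage backend. Opening the database without a version number reports the current version, so the sketch reads it at run time instead of tracking it elsewhere.

```typescript
/// <reference lib="dom" />

// Sketch only: whether rewriting a store frees disk space depends on the browser backend.

interface StoreSchema {
  keyPath: string | string[] | null;
  autoIncrement: boolean;
  indexes: { name: string; keyPath: string | string[]; unique: boolean; multiEntry: boolean }[];
}

// Capture a store's schema so it can be recreated after deleteObjectStore().
function schemaOf(store: IDBObjectStore): StoreSchema {
  return {
    // null means out-of-line keys (the cast keeps null in the type).
    keyPath: store.keyPath as string | string[] | null,
    autoIncrement: store.autoIncrement,
    indexes: Array.from(store.indexNames, (name) => {
      const ix = store.index(name);
      return { name, keyPath: ix.keyPath, unique: ix.unique, multiEntry: ix.multiEntry };
    }),
  };
}

// Hypothetical helper: rebuild one store inside a versionchange transaction.
// Loads every record into memory, so it only suits stores that fit in RAM.
function rebuildStore(dbName: string, storeName: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // Opening without a version number yields the current version.
    const probe = indexedDB.open(dbName);
    probe.onerror = () => reject(probe.error);
    probe.onsuccess = () => {
      const next = probe.result.version + 1;
      probe.result.close();
      const req = indexedDB.open(dbName, next);
      req.onerror = () => reject(req.error);
      req.onupgradeneeded = () => {
        const db = req.result;
        const old = req.transaction!.objectStore(storeName);
        const schema = schemaOf(old);
        const keys = old.getAllKeys();
        const values = old.getAll();
        // Requests on one transaction run in order, so both results are ready here.
        values.onsuccess = () => {
          db.deleteObjectStore(storeName);
          const fresh = db.createObjectStore(storeName, {
            keyPath: schema.keyPath,
            autoIncrement: schema.autoIncrement,
          });
          for (const ix of schema.indexes) {
            fresh.createIndex(ix.name, ix.keyPath, { unique: ix.unique, multiEntry: ix.multiEntry });
          }
          values.result.forEach((value, i) => {
            // In-line keys live inside the value; out-of-line keys are passed explicitly.
            if (schema.keyPath === null) fresh.put(value, keys.result[i]);
            else fresh.put(value);
          });
        };
      };
      req.onsuccess = () => {
        req.result.close();
        resolve();
      };
    };
  });
}
```

Other open connections, for example in other tabs, receive a `versionchange` event and must close before the upgrade can proceed. Until then, the open request fires `blocked`.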

The forever-growing nature of stores in IndexedDB is an issue, and it leads to quota errors. You can save nothing but two integers as key and value and see it fail at some point once you set a quota limit in DevTools, which makes it both impractical and unpredictable to know when it will fail to compact itself. It's amazing that the only storage Web developers are supposed to use has such an elaborate API, giving them control even over a cursor, yet there's no way whatsoever to actually free previously used space or get rid of hidden data, kept alive via LevelDB, that nobody can reach or needs.

This feature request is to provide a way to tell LevelDB to compact its data and discard stores and data that can no longer be reached, whether because of a version change or because the data for a key has been updated.

Thank you for making sense of this surely powerful API, which still lacks the most important bit needed to scale: letting more complex sites contain the amount of storage each user needs.

What solutions exist today?

None. There are workarounds that don't really work, and there are API limitations around versioning... There's no way to guarantee that, if my initial store.put(2, 1) took only a few KB, further store.put(2, 1) calls won't keep bloating the space needed by a store that still contains, and can return, only two bytes of data.

How would you solve it?

I would somehow expose LevelDB's compact(), as that's apparently the only solution that works in Node.js (not on the Web).

I don't even know if all vendors are using LevelDB (it'd be funny, as WebSQL died to not be dependent on SQLite and if that's the case here everyone is dependent on LevelDB logic?!), but a store.purge() operation within a transaction, one that fulfills once it succeeds and actually discards all the unneeded data that would otherwise stay on users' disks forever without being readable, would solve the issue.

Will it be slow? Fine: let applications and developers decide when to use it. It's backward compatible, and it will make everyone happier once it's available across browsers.

Anything else?

We all advocate on the Web for less RAM, less CPU usage, less bandwidth, and less stress on everyone's device, yet this standard came out with a forever-growing, impossible-to-tackle glitch around the space used on physical hardware. That surely doesn't play well with the millions of websites that would like to use it, or with the millions of users who would see their disks filled with unreachable data, because all sites use IndexedDB and no site ever reclaims previously wasted space in the name of performance. So thanks for hopefully thinking about any possible solution or API that doesn't need a version change and an upgrade transaction to drop and recreate a store, or one that simply reclaims, or compacts, the used data.

I work in the WASM-enabling field, where blobs coming from foreign programming languages are the norm... Add one heavy blob package without first checking whether an identical one is already there, and watch your disk quota grow with each visit. This is not just annoying; it breaks everyone's expectations about how far IndexedDB can scale.
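The check mentioned above (don't store a heavy blob if an identical one is already there) might look like the following sketch, which keys each payload by its SHA-256 digest from `crypto.subtle.digest`. `putIfAbsent` and `toHex` are hypothetical names, and the target store is assumed to use out-of-line keys. Hashing happens before the transaction opens: awaiting a non-IndexedDB promise while a transaction is open lets that transaction auto-commit before the write is issued.

```typescript
/// <reference lib="dom" />

// Hex-encode a digest so it can be used as a string key.
function toHex(buf: ArrayBufferLike): string {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical helper: store `bytes` under its SHA-256 hex digest unless that key
// already exists. Assumes `storeName` uses out-of-line keys. Resolves with the key.
async function putIfAbsent(db: IDBDatabase, storeName: string, bytes: ArrayBuffer): Promise<string> {
  // Hash before opening the transaction: awaiting a non-IndexedDB promise while a
  // transaction is open would let it auto-commit before the put is issued.
  const key = toHex(await crypto.subtle.digest("SHA-256", bytes));
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, "readwrite");
    const store = tx.objectStore(storeName);
    const existing = store.count(key);
    existing.onsuccess = () => {
      // Only write when no record has this key; identical content is skipped.
      if (existing.result === 0) store.put(bytes, key);
    };
    tx.oncomplete = () => resolve(key);
    tx.onabort = () => reject(tx.error);
  });
}
```

On each visit a page would call something like `await putIfAbsent(db, "packages", wasmBytes)` (store name and variable hypothetical). A repeat visit with identical bytes then performs only the `count()` lookup instead of another write.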

Thank you.

evanstade commented 3 months ago

So I don't think we're likely to want to surface an IDB API that deals with an implementation detail like the fact that LevelDB requires periodic compaction, and websites also should not need to call purge() to paper over a poor user agent implementation. But I am going to hazard a guess that you've encountered this issue on Chromium in particular. If so, can you file a bug report on crbug.com and fill out the template with repro steps etc., or just give as much detail as possible about the behavior you're seeing? Because when I write a test app that just writes a bunch of stuff to IDB repeatedly, the size doesn't necessarily behave "rationally" but does not grow without bound either.

> I don't even know if all vendors are using LevelDB (it'd be funny, as WebSQL died to not be dependent on SQLite and if that's the case here everyone is dependent on LevelDB logic?!)

They aren't, and FWIW WebSQL was deprecated largely because arbitrary websites can't be trusted to inject arbitrary SQL. User agents do use SQLite for other things, so there's already that dependency in terms of internal browser usage. In Chromium, IndexedDB may one day use SQLite as well.

nickchomey commented 3 months ago

FYI, Chromium is actively working on moving IndexedDB from LevelDB to SQLite: https://issues.chromium.org/issues/40253999

And improving IDB performance is a top-priority issue in general, with lots of sub-tasks being tracked: https://issues.chromium.org/issues/40262766/dependencies

And, according to this old article, Firefox already uses SQLite: https://www.aaron-powell.com/posts/2012-10-05-indexeddb-storage

WebReflection commented 3 months ago

@evanstade

> But I am going to hazard a guess that you've encountered this issue on Chromium in particular.

Well, that means Chrome/Chromium and Edge, which both expose the disk quota used by IndexedDB. Firefox and WebKit don't show that detail, so I wouldn't actually know.

> when I write a test app that just writes a bunch of stuff to IDB repeatedly, the size doesn't necessarily behave "rationally" but does not grow without bound either.

If I put a buffer under a string key into a single object store, then a string under another string key, and remove both after reading their stored values, I can refresh and see the quota grow with each refresh, even though the DB is technically empty every time I start the same page fresh. I'm also not sure this is Linux-only, if you can't measure the same.
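To put numbers on the repro described above, one could run something like the following sketch in the page. It makes several assumptions. `db` is an already-open connection. The object store named by `storeName` uses out-of-line keys. The 1 MiB buffer size and the keys `"buf"` and `"str"` are arbitrary. `round`, `measure`, `txDone`, and `usageDelta` are hypothetical helpers. `navigator.storage.estimate()` reports usage for the whole origin rather than for IndexedDB alone, and browsers may return approximate figures, so the output shows a trend rather than exact on-disk size.

```typescript
/// <reference lib="dom" />

// Resolve when a transaction commits; reject if it aborts.
function txDone(tx: IDBTransaction): Promise<void> {
  return new Promise((resolve, reject) => {
    tx.oncomplete = () => resolve();
    tx.onabort = () => reject(tx.error);
  });
}

// Usage growth in bytes between two StorageManager.estimate() snapshots.
function usageDelta(before: { usage?: number }, after: { usage?: number }): number {
  return (after.usage ?? 0) - (before.usage ?? 0);
}

// One round of the repro, using separate transactions for write, read, and delete.
async function round(db: IDBDatabase, storeName: string): Promise<void> {
  // 1. Put a buffer and a string, both under string keys.
  let tx = db.transaction(storeName, "readwrite");
  tx.objectStore(storeName).put(new ArrayBuffer(1 << 20), "buf");
  tx.objectStore(storeName).put("hello", "str");
  await txDone(tx);
  // 2. Read both values back.
  tx = db.transaction(storeName, "readonly");
  tx.objectStore(storeName).get("buf");
  tx.objectStore(storeName).get("str");
  await txDone(tx);
  // 3. Delete both, leaving the store logically empty again.
  tx = db.transaction(storeName, "readwrite");
  tx.objectStore(storeName).delete("buf");
  tx.objectStore(storeName).delete("str");
  await txDone(tx);
}

// Cumulative origin usage growth (bytes) after each of `rounds` rounds.
async function measure(db: IDBDatabase, storeName: string, rounds: number): Promise<number[]> {
  const baseline = await navigator.storage.estimate();
  const growth: number[] = [];
  for (let i = 0; i < rounds; i++) {
    await round(db, storeName);
    growth.push(usageDelta(baseline, await navigator.storage.estimate()));
  }
  return growth;
}
```

Looping in one page only approximates the refresh-based repro; reloading between rounds would match it more closely. Growth that levels off fits the bounded behavior described earlier in the thread, while growth that keeps climbing over many rounds is the case worth putting in a bug report.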

The thing is, I opened this at WHATWG, they told me to open it here, and now I'm being told to open a bug in Chromium... Well, the good news is that nobody will likely fix this if they are already working on moving to SQLite, so I rest my case. I agree that there should be no need to compact IDB, but if that's inevitable in the name of performance, I would love to be able to shrink and optimize/compact it when needed, for long sessions or other use cases.

> FWIW WebSQL was deprecated largely because arbitrary websites can't be trusted to inject arbitrary SQL

That's not the story I knew and followed at the time... I also provided workarounds based on a WASM build of SQLite. It's a pity this DB fuels the world of software but not the Web; Gmail has been using it forever without issues, and I used it without issues in the past... There are many things the Web can do badly to hurt people, but killing features in the name of "trust" is really bad, IMHO. Then again, I'm sure nothing will change here, so I need SQLite as WASM that stores its whole blob in IndexedDB, which in turn is based on SQLite... how logical is that 😅

Anyway, closing this as it's clear nothing will happen to improve the current status of the API.