haystack / tipsy

A new project to encourage pay-what-you-want support for any web site.
http://tipsy.csail.mit.edu/
MIT License
31 stars 9 forks source link

Tipsy stops logging when local storage gets full #100

Open karger opened 7 years ago

karger commented 7 years ago

My installation gave up logging after about a year of usage; it turns out this was enough to fill up chrome.storage.local which proceeds to reject any further storage. Potential fixes include a move to indexDB or some kind of garbage collection (which we may need even if we move to indexDB). For a given site, we could keep only the last few urls along with a total time spent (paid for and not-yet paid for).

Stebalien commented 7 years ago

move to indexDB

I'm not sure you can from an addon.

Anyways, for future reference, the restrictions are here: https://developer.chrome.com/extensions/storage#property-local. TL;DR: You can store 5MiB. If you want to store more, you can request the unlimitedStorage permission but that's not a good fix (especially considering the fact that tipsy currently reads/writes the entire database from/to storage as one big blob).

tbranyen commented 7 years ago

I have two suggestions varying in robustness:

1) Easiest: Warn the user the storage is full and offer to export to JSON (and purging internally). I don't think we need to worry too much about loading 5MB of JSON (if we enabled unlimited, I would image that would be an issue per @Stebalien's comment). I haven't looked into how the app runs under max load (w/ 5MB JSON). @karger do you find it that it loads and scrolls suitably?

2) More involved solution: Switch to a more suitable client side database like indexeddb via an abstraction library paired with virtual scrolling to show all results. I've written such a library, HyperList that can reliably handle hundreds of thousands of rows.

Additional thoughts: We could load a reasonable chunk up front from the DB and then using the browser's idle callback to load more into memory. Then if they scroll to the bottom they can be presented an option to load all or something.

Short-term I'd say we implement option 1, and then in a separate branch flesh out ideas for migrating to a more scalable approach. Thoughts @karger @Stebalien?

karger commented 7 years ago

A fundamental question is how important it is to preserve a complete browsing history. I know that complete history was one of the main appeals for Tim :) but it isn't necessary for Tipsy's core functionality. At the other extreme, keeping only tipsy-enabled sites around would immediately solve the problem unless we get lots more publishers signed up. As a middle ground, I think it took me upwards of a year to fill storage, suggesting we could just summarize (as total visit time) and expire old entries and still expect a year's worth of history, which would seem plenty for the key situation of a site suddenly enabling tipsy and the user wanting to inspect visits before they pay.

If we pursue option 1, I'm nervous about warn and export as an interaction for novice users. And we still need to do something if they decline to export. Wouldn't deleting old be the most reasonable default? We could modify the data structure to let us access stuff in order of age for deletion, or we could just use a binary search to find the median-date entry and delete everything older than that.

I've not noticed problems with interaction with the visit log despite the 5Mb size.

On 2/11/2017 10:00 PM, Tim Branyen wrote:

I have two suggestions varying in robustness:

1.

Easiest: Warn the user the storage is full and offer to export to
JSON (and purging internally). I don't think we need to worry too
much about loading 5MB of JSON (if we enabled unlimited, I would
image that would be an issue per @Stebalien
<https://github.com/Stebalien>'s comment). I haven't looked into
how the app runs under max load (w/ 5MB JSON). @karger
<https://github.com/karger> do you find it that it loads and
scrolls suitably?

2.

More involved solution: Switch to a more suitable client side
database like indexeddb via an abstraction library paired with
virtual scrolling to show all results. I've written such a
library, HyperList <https://github.com/tbranyen/hyperlist> that
can reliably handle hundreds of thousands of rows.

Additional thoughts: We could load a reasonable chunk up front from the DB and then using the browser's idle callback https://developer.mozilla.org/en-US/docs/Web/API/Window/requestIdleCallback to load more into memory. Then if they scroll to the bottom they can be presented an option to load all or something.

Short-term I'd say we implement option 1, and then in a separate branch flesh out ideas for migrating to a more scalable approach. Thoughts @karger https://github.com/karger @Stebalien https://github.com/Stebalien?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haystack/tipsy/issues/100#issuecomment-279193117, or mute the thread https://github.com/notifications/unsubscribe-auth/ABFpXi4bkYowv28h5BVo6nDIDDhxLAdNks5rbnXogaJpZM4L-WxI.

tbranyen commented 7 years ago

Ah shoot, missed the part about not being able to use indexeddb (woosh), I'll investigate alternative solutions.

Stebalien commented 7 years ago

Actually, scratch that. You may be able to use indexedDB. I assumed that addons would have to use the chrome addon storage API but it appears that they can use normal HTML5 storage mechanisms as well.

Stebalien commented 7 years ago

FYI, I'm using localForage in list.it. It supports multiple backends including localStorage and indexedDB and exposes an API similar to Chrome's storage API.

tbranyen commented 7 years ago

@Stebalien nice, localForage is exactly the kind of abstraction I was hoping would work! Although we need to do more than just switch to indexeddb per my reply, otherwise it'd achieve the same effect as just enabling unlimited quota w/ chrome.storage (loading all data upfront at once).

karger commented 7 years ago

I've dug through my storage and, assuming tipsy records its accessTime field in milliseconds, it looks like it took tipsy 139 days to fill up. I haven't cleared storage yet in case we want to look through it, but I'd like to in order to resume using/testing/demoing tipsy. So let me know if there's a reason to keep it.

da2x commented 7 years ago

Until a better solution is found, it seems requesting unlimitedStorage will prevent the extension from breaking for your most dedicated users.

tbranyen commented 7 years ago

I remember now why we used chrome.storage. It syncs across your browser sessions and devices, which I think is worthwhile to keep.

Stebalien commented 7 years ago

@tbranyen Only chrome.storage.sync does that and tipsy currently doesn't use it (except in some test cases). Also note that the synced storage isn't affected by the unlimitedStorage permission and has a very small quota (https://developer.chrome.com/apps/storage#property-sync).

tbranyen commented 7 years ago

@Stebalien I don't know now, but at the time I wrote the storage adapter I did account for sync: https://github.com/haystack/tipsy/blob/9219973a35c9c011acff4ef768c235312e4fe804/shared/scripts/lib/storage.js#L28

Do you mean that this code path is only ever hit in tests and not in actual usage? It does look like sync has a significantly smaller quota so probably not all that useful for this case.

Stebalien commented 7 years ago

Do you mean that this code path is only ever hit in tests and not in actual usage?

Exactly.

rht commented 7 years ago

+1 for localforage: it defaults to indexedDB when available, http://caniuse.com/#search=indexeddb, has quirks from various browsers already taken care of, and with api that is almost like localstorage, but async. This library was implemented a long time after @tbranyen wrote the storage adapter, I think.