mozilla / application-services

Firefox Application Services
https://mozilla.github.io/application-services/
Other
610 stars 225 forks source link

history: Thoughts on reducing on-disk size #639

Closed thomcc closed 11 months ago

thomcc commented 5 years ago

This is the size breakdown of my places database imported from desktop using the autocomplete import script:

https://gist.github.com/thomcc/243776a4235eaf74c5a1af5d8eee2235

This is after VACUUMing. It's 70MBish and has this many records:

sqlite> select count(*) from moz_places;
162689
sqlite> select count(*) from moz_historyvisits;
253521
sqlite> select count(*) from moz_origins;
10358

It's applied on top of #632, because that's the branch I'm in. I don't expect that to make much of a difference.

It's worth noting that we have an ISLOCALINDEX which is almost entirely useless (See number 3 here: https://www.sqlite.org/queryplanner-ng.html#_checklist_for_avoiding_or_fixing_query_planner_problems -- "In particular, avoid using boolean or "enum" columns as the left-most columns of your indexes"). I think thisis my fault, I added the index before reading that document.

It's also possible that LASTVISITDATELOCALINDEX and LASTVISITDATEREMOTEINDEX, as well as VISITCOUNTLOCAL/VISITCOUNTREMOTE can be unified. A lot of our queries look like visit_count_local > 0 OR visit_count_remote > 0. We could instead add an expression index on visit_count_local + visit_count_remote and have a single (visit_count_local + visit_count_remote) > 0. The same might be possible with last_visit_date_local/remote.

┆Issue is synchronized with this Jira Task ┆Epic: Important backlog

bendk commented 2 years ago

@mhammond and I have been discussing updating the places::run_maintenance() function to do more to reduce disk space by pruning unneeded data. Maybe we should also pick this one back up too and try to generally reduce disk space.

Here's the results from me running sqlite_analyzer on my database, which is newer and smaller than thomcc's. It seems somewhat similar to thomcc's data, although my MOZ_PLACES used a much smaller percentage of the total space than his. BTW, it's pretty awesome that the report from this issue is featured on sqlite.org (https://www.sqlite.org/sqlanalyze.html).

From a quick analysis it seems that thomcc's recommendations still make sense and we could save significant data by removing some indexes.

Another thing I noticed was that the title and url fields on moz_places take a significant amount of the total data (about 15%), are highly redundant, and aren't used in any keys. I wonder if there could be some way store this data in a compressed manor.

mhammond commented 2 years ago

@mhammond and I have been discussing updating the places::run_maintenance() function to do more to reduce disk space by pruning unneeded data. Maybe we should also pick this one back up too and try to generally reduce disk space

Yeah - see also my other comment re PlacesExpiration.jsm in another issue (which I've now closed and deleted the github mail, so - lots of issues FTW? :)

Another thing I noticed was that the title and url fields on moz_places take a significant amount of the total data (about 15%), are highly redundant, and aren't used in any keys. I wonder if there could be some way store this data in a compressed manor.

That seems reasonable, but it's not clear what you have in mind. url is somewhat special in that the url_hash is indexed, and all queries (should!) first check the hash then the string itself. title is used by searches IIRC - what did you have in mind there?

TBH though, I do think a first good step would be to stop it growing without bounds, and later we could look at these kinds of optimizations - although we'd probably run that past the desktop places team who have thought a heap about some of this stuff and have a super-close relationship with the primary sqlite author (which is where that report above came from IIUC)

mhammond commented 11 months ago

Moved to bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1864900

Change performed by the Move to Bugzilla add-on.