Closed by teolemon 1 week ago
On off1, the current PG db size is 89GB uncompressed, 28GB compressed by ZFS.
Half of the DB storage comes from indices (44GB).
Storage space can be saved by:
I did a quick test on the product_state_tag table (the largest one):
Using uuid keys (16 bytes) instead of PG sequences (8 bytes) makes the indices 1GB larger.
Regarding query execution time:
select value, count(*) from product_states_tag group by 1;
takes 15.8s
Its equivalent with the values stored in a separate table takes 12.5s:
select value, nb from (select tid, count(*) as nb from test_state_tag group by 1) t join test_state_tags v on (id=tid);
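For anyone who wants to try the two layouts side by side, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, so it says nothing about the timings above; it only checks that both queries return the same counts. The table and column names follow the queries in this thread, while `product_id` and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat layout: the tag text is repeated on every row.
    CREATE TABLE product_states_tag (product_id INTEGER, value TEXT);
    -- Normalized layout: each distinct value is stored once ...
    CREATE TABLE test_state_tags (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    -- ... and the per-product rows only carry the integer key.
    CREATE TABLE test_state_tag (
        product_id INTEGER,
        tid INTEGER REFERENCES test_state_tags(id)
    );
""")

rows = [(1, "en:complete"), (2, "en:complete"), (3, "en:to-be-completed")]
conn.executemany("INSERT INTO product_states_tag VALUES (?, ?)", rows)

# Fill the lookup table, then map every flat row to its key.
conn.execute(
    "INSERT INTO test_state_tags (value) "
    "SELECT DISTINCT value FROM product_states_tag"
)
conn.execute("""
    INSERT INTO test_state_tag (product_id, tid)
    SELECT p.product_id, v.id
    FROM product_states_tag p
    JOIN test_state_tags v ON v.value = p.value
""")

# Query on the flat table.
flat = conn.execute(
    "SELECT value, count(*) FROM product_states_tag GROUP BY 1 ORDER BY 1"
).fetchall()

# Equivalent query: aggregate on the integer key, then join to get the text.
normalized = conn.execute("""
    SELECT value, nb
    FROM (SELECT tid, count(*) AS nb FROM test_state_tag GROUP BY 1) t
    JOIN test_state_tags v ON (id = tid)
    ORDER BY 1
""").fetchall()
```

The second query groups on the small integer key before joining, so the text values are only touched once per distinct tag.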
Thanks @cquest. I am currently looking into loading taxonomies into off-query to allow us to join to these to get translations, which would avoid the need for off-server to translate the tags after performing the query.
The main issue with this, however, is that many products contain tags that don't have corresponding taxonomy entries, so we would need to create missing tags on-the-fly. Not difficult, but could impact loading speed.
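One possible way to create missing tags on-the-fly is an insert that ignores conflicts, followed by a lookup. The sketch below is only an illustration: it uses Python's built-in `sqlite3` in place of PostgreSQL (the `ON CONFLICT ... DO NOTHING` clause exists in both, and needs SQLite 3.24 or later), and the `taxonomy_tag` table and `get_or_create_tag_id` helper are hypothetical names, not part of off-query.

```python
import sqlite3

# Hypothetical lookup table; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy_tag ("
    "id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL)"
)
# A tag that already has a taxonomy entry.
conn.execute("INSERT INTO taxonomy_tag (value) VALUES ('en:organic')")


def get_or_create_tag_id(conn, value):
    """Return the id for a tag value, creating the row if it is missing."""
    # The insert is a no-op when the value already exists.
    conn.execute(
        "INSERT INTO taxonomy_tag (value) VALUES (?) "
        "ON CONFLICT (value) DO NOTHING",
        (value,),
    )
    row = conn.execute(
        "SELECT id FROM taxonomy_tag WHERE value = ?", (value,)
    ).fetchone()
    return row[0]


known_id = get_or_create_tag_id(conn, "en:organic")        # existing entry
new_id = get_or_create_tag_id(conn, "en:unknown-tag")      # created on demand
same_id = get_or_create_tag_id(conn, "en:unknown-tag")     # reused afterwards
tag_count = conn.execute("SELECT count(*) FROM taxonomy_tag").fetchone()[0]
```

Doing this per tag adds two statements per value, which is where the loading-speed cost would come from; batching the inserts would be the obvious thing to try if it turns out to matter.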
Another thing I have considered is moving the JSON data into a "staging" table, separate from the relational model. That way we would be able to periodically delete this data. I'm also wondering whether having the large JSON column in a separate table might improve query performance on the relational fields.
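A rough sketch of the staging idea, again with Python's built-in `sqlite3` standing in for PostgreSQL. The `product` and `product_staging` tables, their columns, and the `purge_staging` helper are all hypothetical names chosen for illustration, not the actual off-query schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relational fields only.
    CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
    -- Raw JSON kept apart so it can be purged independently.
    CREATE TABLE product_staging (
        product_id INTEGER PRIMARY KEY REFERENCES product(id),
        data TEXT,        -- raw JSON document as loaded
        loaded_at TEXT    -- ISO date of the load
    );
""")

conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "0001", "Example A"), (2, "0002", "Example B")],
)
conn.executemany(
    "INSERT INTO product_staging VALUES (?, ?, ?)",
    [
        (1, json.dumps({"code": "0001"}), "2024-01-01"),
        (2, json.dumps({"code": "0002"}), "2024-07-01"),
    ],
)


def purge_staging(conn, cutoff):
    """Delete staged JSON loaded before the cutoff date; return rows removed."""
    cur = conn.execute(
        "DELETE FROM product_staging WHERE loaded_at < ?", (cutoff,)
    )
    return cur.rowcount


purged = purge_staging(conn, "2024-06-01")
products_left = conn.execute("SELECT count(*) FROM product").fetchone()[0]
staged_left = conn.execute("SELECT count(*) FROM product_staging").fetchone()[0]
```

The relational rows are untouched by the purge, which is the point of keeping the JSON in its own table.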
Sorry, 2 more questions:
SERIAL sequences are 4 bytes, which should be enough (about 2 billion values).
Regarding large numbers of tags like editors, some cleaning may be useful on the yuka "editors" to merge them, if this is not an issue for the facets.
After this, I think I will change the tags that have related taxonomies to fetch their values from a separate table (which will ultimately become the full taxonomy table). I may leave the more ad-hoc values (like contributors) as they are.