rr- / szurubooru

Image board engine, Danbooru-style.
GNU General Public License v3.0
726 stars 182 forks

Tag-post counting and storage counter causes slow page loading times #608

Open phil-flip opened 1 year ago

phil-flip commented 1 year ago

Someone posted this before, but I was not able to find the issue anymore: slow loading times with massive amounts of posts and tag suggestions. I now have ~600k posts in my szuru instance and finally decided to look into why it takes so long to serve pictures and individual posts. It turns out that the DB query for counting the posts of a tag takes ~7 seconds per post, and because it runs for every post on the posts tab, it takes ages to load one page. Because this statistic is useless to me, I made an edit in posts.py: instead of calling the counting function, I hard-coded "usages" to 0. The response times improved significantly. Please consider removing this statistic, as it is really bottlenecking the app. If it is that important, then caching those values or manually keeping track of them in the tags table would be reasonable solutions.
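The difference between the two approaches can be sketched with SQLite standing in for PostgreSQL (table and column names loosely mirror szurubooru's schema, but this is purely illustrative):

```python
import sqlite3

# Illustrative stand-in for the szurubooru schema (SQLite instead of
# PostgreSQL); only the columns needed for counting are modelled.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
    CREATE TABLE tag_usage (tag_id INTEGER PRIMARY KEY, usage INTEGER DEFAULT 0);
""")
db.executemany("INSERT INTO post_tag VALUES (?, ?)",
               [(1, 10), (2, 10), (3, 10), (3, 20)])

# The slow path: one aggregate scan per tag, repeated for every post on a page.
live_count = db.execute(
    "SELECT COUNT(*) FROM post_tag WHERE tag_id = ?", (10,)).fetchone()[0]

# The cached path: pre-compute once, then read a single row per tag.
db.execute("""
    INSERT INTO tag_usage
    SELECT tag_id, COUNT(post_id) FROM post_tag GROUP BY tag_id
""")
cached_count = db.execute(
    "SELECT usage FROM tag_usage WHERE tag_id = ?", (10,)).fetchone()[0]

print(live_count, cached_count)  # both 3
```

The cached read is a primary-key lookup instead of an aggregate scan, which is why maintaining the counts separately pays off at 600k posts.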

Another edit I made a while back, on an older system, was to disable the file counting, as it was causing the home page to be stuck for 30+ minutes. (This is an approximation, as I have better stuff to do than sit there and wait for it to finish, but I did check every 10 minutes or so. ^^') While the code has a caching function, it doesn't keep the value long enough, and it also loses it after a restart of the app. I personally don't need this anymore, because the new ZFS-based NAS does some magic there to provide the storage size faster, but it's still something that needs to be pointed out and addressed. This is also easy to edit out, though I did miss the statistic at the time and was happy to see it run fine when moving the installation.
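A cache that survives restarts could be as simple as persisting the value with a timestamp. The sketch below is hypothetical (the file path, TTL, and function names are made up, not szurubooru's actual code):

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical restart-surviving cache for an expensive statistic such as
# total storage size; path and TTL are illustrative assumptions.
CACHE_FILE = Path(tempfile.gettempdir()) / "szuru_disk_usage_cache.json"
TTL_SECONDS = 24 * 3600

def cached_disk_usage(compute):
    """Return the cached value if still fresh, otherwise recompute and persist."""
    if CACHE_FILE.exists():
        entry = json.loads(CACHE_FILE.read_text())
        if time.time() - entry["computed_at"] < TTL_SECONDS:
            return entry["value"]
    value = compute()  # the expensive directory walk goes here
    CACHE_FILE.write_text(
        json.dumps({"value": value, "computed_at": time.time()}))
    return value

CACHE_FILE.unlink(missing_ok=True)  # start fresh for this demo
first = cached_disk_usage(lambda: 123)   # cache miss: computes and persists
second = cached_disk_usage(lambda: 456)  # cache hit: serves the stored value
print(first, second)  # 123 123
```

Because the value lives on disk rather than in process memory, an app restart no longer triggers a full recount.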

phil-flip commented 1 year ago

Alright, I took some time to fix the database issue, and the best solution I came up with is a separate table that holds all the counts "pre-calculated". While my original approach was valid, I still had the problem that every so often some function needed the tag count, slowing the whole operation way down again. After a bit of research, five hours deep into this issue, I found this lovely article, which goes into detail on how PostgreSQL doesn't keep a pre-computed total row count for tables. Not quite what I needed, so I modified the script a little to count all the tags, write them into a newly created table, and create a DB function with triggers to keep it updated, so minimal code change was necessary in the backend (I don't Python).

CREATE TABLE tag_usage (
    tag_id int PRIMARY KEY,
    usage  int DEFAULT 0
);

-- Establish the initial counts
INSERT INTO tag_usage
SELECT post_tag.tag_id AS tag_id, count(post_tag.post_id) AS usage
FROM post_tag
GROUP BY post_tag.tag_id;

-- Function to keep the counts in sync when posts are tagged/untagged
CREATE OR REPLACE FUNCTION adjust_tag_usage()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE tag_usage SET usage = usage + 1 WHERE tag_id = NEW.tag_id;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE tag_usage SET usage = usage - 1 WHERE tag_id = OLD.tag_id;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER adjust_tag_usage BEFORE INSERT OR DELETE ON post_tag
    FOR EACH ROW EXECUTE PROCEDURE adjust_tag_usage();

-- Function to create and delete count rows when tags are created/deleted
CREATE OR REPLACE FUNCTION update_tag_usage()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO tag_usage (tag_id, usage) VALUES (NEW.id, 0);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM tag_usage WHERE tag_id = OLD.id;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_tag_usage BEFORE INSERT OR DELETE ON tag
    FOR EACH ROW EXECUTE PROCEDURE update_tag_usage();

My apologies for the jankiness of the DB setup, but it works. Alongside the DB changes, I also added the table to the DB model in the tag.py file:

class TagUsage(Base):
    __tablename__ = "tag_usage"

    tag_id = sa.Column(
        "tag_id",
        sa.Integer,
        nullable=False,
        primary_key=True,
    )
    usage = sa.Column(
        "usage",
        sa.Integer,
        default=0,
        nullable=True,
    )

    def __init__(self, tag_id: int, usage: int) -> None:
        self.tag_id = tag_id
        self.usage = usage

And of course I adjusted the count function further down in the same file:

    post_count = sa.orm.column_property(
        sa.sql.expression.select([TagUsage.usage])
        .where(TagUsage.tag_id == tag_id)
        .correlate_except(TagUsage)
    )

I would open up a PR, but I'm not experienced enough with Python and don't know how to create the migration, triggers, and DB functions with SQLAlchemy. I reverted my previous changes and now just mount the modified file in, and it works fine: szuru runs like a charm while also showing the tag counts.

A fair warning: I only just started using this modded version, so I don't know if it will cause issues with other functions I'm not aware of at the time of writing. So in case anyone has similar issues: handle this with care and a grain of salt. I will keep this issue open and updated until there is proper support.

phil-flip commented 12 months ago

So I just noticed an issue when merging tags: it doesn't combine the numbers, so I might need to revisit the implementation of the count thingy in the future.
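A plausible cause (an assumption, since the merge internals aren't shown here): if merging re-points post_tag rows with an UPDATE, the triggers above never fire, because they only cover INSERT and DELETE. The sketch below uses SQLite as a stand-in for PostgreSQL to show how also covering updates of tag_id would move the counts correctly:

```python
import sqlite3

# SQLite stand-in illustrating why an INSERT/DELETE-only trigger misses tag
# merges: re-pointing a post_tag row is an UPDATE, which fires neither.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
    CREATE TABLE tag_usage (tag_id INTEGER PRIMARY KEY, usage INTEGER DEFAULT 0);
    INSERT INTO post_tag VALUES (1, 10), (2, 10);
    INSERT INTO tag_usage VALUES (10, 2), (20, 0);

    -- Covering UPDATE OF tag_id moves the count from the old tag to the new one.
    CREATE TRIGGER tag_usage_on_retag AFTER UPDATE OF tag_id ON post_tag
    BEGIN
        UPDATE tag_usage SET usage = usage - 1 WHERE tag_id = OLD.tag_id;
        UPDATE tag_usage SET usage = usage + 1 WHERE tag_id = NEW.tag_id;
    END;
""")

# Simulate merging tag 10 into tag 20 by re-pointing its post_tag rows.
db.execute("UPDATE post_tag SET tag_id = 20 WHERE tag_id = 10")
counts = dict(db.execute("SELECT tag_id, usage FROM tag_usage"))
print(counts)  # {10: 0, 20: 2}
```

In PostgreSQL, the equivalent would be extending the trigger to `INSERT OR DELETE OR UPDATE` and handling `TG_OP = 'UPDATE'` by decrementing `OLD.tag_id` and incrementing `NEW.tag_id`.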

parallax4 commented 11 months ago

@neobooru What are your thoughts on this? Would be nice if something like this could be merged.

pelicans45 commented 5 months ago

@phil-flip @neobooru It would be good if we could get something like this implemented.

BloodyRain2k commented 4 months ago

Maybe I understand just too little about SQL/Postgres, but is there a special reason why a new table was created for the count, instead of just stuffing the "cache" into the tag's row in a new column?

phil-flip commented 4 months ago

I didn't want to mess with existing tables, in case a future migration breaks them. So I created a new one that the application isn't aware of and doesn't touch.

As I said earlier though, this is a massive hack, and the logic for it should be implemented in the API instead. I just don't Python. So while a few people in here wish to have it merged, I would rather see a maintainer implement it in the backend instead.

BloodyRain2k commented 4 months ago

it should be implemented into the API instead.

Tried that, because I don't understand triggers. And while my implementation is just as hacky, it's barely "better than nothing": it currently breaks tag sorting and, in turn, auto_complete. So I think your approach was good. I'll take a shot at doing that without the extra table next, but first I'll have to learn triggers...

Edit: I haven't even gotten to the triggers yet, because I wanted to get an "init" update query working first. The result is this:

UPDATE tag
SET post_count = sub.count
FROM (SELECT pt.tag_id AS id, COUNT(pt.tag_id) AS count FROM post_tag pt GROUP BY pt.tag_id) AS sub
WHERE tag.id = sub.id

Derp, wasn't that hard after all:

    select = (db.session.query(PostTag.tag_id, sa.func.count(PostTag.tag_id).label("count"))
              .group_by(PostTag.tag_id).subquery())
    logger.info("Select statement:\n%s", select)
    update = (sa.update(Tag)
              .values(post_count = select.c.count)
              .where(Tag.tag_id == select.c.tag_id))
    logger.info("Update statement:\n%s", update)
    updated = db.session.execute(update)
    logger.info("Updated %d tags", updated.rowcount)
    db.session.commit()

It also runs in less than 4 seconds, so I can leave it as a startup sync for now while I figure out the triggers.
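The startup-sync idea can be exercised end to end with SQLite standing in for PostgreSQL (schema simplified; a correlated subquery is used instead of `UPDATE ... FROM` so it runs on any version of either database):

```python
import sqlite3

# SQLite stand-in for the startup reconciliation: recompute every tag's
# post_count in one statement so any trigger drift is corrected on boot.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, post_count INTEGER DEFAULT 0);
    CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
    INSERT INTO tag (id, post_count) VALUES (10, 999), (20, 999);  -- stale counts
    INSERT INTO post_tag VALUES (1, 10), (2, 10), (2, 20);
""")

# Recompute each tag's count from the ground truth in post_tag.
db.execute("""
    UPDATE tag SET post_count =
        (SELECT COUNT(*) FROM post_tag WHERE post_tag.tag_id = tag.id)
""")
counts = dict(db.execute("SELECT id, post_count FROM tag"))
print(counts)  # {10: 2, 20: 1}
```

Running this once at startup bounds how stale the counts can get, even if the trigger logic later turns out to have gaps (such as the tag-merge case mentioned earlier in the thread).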