phil-flip opened 8 months ago
Alright, I took some time to fix the database issue, and the best solution I came up with is a separate table that holds all the counts "pre-calculated". While my original approach was valid, I still had issues: every so often a function needed the tag count, which slowed the whole operation way down again. After a little research, and five hours deep into this issue, I found this lovely article, which goes into detail on how PostgreSQL doesn't pre-count the total number of rows in a table. Not quite what I needed, so I modified the script a little to count all the tags, write them into a newly created table, and create DB functions with triggers to keep it updated, so that minimal code change was necessary in the backend (I don't do Python).
CREATE TABLE tag_usage (
    tag_id int PRIMARY KEY,
    usage int DEFAULT 0
);

-- establish initial count
INSERT INTO tag_usage
SELECT post_tag.tag_id AS tag_id, count(post_tag.post_id) AS usage
FROM post_tag
GROUP BY post_tag.tag_id;
-- Create function to update counts
CREATE OR REPLACE FUNCTION adjust_tag_usage()
RETURNS TRIGGER AS
'
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE tag_usage SET usage = usage + 1 WHERE tag_id = NEW.tag_id;
        RETURN NEW;
    ELSIF TG_OP = ''DELETE'' THEN
        UPDATE tag_usage SET usage = usage - 1 WHERE tag_id = OLD.tag_id;
        RETURN OLD;
    END IF;
END;
'
LANGUAGE plpgsql;

CREATE TRIGGER adjust_tag_usage BEFORE INSERT OR DELETE ON post_tag
FOR EACH ROW EXECUTE PROCEDURE adjust_tag_usage();
COMMIT;
-- Create function to create and delete count entries
CREATE OR REPLACE FUNCTION update_tag_usage()
RETURNS TRIGGER AS
'
BEGIN
    IF TG_OP = ''INSERT'' THEN
        INSERT INTO tag_usage (tag_id, usage) VALUES (NEW.id, 0);
        RETURN NEW;
    ELSIF TG_OP = ''DELETE'' THEN
        DELETE FROM tag_usage WHERE tag_id = OLD.id;
        RETURN OLD;
    END IF;
END;
'
LANGUAGE plpgsql;

CREATE TRIGGER update_tag_usage BEFORE INSERT OR DELETE ON tag
FOR EACH ROW EXECUTE PROCEDURE update_tag_usage();
COMMIT;
My apologies for the jankiness of the DB setup, but it works. (You might need to switch the function bodies to "$$" quoting, but my SQL editor of choice wanted single quotes.)
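For anyone who wants to sanity-check the bookkeeping without a PostgreSQL instance, here is a rough equivalent using Python's stdlib sqlite3. This is only a sketch of the same idea, not szuru's actual schema, and SQLite's trigger syntax differs a bit from PostgreSQL (one trigger per event instead of branching on TG_OP):

```python
# Sanity check of the trigger bookkeeping with stdlib sqlite3.
# Table and column names mirror the script above; data is made up.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY);
CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
CREATE TABLE tag_usage (tag_id INTEGER PRIMARY KEY, usage INTEGER DEFAULT 0);

-- create/delete a usage row whenever a tag is created/deleted
CREATE TRIGGER update_tag_usage_ins AFTER INSERT ON tag
BEGIN
    INSERT INTO tag_usage (tag_id, usage) VALUES (NEW.id, 0);
END;
CREATE TRIGGER update_tag_usage_del AFTER DELETE ON tag
BEGIN
    DELETE FROM tag_usage WHERE tag_id = OLD.id;
END;

-- keep the count in sync with post_tag rows
CREATE TRIGGER adjust_tag_usage_ins AFTER INSERT ON post_tag
BEGIN
    UPDATE tag_usage SET usage = usage + 1 WHERE tag_id = NEW.tag_id;
END;
CREATE TRIGGER adjust_tag_usage_del AFTER DELETE ON post_tag
BEGIN
    UPDATE tag_usage SET usage = usage - 1 WHERE tag_id = OLD.tag_id;
END;
""")

db.execute("INSERT INTO tag (id) VALUES (1)")
db.executemany("INSERT INTO post_tag VALUES (?, 1)", [(10,), (11,), (12,)])
db.execute("DELETE FROM post_tag WHERE post_id = 12")
count = db.execute("SELECT usage FROM tag_usage WHERE tag_id = 1").fetchone()[0]
print(count)  # 2
```

The point of the exercise: after any mix of inserts and deletes, tag_usage always matches what a full count(*) over post_tag would return, without ever running that count.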
Alongside the DB changes I also added the table to the DB model in the tag.py file:
class TagUsage(Base):
    __tablename__ = "tag_usage"

    tag_id = sa.Column(
        "tag_id",
        sa.Integer,
        nullable=False,
        primary_key=True,
    )
    usage = sa.Column(
        "usage",
        sa.Integer,
        default=0,
        nullable=True,
    )

    def __init__(self, tag_id: int, usage: int) -> None:
        self.tag_id = tag_id
        self.usage = usage
And of course I adjusted the count function further down in the same file:
post_count = sa.orm.column_property(
    sa.sql.expression.select([TagUsage.usage])
    .where(TagUsage.tag_id == tag_id)
    .correlate_except(TagUsage)
)
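As far as I understand, the column_property above compiles to a correlated scalar subquery: each tag row pulls its pre-calculated count from tag_usage instead of counting post_tag rows. Roughly this SQL, sketched here with stdlib sqlite3 and made-up table contents:

```python
# Illustration of the correlated scalar subquery the column_property
# should produce; tags and counts are invented for the demo.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tag_usage (tag_id INTEGER PRIMARY KEY, usage INTEGER DEFAULT 0);
INSERT INTO tag VALUES (1, 'landscape'), (2, 'portrait');
INSERT INTO tag_usage VALUES (1, 42), (2, 7);
""")

rows = db.execute("""
    SELECT tag.name,
           (SELECT tag_usage.usage
              FROM tag_usage
             WHERE tag_usage.tag_id = tag.id) AS post_count
      FROM tag
     ORDER BY tag.id
""").fetchall()
print(rows)  # [('landscape', 42), ('portrait', 7)]
```

The subquery is a primary-key lookup per tag, which is why this is so much cheaper than the original count(*) over post_tag.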
I would open a PR, but I'm not experienced enough with Python and don't know how to create a migration, triggers, and DB functions with SQLAlchemy. I reverted my previous changes and now just map the modified file in, and it works fine: szuru runs like a charm while also showing the tag counts.
A fair warning: I only just started using this modded version, so I don't know if there will be issues with other functions I'm not aware of at the time of writing. So in case anyone has similar issues: handle this with care and a grain of salt. I will keep this open and updated until there is proper support.
I just noticed an issue when merging tags: it doesn't combine the numbers, so I might need to revisit the implementation of the counting in the future.
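My guess (I haven't checked szuru's merge code) is that merging re-points existing post_tag rows with an UPDATE of tag_id, which the INSERT/DELETE triggers above never see. If that is the cause, an additional UPDATE trigger could cover it; here is a sketch of the idea with stdlib sqlite3:

```python
# Hypothetical fix sketch: move the count along when a post_tag row is
# re-pointed at another tag (e.g. during a tag merge). Data is made up.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE post_tag (post_id INTEGER, tag_id INTEGER);
CREATE TABLE tag_usage (tag_id INTEGER PRIMARY KEY, usage INTEGER DEFAULT 0);
INSERT INTO post_tag VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO tag_usage VALUES (1, 2), (2, 1);

-- fires once per re-pointed row: decrement the old tag, increment the new
CREATE TRIGGER adjust_tag_usage_upd AFTER UPDATE OF tag_id ON post_tag
BEGIN
    UPDATE tag_usage SET usage = usage - 1 WHERE tag_id = OLD.tag_id;
    UPDATE tag_usage SET usage = usage + 1 WHERE tag_id = NEW.tag_id;
END;
""")

# "merge" tag 1 into tag 2 by re-pointing its post_tag rows
db.execute("UPDATE post_tag SET tag_id = 2 WHERE tag_id = 1")
merged = db.execute("SELECT usage FROM tag_usage WHERE tag_id = 2").fetchone()[0]
print(merged)  # 3
```

In PostgreSQL the same idea would be a third branch (TG_OP = 'UPDATE') in the adjust_tag_usage() function plus an OR UPDATE on the trigger, but again: unverified against szuru's actual merge path.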
@neobooru What are your thoughts on this? Would be nice if something like this could be merged.
@phil-flip @neobooru It would be good if we could get something like this implemented.
Someone posted this before, but I was not able to find it anymore: slow loading times with massive amounts of posts and tag suggestions. I now have ~600k posts in my szuru instance and finally decided to look into why it takes so long to serve pictures and individual posts. It turns out that the DB query counting the posts of a tag takes ~7 seconds per post, and because it runs for every post on the Posts tab, it takes ages to load one page. Because this statistic is useless to me, I made an edit in posts.py: instead of calling the counting function, I changed "usages" to 0. Response times improved significantly. Please consider removing this statistic, as it is really bottlenecking the app. If it is that important, then caching those values or manually keeping track in the tags table would be reasonable solutions.

Another edit I made a while back on an older system was to disable the file counting, as it was causing the home page to be stuck for 30+ minutes. (This is an approximation, as I have better stuff to do than sit there and wait for it to finish, but I did check every 10 minutes or so. ^^') While the code has a caching function, it doesn't keep the value long enough, and it loses it after a restart of the app. I personally don't need it anymore, because the new ZFS-based NAS does some magic to provide the storage size faster, but it's still something that needs to be pointed out and addressed. This is also editable, though I did miss the statistic at the time and was happy to see it run fine when moving the installation.
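For what it's worth, a longer-lived cache for such a statistic could be as simple as remembering the value with a timestamp and persisting it to disk so it survives restarts. This is only a hypothetical sketch of the idea, not szuru's caching code; the file location, TTL, and function names are all made up:

```python
# Hypothetical TTL cache sketch: remember an expensive statistic
# (e.g. total file size) for a fixed time and persist it across restarts.
import json
import tempfile
import time
from pathlib import Path

CACHE_FILE = Path(tempfile.gettempdir()) / "szuru_stat_cache.json"  # made up
TTL_SECONDS = 3600  # keep the value for an hour (arbitrary choice)

def cached_stat(compute):
    """Return the cached statistic, recomputing it only when expired."""
    if CACHE_FILE.exists():
        entry = json.loads(CACHE_FILE.read_text())
        if time.time() - entry["ts"] < TTL_SECONDS:
            return entry["value"]
    value = compute()  # stands in for the expensive walk over the data dir
    CACHE_FILE.write_text(json.dumps({"ts": time.time(), "value": value}))
    return value

CACHE_FILE.unlink(missing_ok=True)      # start fresh for the demo
first = cached_stat(lambda: 123456789)  # computed and stored
second = cached_stat(lambda: 0)         # served from cache, not recomputed
print(first, second)  # 123456789 123456789
```

The second call returning the cached value (instead of re-running the callable) is exactly the behavior the home-page statistic would need to avoid the 30-minute stall after every restart.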