rr- / szurubooru

Image board engine, Danbooru-style.

Timeout on pools API with lots of pools. #594

Open xnoe opened 1 year ago

xnoe commented 1 year ago

Hey all,

I have a szurubooru instance with ~100k pools and found that the pools API would always time out. In addition, the timeout seems to cause the rest of szurubooru's functionality to "lock up" until the server container is restarted.

I've investigated the issue and found that this function https://github.com/rr-/szurubooru/blob/7a82e9d5813d8b88e2f49ebdabbf19957b2f393a/server/szurubooru/search/configs/pool_search_config.py#L32-L33 seems to play at least some part in the problem.
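Before touching finalize_query, it can help to see what the default ordering costs at the database level. The following is a minimal sketch (not from the repo) that asks PostgreSQL for the plan of roughly the query the pool listing ends up running; the table and column names are szurubooru's, but the DSN is a placeholder you would need to adapt to your config.yaml:

import sqlalchemy as sa

# Placeholder DSN; use the database credentials from your deployment.
engine = sa.create_engine("postgresql://szuru:changeme@localhost:5432/szuru")

with engine.connect() as conn:
    # Approximation of the pool listing: join pools to their names and
    # order by name. EXPLAIN ANALYZE shows whether PostgreSQL can walk an
    # index or has to sort every matching row on every request.
    plan = conn.execute(sa.text("""
        EXPLAIN ANALYZE
        SELECT pool.id
        FROM pool
        JOIN pool_name ON pool_name.pool_id = pool.id
        ORDER BY pool_name.name
        LIMIT 50 OFFSET 0
    """))
    for line in plan:
        print(line[0])

With tens of thousands of pools and no index on pool_name, the plan will typically show a sequential scan plus a full sort rather than an index scan, which matches the slowdown described above.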

Modifying this function such that it is

def finalize_query(self, query: SaQuery) -> SaQuery:
    return query

seems to at least partially make the issue go away, which indicates to me that there is some problem with the sorting on first_name for pools. This is confirmed by the fact that, even with this "fix", explicitly sorting pools by name makes the issue come back.

I am not familiar enough with szurubooru's codebase or SQLAlchemy to identify the root cause, but I felt it was worth opening this issue in case someone more familiar with the code can find the exact reason why sorting on Pool.first_name causes this behaviour.

To recreate the issue, all you should need to do is start a szurubooru instance and create tens of thousands of pools. I don't think there's a specific number that triggers it; I noticed an increase in API response times even with only a few thousand pools, so the exact point at which it begins timing out likely depends on hardware and other factors.
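A rough way to script that reproduction, assuming the REST API is reachable at http://localhost:8080/api (adjust the base URL to wherever your client/nginx proxies the server), that HTTP Basic auth with an account allowed to create pools works, and that pools are created with POST /pool as in szurubooru's API docs; the credentials and category name below are placeholders:

import requests

API = "http://localhost:8080/api"  # assumption: client proxy in front of the server
session = requests.Session()
session.auth = ("admin", "admin-password")  # placeholder credentials
session.headers["Accept"] = "application/json"

# Create a large number of pools; a few tens of thousands should be
# enough to make the difference obvious.
for i in range(50_000):
    r = session.post(
        f"{API}/pool",
        # "default" assumed to be the stock pool category on a fresh instance
        json={"names": [f"stress-pool-{i}"], "category": "default", "posts": []},
    )
    r.raise_for_status()

# Then time the listing call that was timing out.
r = session.get(f"{API}/pools", params={"offset": 0, "limit": 50})
print(r.status_code, r.elapsed)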

xnoe commented 11 months ago

This issue appears to be caused by the database migrations being out of sync with the models. Specifically, indexes for pools seem to have never been created. Cloning the repository, running docker compose up -d, and then running docker compose exec -it server sh -c 'PYTHONPATH=/opt/app alembic revision --autogenerate' creates a revision with the following upgrade function:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_index(op.f('ix_pool_category_id'), 'pool', ['category_id'], unique=False)
    op.alter_column('pool_name', 'name',
               existing_type=sa.VARCHAR(length=256),
               type_=sa.Unicode(length=128),
               existing_nullable=False)
    op.create_index(op.f('ix_pool_name_pool_id'), 'pool_name', ['pool_id'], unique=False)
    op.create_index(op.f('ix_pool_post_pool_id'), 'pool_post', ['pool_id'], unique=False)
    op.drop_constraint('pool_post_post_id_fkey', 'pool_post', type_='foreignkey')
    op.drop_constraint('pool_post_pool_id_fkey', 'pool_post', type_='foreignkey')
    op.create_foreign_key(None, 'pool_post', 'post', ['post_id'], ['id'])
    op.create_foreign_key(None, 'pool_post', 'pool', ['pool_id'], ['id'])
    op.create_index(op.f('ix_post_signature_post_id'), 'post_signature', ['post_id'], unique=False)
    op.create_index(op.f('ix_post_signature_words'), 'post_signature', ['words'], unique=False)
    # ### end Alembic commands ###

I was not able to get this revision to run against the data on my szurubooru instance (it was still running after more than 24 hours before I canceled it), so I ended up creating a new database with this migration applied and manually moving my data from the old database to the new one. With these indexes created, I can list pools even with over 120,000 pools, whereas before it would lock up the entire backend.
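If the full autogenerated revision is too heavy to run in place, one possible shortcut (a sketch, not something shipped with szurubooru) is to create just the missing indexes directly in PostgreSQL with CREATE INDEX CONCURRENTLY, which keeps the tables usable while the indexes build. The index, table, and column names below are taken from the autogenerated upgrade above; the connection string is a placeholder:

import sqlalchemy as sa

# CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
# so open the connection in autocommit mode.
engine = sa.create_engine(
    "postgresql://szuru:changeme@localhost:5432/szuru",  # placeholder DSN
    isolation_level="AUTOCOMMIT",
)

# Indexes from the autogenerated revision above.
INDEXES = [
    ("ix_pool_category_id", "pool", "category_id"),
    ("ix_pool_name_pool_id", "pool_name", "pool_id"),
    ("ix_pool_post_pool_id", "pool_post", "pool_id"),
    ("ix_post_signature_post_id", "post_signature", "post_id"),
    ("ix_post_signature_words", "post_signature", "words"),
]

with engine.connect() as conn:
    for name, table, column in INDEXES:
        conn.execute(sa.text(
            f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} ON {table} ({column})"
        ))

Note that this only covers the index portion of the revision; the column type change on pool_name and the foreign key adjustments would still need the real migration.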