rr-/szurubooru

Image board engine, Danbooru-style.
GNU General Public License v3.0

"Could not fetch basic configuration from server" After restart. #449

Open AveryLucas opened 2 years ago

AveryLucas commented 2 years ago

I did not change any settings in the config, and everything works great on the first run, but after restarting the computer, starting szurubooru again gives this error.

Not sure how to prevent this from happening. Worth noting that I'm running this on Windows 10.

sgsunder commented 2 years ago

Can you post log files?

If you are running on Windows 10, are you using the Docker install? The Docker containers we use are Linux-specific, so you need to make sure your Docker installation is running them as Linux containers and not Windows ones.
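If you're not sure which mode Docker Desktop is in, one way to check is to ask the daemon which OS its containers target. This is just a generic sketch using the standard Docker CLI (nothing szurubooru-specific); the szurubooru images need it to report "linux":

# Ask the Docker daemon which OS its containers target; the szurubooru
# images require "linux", not "windows".
import subprocess

os_type = subprocess.run(
    ["docker", "info", "--format", "{{.OSType}}"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(f"Docker is running {os_type} containers")
if os_type != "linux":
    print("Switch Docker Desktop to Linux containers before starting szurubooru.")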

Mrspirits commented 2 years ago

Hey, I have the same issue as well.

Here's the error I keep getting in the logs from docker:

szuru-client-1 | 172.18.0.1 -> GET /api/info HTTP/1.1 [504] - referer: http://127.0.0.1:8080/ -
szuru-client-1 | 2021/12/04 20:07:32 [error] 11#11: *1 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.18.0.1, server: , request: "GET /api/user/admin?bump-login=true HTTP/1.1", upstream: "http://172.18.0.3:6666/user/admin?bump-login=true", host: "127.0.0.1:8080", referrer: "http://127.0.0.1:8080/"

I use Docker on Windows 11, and I'm assuming the issue is that the booru has a lot of posts on it, which slows things down and makes the requests time out.

It does fix itself after a while but it takes like 10 minutes to finally start working.

neobooru commented 2 years ago

> It does fix itself after a while but it takes like 10 minutes to finally start working.

Does the server container show you any output? The server does run some tasks at startup, but they should show up in the logs.

Mrspirits commented 2 years ago

It lists out a bunch of tags, then comes up with this:

szuru-server-1 | INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
szuru-server-1 | INFO [alembic.runtime.migration] Will assume transactional DDL.
szuru-server-1 | Starting szurubooru API on port 6666

and after that it keeps timing out with the error from before until it eventually does this:

szuru-client-1 | 2021/12/06 12:50:08 [error] 10#10: *25 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.18.0.1, server: , request: "DELETE /api/user-token/admin/03d7c379-1220-41e3-b6a4-f1a8db1b6eba HTTP/1.1", upstream: "http://172.18.0.3:6666/user-token/admin/03d7c379-1220-41e3-b6a4-f1a8db1b6eba", host: "127.0.0.1:8080", referrer: "http://127.0.0.1:8080/home"
szuru-client-1 | 172.18.0.1 -> DELETE /api/user-token/admin/03d7c379-1220-41e3-b6a4-f1a8db1b6eba HTTP/1.1 [504] - referer: http://127.0.0.1:8080/home -
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger GET /info (user=None, queries=13)
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger GET /user/admin (user=admin, queries=6)
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-client-1 | 2021/12/06 12:50:23 [error] 10#10: *33 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.18.0.1, server: , request: "GET /api/info HTTP/1.1", upstream: "http://172.18.0.3:6666/info", host: "127.0.0.1:8080", referrer: "http://127.0.0.1:8080/home"
szuru-client-1 | 172.18.0.1 -> GET /api/info HTTP/1.1 [504] - referer: http://127.0.0.1:8080/home -
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:23] szurubooru.middleware.request_logger DELETE /user-token/admin/03d7c379-1220-41e3-b6a4-f1a8db1b6eba (user=admin, queries=4)
szuru-server-1 | [2021-12-06 12:50:24] szurubooru.middleware.request_logger GET /info (user=None, queries=13)
szuru-server-1 | [2021-12-06 12:50:24] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:24] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:24] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:24] szurubooru.middleware.request_logger GET /info (user=None, queries=12)
szuru-server-1 | [2021-12-06 12:50:24] szurubooru.middleware.request_logger GET /info (user=None, queries=12)

So it seems to delete my token, log me out, and then fix itself. Could the issue be that I'm usually logged in as admin and not just a normal user?

phil-flip commented 1 year ago

I have similar issues with my booru:

szurubooru-app-1      | 2023-01-31T19:41:19.106069393Z 192.168.192.5 -> GET / HTTP/1.1 [304] - referer: - 93.212.122.49
szurubooru-backend-1  | 2023-01-31T19:41:19.273202653Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine BEGIN (implicit)
szurubooru-backend-1  | 2023-01-31T19:41:19.273934099Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine SELECT post_feature.id AS post_feature_id, post_feature.post_id AS post_feature_post_id, post_feature.user_id AS post_feature_user_id, post_feature.time AS post_feature_time 
szurubooru-backend-1  | 2023-01-31T19:41:19.273972371Z FROM post_feature ORDER BY post_feature.time DESC 
szurubooru-backend-1  | 2023-01-31T19:41:19.274068171Z  LIMIT %(param_1)s
szurubooru-backend-1  | 2023-01-31T19:41:19.274101804Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine {'param_1': 1}
szurubooru-backend-1  | 2023-01-31T19:41:19.277622427Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine SELECT count(post.id) AS count_1 
szurubooru-backend-1  | 2023-01-31T19:41:19.277676749Z FROM post
szurubooru-backend-1  | 2023-01-31T19:41:19.277681528Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine {}
szurubooru-backend-1  | 2023-01-31T19:41:19.287821832Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine BEGIN (implicit)
szurubooru-backend-1  | 2023-01-31T19:41:19.291452762Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine SELECT post_feature.id AS post_feature_id, post_feature.post_id AS post_feature_post_id, post_feature.user_id AS post_feature_user_id, post_feature.time AS post_feature_time 
szurubooru-backend-1  | 2023-01-31T19:41:19.291530789Z FROM post_feature ORDER BY post_feature.time DESC 
szurubooru-backend-1  | 2023-01-31T19:41:19.291537973Z  LIMIT %(param_1)s
szurubooru-backend-1  | 2023-01-31T19:41:19.291542201Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine {'param_1': 1}
szurubooru-backend-1  | 2023-01-31T19:41:19.298562906Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine SELECT count(post.id) AS count_1 
szurubooru-backend-1  | 2023-01-31T19:41:19.298780125Z FROM post
szurubooru-backend-1  | 2023-01-31T19:41:19.298794351Z [2023-01-31 19:41:19] sqlalchemy.engine.base.Engine {}
szurubooru-app-1      | 2023-01-31T19:42:19.320614937Z 2023/01/31 19:42:19 [error] 10#10: *44 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.192.5, server: , request: "GET /api/info HTTP/1.1", upstream: "http://192.168.192.3:6666/info", host: "imagearchive.flipped.codes", referrer: "https://imagearchive.flipped.codes/"
szurubooru-app-1      | 2023-01-31T19:42:19.320660903Z 2023/01/31 19:42:19 [error] 10#10: *45 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 192.168.192.5, server: , request: "GET /api/info HTTP/1.1", upstream: "http://192.168.192.3:6666/info", host: "imagearchive.flipped.codes", referrer: "https://imagearchive.flipped.codes/"
szurubooru-app-1      | 2023-01-31T19:42:19.320679379Z 192.168.192.5 -> GET /api/info HTTP/1.1 [504] - referer: https://imagearchive.flipped.codes/ 93.212.122.49
szurubooru-app-1      | 2023-01-31T19:42:19.320685089Z 192.168.192.5 -> GET /api/info HTTP/1.1 [504] - referer: https://imagearchive.flipped.codes/ 93.212.122.49
phil-flip commented 1 year ago

Ok, I just got this error right after, so it might be a database issue(?):

szurubooru-backend-1  | 2023-01-31T19:44:18.401418420Z [2023-01-31 19:44:18] waitress Exception while serving /info
szurubooru-backend-1  | 2023-01-31T19:44:18.401479144Z Traceback (most recent call last):
szurubooru-backend-1  | 2023-01-31T19:44:18.401501516Z   File "/usr/lib/python3.8/site-packages/waitress/channel.py", line 350, in service
szurubooru-backend-1  | 2023-01-31T19:44:18.401507417Z     task.service()
szurubooru-backend-1  | 2023-01-31T19:44:18.401511506Z   File "/usr/lib/python3.8/site-packages/waitress/task.py", line 171, in service
szurubooru-backend-1  | 2023-01-31T19:44:18.401515764Z     self.execute()
szurubooru-backend-1  | 2023-01-31T19:44:18.401519551Z   File "/usr/lib/python3.8/site-packages/waitress/task.py", line 441, in execute
szurubooru-backend-1  | 2023-01-31T19:44:18.401523638Z     app_iter = self.channel.server.application(environ, start_response)
szurubooru-backend-1  | 2023-01-31T19:44:18.401527496Z   File "/opt/app/szurubooru/rest/app.py", line 104, in application
szurubooru-backend-1  | 2023-01-31T19:44:18.401531764Z     response = handler(ctx, match.groupdict())
szurubooru-backend-1  | 2023-01-31T19:44:18.401535420Z   File "/opt/app/szurubooru/api/info_api.py", line 34, in get_info
szurubooru-backend-1  | 2023-01-31T19:44:18.401539137Z     post_feature = posts.try_get_current_post_feature()
szurubooru-backend-1  | 2023-01-31T19:44:18.401542844Z   File "/opt/app/szurubooru/func/posts.py", line 397, in try_get_current_post_feature
szurubooru-backend-1  | 2023-01-31T19:44:18.401546471Z     db.session.query(model.PostFeature)
szurubooru-backend-1  | 2023-01-31T19:44:18.401550158Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3429, in first
szurubooru-backend-1  | 2023-01-31T19:44:18.401553795Z     ret = list(self[0:1])
szurubooru-backend-1  | 2023-01-31T19:44:18.401557191Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
szurubooru-backend-1  | 2023-01-31T19:44:18.401561008Z     return list(res)
szurubooru-backend-1  | 2023-01-31T19:44:18.401564305Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
szurubooru-backend-1  | 2023-01-31T19:44:18.401567821Z     return self._execute_and_instances(context)
szurubooru-backend-1  | 2023-01-31T19:44:18.401574363Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3556, in _execute_and_instances
szurubooru-backend-1  | 2023-01-31T19:44:18.401579042Z     conn = self._get_bind_args(
szurubooru-backend-1  | 2023-01-31T19:44:18.401631020Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3571, in _get_bind_args
szurubooru-backend-1  | 2023-01-31T19:44:18.401636129Z     return fn(
szurubooru-backend-1  | 2023-01-31T19:44:18.401639967Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3550, in _connection_from_session
szurubooru-backend-1  | 2023-01-31T19:44:18.401643974Z     conn = self.session.connection(**kw)
szurubooru-backend-1  | 2023-01-31T19:44:18.401647861Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1142, in connection
szurubooru-backend-1  | 2023-01-31T19:44:18.401651859Z     return self._connection_for_bind(
szurubooru-backend-1  | 2023-01-31T19:44:18.401655886Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1150, in _connection_for_bind
szurubooru-backend-1  | 2023-01-31T19:44:18.401660194Z     return self.transaction._connection_for_bind(
szurubooru-backend-1  | 2023-01-31T19:44:18.401664142Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 433, in _connection_for_bind
szurubooru-backend-1  | 2023-01-31T19:44:18.401668610Z     conn = bind._contextual_connect()
szurubooru-backend-1  | 2023-01-31T19:44:18.401674251Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2302, in _contextual_connect
szurubooru-backend-1  | 2023-01-31T19:44:18.401678589Z     self._wrap_pool_connect(self.pool.connect, None),
szurubooru-backend-1  | 2023-01-31T19:44:18.401682917Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
szurubooru-backend-1  | 2023-01-31T19:44:18.401687095Z     return fn()
szurubooru-backend-1  | 2023-01-31T19:44:18.401691042Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 364, in connect
szurubooru-backend-1  | 2023-01-31T19:44:18.401695400Z     return _ConnectionFairy._checkout(self)
szurubooru-backend-1  | 2023-01-31T19:44:18.401699469Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
szurubooru-backend-1  | 2023-01-31T19:44:18.401703547Z     fairy = _ConnectionRecord.checkout(pool)
szurubooru-backend-1  | 2023-01-31T19:44:18.401707574Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
szurubooru-backend-1  | 2023-01-31T19:44:18.401711442Z     rec = pool._do_get()
szurubooru-backend-1  | 2023-01-31T19:44:18.401715349Z   File "/usr/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 128, in _do_get
szurubooru-backend-1  | 2023-01-31T19:44:18.401719447Z     raise exc.TimeoutError(
szurubooru-backend-1  | 2023-01-31T19:44:18.401723294Z sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/13/3o7r)
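For context (a generic SQLAlchemy note, not a szurubooru change): the final TimeoutError means every pooled database connection stayed checked out for more than 30 seconds, which happens when the request handlers themselves are stuck in slow work. The numbers in the message map to these create_engine parameters, shown here with the library defaults and a hypothetical connection string:

# Generic SQLAlchemy example showing where "QueuePool limit of size 5
# overflow 10 ... timeout 30" comes from; these are the library defaults.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:password@localhost/szuru",  # hypothetical DSN
    pool_size=5,      # connections kept open in the pool
    max_overflow=10,  # extra connections allowed beyond pool_size
    pool_timeout=30,  # seconds to wait for a free connection before TimeoutError
)

Raising these limits would only hide the symptom if the handlers themselves stay slow.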
phil-flip commented 1 year ago

Hm, the error above is not reproducible. But yeah, after a night of doing nothing it works fine the next day. I did clear the featured posts from the DB… that didn't speed it up, other than the homepage now loading faster when it does work. I don't want to restart it now, but I set it to 60 threads, as it does say to increase it if there are more posts in the DB. Considering 500k+ posts, I think that's needed. Unfortunately that does not solve the issue.

phil-flip commented 1 year ago

So I fixed it… tbh kind of my fault for not seeing this earlier, but the issue is basically the only thing in that API call that can take long: the _get_disk_usage function, which gets slower over time as more pictures accumulate. I implemented a hotfix that just returns 0 to the frontend, as I personally don't care about the disk space anyway. After rebuilding and redeploying, the frontend loads like butter now. So it might just need a toggle to disable it, or a way to keep it from blocking the page load for the user, like a background job that periodically saves the value to a variable that can be read whenever it's needed (see the sketch after the code below). I'll ping @AveryLucas and @Mrspirits in case they haven't lost hope in the booru.

# Hotfix: the 48-hour cache and the os.walk over data_dir are commented out
# below, so the function now always reports 0 bytes instead of scanning the disk.
def _get_disk_usage() -> int:
#    global _cache_time, _cache_result
#    threshold = timedelta(hours=48)
#    now = datetime.utcnow()
#    if _cache_time and _cache_time > now - threshold:
#        assert _cache_result is not None
#        return _cache_result
    total_size = 0
#    for dir_path, _, file_names in os.walk(config.config["data_dir"]):
#        for file_name in file_names:
#            file_path = os.path.join(dir_path, file_name)
#            try:
#                total_size += os.path.getsize(file_path)
#            except FileNotFoundError:
#                pass
#    _cache_time = now
#    _cache_result = total_size
    return total_size
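For the background-job variant mentioned above, a minimal sketch could look roughly like this (hypothetical names, not a patch against szurubooru): recompute the total in a daemon thread on a timer, and let the /info handler read a cached value instead of walking the data directory during the request.

# Hypothetical sketch of the background-job idea: a daemon thread keeps a
# cached disk-usage figure fresh, so request handlers never walk the disk.
import os
import threading
import time

DATA_DIR = "/data"            # hypothetical; szurubooru reads this from its config
REFRESH_SECONDS = 48 * 3600   # mirrors the 48 h cache window in the original code

_disk_usage_cache = 0
_cache_lock = threading.Lock()


def _compute_disk_usage(root: str) -> int:
    # The slow part: sum the size of every file under the data directory.
    total = 0
    for dir_path, _, file_names in os.walk(root):
        for file_name in file_names:
            try:
                total += os.path.getsize(os.path.join(dir_path, file_name))
            except FileNotFoundError:
                pass
    return total


def _refresh_loop() -> None:
    # Periodically refresh the cache so requests never pay the os.walk cost.
    global _disk_usage_cache
    while True:
        usage = _compute_disk_usage(DATA_DIR)
        with _cache_lock:
            _disk_usage_cache = usage
        time.sleep(REFRESH_SECONDS)


def get_disk_usage() -> int:
    # Cheap accessor an /info handler could call instead of walking the disk.
    with _cache_lock:
        return _disk_usage_cache


# Started once at application startup.
threading.Thread(target=_refresh_loop, daemon=True).start()

The trade-off is that the reported number can be up to one refresh interval stale, which for a disk-usage statistic is usually acceptable.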
Theenoro commented 7 months ago

As you said, it boosts the whole frontend. I had trouble with it too: I use this for myself to store a lot of images, and the data_dir is mounted over NFS. It also has to go over a gateway because the storage is on another network.

So, as far as I understand from looking at the code, it would be as easy as adding a new variable to the config file and reading it like this: config.config["data_dir_get_usage"].

def _get_disk_usage() -> int:
    total_size = 0
    # Only walk data_dir when usage reporting is enabled in the config;
    # otherwise skip the scan and report 0 bytes.
    if config.config["data_dir_get_usage"] == 1:
        global _cache_time, _cache_result
        threshold = timedelta(hours=48)
        now = datetime.utcnow()
        if _cache_time and _cache_time > now - threshold:
            assert _cache_result is not None
            return _cache_result
        for dir_path, _, file_names in os.walk(config.config["data_dir"]):
            for file_name in file_names:
                file_path = os.path.join(dir_path, file_name)
                try:
                    total_size += os.path.getsize(file_path)
                except FileNotFoundError:
                    pass
        _cache_time = now
        _cache_result = total_size
    return total_size

I don't do much with Python and YAML files, but if I'm correct, the only change needed should be adding data_dir_get_usage to config.yaml.dist below the data_dir variable, with a value of 1. Like:

data_dir_get_usage: 1

Then it should merge the dist config with the normal config file, if I understand it right. So if someone else has this issue too, they only have to change the value in their config file to 0 and that's it.
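As a rough illustration of that merge behaviour (a generic sketch, not necessarily how szurubooru's config loader is actually implemented), values from config.yaml would be overlaid on the defaults shipped in config.yaml.dist:

# Hypothetical sketch of overlaying a user config on the shipped defaults,
# so data_dir_get_usage only needs to appear in config.yaml when overriding it.
import yaml


def merge(defaults: dict, overrides: dict) -> dict:
    # Recursively overlay user values on top of the dist defaults.
    result = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


with open("config.yaml.dist") as f:
    defaults = yaml.safe_load(f)
with open("config.yaml") as f:
    overrides = yaml.safe_load(f) or {}

config = merge(defaults, overrides)
print(config.get("data_dir_get_usage", 1))  # falls back to the dist default of 1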

I might try it in the next few days; I still have to set up a bunch of VMs because one of my hosts died. No backup, no pity.