reluce / szurubooru-toolkit

Python package and script collection to manage your szurubooru image board.
GNU General Public License v3.0

Create tags is broken #35

Closed LibertX closed 1 year ago

LibertX commented 1 year ago

Hello,

It looks like the create-tags feature no longer works:

root@szurubooru-toolkit-7b59658595-g9s96:/szurubooru-toolkit# /usr/local/bin/create-tags --overwrite --query genshin*
[INFO] [25.04.2023, 12:53:54 UTC]: Fetching tags from URL https://danbooru.donmai.us/tags.json?search[post_count]=>10&search[name_matches]=genshin*&limit=100&page=1...
[CRITICAL] [25.04.2023, 12:53:54 UTC] [danbooru.download_tags]: Could not fetch tags: Expecting value: line 1 column 1 (char 0)
[SUCCESS] [25.04.2023, 12:53:54 UTC]: Script finished creating tags!

Thanks!

reluce commented 1 year ago

Hi, it looks like Danbooru has tightened its bot protection. A few weeks ago it was enough to supply a user agent header, but that no longer seems to be sufficient.

I receive a 403 error with requests; the response body prompts me to enable JavaScript and cookies. Using authentication and a requests session also results in a 403.

As it stands, this may be a bigger issue than I thought, so it might take a bit longer until I find a suitable fix.

reluce commented 1 year ago

It turned out that Danbooru/Cloudflare is blocking legitimate-looking user agents. Using a 'custom' user agent did the trick. The patch is available in 0.8.0.
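
As a rough illustration of the fix described above, the sketch below builds a request to the Danbooru tags endpoint with a custom User-Agent header. It uses the standard library's urllib instead of requests so that it runs without extra dependencies. The function name build_tag_request and its parameters are assumptions for illustration, and the default agent string is the one quoted later in this thread; this is not the toolkit's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request


# Hypothetical helper, not part of szurubooru-toolkit.
def build_tag_request(query="*", min_post_count=10, limit=100, page=1,
                      user_agent="Danbooru dummy agent"):
    """Build (but do not send) a GET request for Danbooru's tags.json."""
    params = {
        "search[post_count]": f">{min_post_count}",
        "search[name_matches]": query,
        "limit": limit,
        "page": page,
    }
    # Keep brackets, '>' and '*' unescaped so the URL resembles the one in the logs.
    url = "https://danbooru.donmai.us/tags.json?" + urlencode(params, safe="[]>*")
    # A custom, non-browser agent string replaces urllib's default User-Agent.
    return Request(url, headers={"User-Agent": user_agent})


request = build_tag_request("genshin*")
print(request.full_url)
print(request.get_header("User-agent"))  # urllib stores header names capitalized
```

Passing the result to urllib.request.urlopen would actually send it. Whether Cloudflare accepts a given agent string may change over time, so treat the default here as an example value only.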

LibertX commented 1 year ago

Hi,

I tried 0.8.0 and now get this error:

root@szurubooru-toolkit-5b5cb8bf7f-zhvpw:/szurubooru-toolkit# create-tags
[INFO] [07.05.2023, 14:28:11 UTC]: Fetching tags from URL https://danbooru.donmai.us/tags.json?search[post_count]=>10&search[name_matches]=*&limit=100&page=1...
[ERROR] [07.05.2023, 14:28:11 UTC] [create-tags.<module>]: An error has been caught in function '<module>', process 'MainProcess' (28), thread 'MainThread' (140525874476864):
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/create-tags", line 6, in <module>
    sys.exit(main())
  File "/szurubooru-toolkit/src/szurubooru_toolkit/scripts/create_tags.py", line 103, in main
    szuru.create_tag(tag['name'], convert_tag_category(tag['category']), overwrite)
  File "/szurubooru-toolkit/src/szurubooru_toolkit/szurubooru.py", line 204, in create_tag
    if 'description' in response.json():
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

reluce commented 1 year ago

It looks like the same error that I encountered previously during testing. Can you make sure that you're using at least 0.8.0?

LibertX commented 1 year ago

I am:

Containers:
  szurubooru-toolkit:
    Container ID:   containerd://3461fb1baac3a4265c06f7b535645cdf9c1c0a8c5660d51cfeb0d832210890c8
    Image:          reluce/szurubooru-toolkit
    Image ID:       docker.io/reluce/szurubooru-toolkit@sha256:7a2a3d39e6199355ca16927562a4435e46bc259ed01db7562584e96845eda8fd
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sun, 07 May 2023 16:27:55 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cron.d/crontab from szurubooru-toolkit-config (rw,path="crontab")
      /szurubooru-toolkit/config.toml from szurubooru-toolkit-config (rw,path="config.toml")
      /szurubooru-toolkit/misc from szurubooru-toolkit-misc (rw)
      /szurubooru-toolkit/upload_src from szurubooru-upload (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-znstc (ro)

As you can see, the sha256 of the image matches both 0.8.0 and latest (I'm using latest).

LibertX commented 1 year ago

If this is the issue, CloudFlare's anti-DDoS protection is known to be hit or miss.

You might want to take a look at https://github.com/FlareSolverr/FlareSolverr.

reluce commented 1 year ago

I just spun up a container to test it, and it still works for me. Running grep User-Agent src/szurubooru_toolkit/danbooru.py should return headers = {'User-Agent': 'Danbooru dummy agent'}.

You can also try printing self.session.get(tag_url).text in src/szurubooru_toolkit/danbooru.py (insert it at line 141, just before the yield) and running the command again. Then you can check the message returned by the request.
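
The check suggested above can be sketched as a small helper that summarises a response before any JSON decoding is relied on, which is where both tracebacks in this thread fail. The helper name describe_response and its inputs are assumptions for illustration; with requests they would correspond to response.status_code, response.headers.get('Content-Type') and response.text. The sample bodies are made up.

```python
import json


# Hypothetical helper, not part of szurubooru-toolkit.
def describe_response(status_code, content_type, text, preview_chars=200):
    """Return (data, message): parsed JSON or None, plus a diagnostic message."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # An HTML challenge page is not JSON, so decoding fails at the first character.
        preview = text[:preview_chars].replace("\n", " ")
        message = (f"HTTP {status_code} ({content_type}): not JSON ({err.msg}); "
                   f"body starts with: {preview!r}")
        return None, message
    return data, f"HTTP {status_code} ({content_type}): valid JSON"


# Simulated block page, standing in for a real 403 response body.
blocked = "<html><body>Enable JavaScript and cookies to continue</body></html>"
data, message = describe_response(403, "text/html", blocked)
print(message)

# Made-up sample payload in the shape of a tag list.
data, message = describe_response(200, "application/json", '[{"name": "example_tag"}]')
print(data, message)
```

Logging the status code and the start of the body makes it easier to tell a Cloudflare block from a genuine API error than the bare "Expecting value" message does.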

LibertX commented 1 year ago

It started working for me yesterday too. I think I was temporarily banned by CloudFlare because create-tags runs periodically, and the ban has since "expired".

Thanks a lot!