robertoszek / pleroma-bot

Bot for mirroring one or multiple Twitter accounts in Pleroma/Mastodon/Misskey.
https://robertoszek.github.io/pleroma-bot
MIT License

Failing to process tweets - multiple issues #114

Open tomakun opened 1 year ago

tomakun commented 1 year ago

Hi @robertoszek, Posting from Twitter to Mastodon here. I am using a Twitter dev token with Elevated access.

I have 3 users, two of which are working fine. Whenever I try to gather tweets for the third one, I get the following error:

Error log

```shell
ℹ 2023-01-21 14:36:18,416 - pleroma_bot - INFO - config path: /home/mastodon/pleroma-bot/config.yml
ℹ 2023-01-21 14:36:18,416 - pleroma_bot - INFO - tweets temp folder: /home/mastodon/pleroma-bot/tweets
ℹ 2023-01-21 14:36:18,422 - pleroma_bot - INFO - ======================================
ℹ 2023-01-21 14:36:18,422 - pleroma_bot - INFO - Processing user: user1 (up and running)
✖ 2023-01-21 14:36:19,315 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:717)
Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 577, in main
    raise Exception(
Exception: Invalid forceDate format, use "YYYY-mm-dd"
ℹ 2023-01-21 14:36:19,315 - pleroma_bot - INFO - ======================================
ℹ 2023-01-21 14:36:19,316 - pleroma_bot - INFO - Processing user: user2 (up and running)
✖ 2023-01-21 14:36:20,066 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:717)
Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 577, in main
    raise Exception(
Exception: Invalid forceDate format, use "YYYY-mm-dd"
ℹ 2023-01-21 14:36:20,067 - pleroma_bot - INFO - ======================================
ℹ 2023-01-21 14:36:20,067 - pleroma_bot - INFO - Processing user: problematic new user
ℹ 2023-01-21 14:36:21,980 - pleroma_bot - INFO - How far back should we retrieve tweets from the Twitter account?
ℹ 2023-01-21 14:36:21,980 - pleroma_bot - INFO - Enter a date (YYYY-MM-DD): [Leave it empty to retrieve *ALL* tweets or enter 'continue' if you want the bot to execute as normal (checking date of last post in the Fediverse account)] 2022-10-01
```

```
⚠ 2023-01-21 14:36:30,552 - pleroma_bot - WARNING - Raising max_tweets to the maximum allowed value (_utils.py:606)
Gathering tweets... 1207
ℹ 2023-01-21 14:36:47,105 - pleroma_bot - INFO - tweets gathered: 1207
Processing tweets... : 0%| | 0/1207 [00:00
```

I can't seem to be able to locate the reason why this would be failing. One difference is that the third user has include_rts set to true - but it failed as well when I tried again with include_rts: false on this user. Here's my (partially redacted) config.yml:

Config

```yaml
# Global Mapping #
pleroma_base_url: XXX
max_tweets: 40
twitter_token: XXX
delay_post: 1
# User mapping
users:
- twitter_username: XXX
  pleroma_username: XXX
  pleroma_token: XXX
  signature: false
  include_rts: false
  include_replies: false
  include_quotes: true
  visibility: "unlisted"
  avoid_duplicates: true
  media_upload: true
  twitter_bio: true
  bio_text: "\U0001F916 BEEP BOOP \U0001F916 \nI'm a bot that mirrors\
    \ {{ twitter_username }} Twitter's account. \nAny issues please\
    \ contact @XXX \n \n "
- twitter_username: XXX
  pleroma_username: XXX
  pleroma_token: XXX
  signature: false
  include_rts: false
  include_replies: false
  include_quotes: true
  visibility: "unlisted"
  avoid_duplicates: true
  media_upload: true
  twitter_bio: true
  bio_text: "\U0001F916 BEEP BOOP \U0001F916 \nI'm a bot that mirrors\
    \ {{ twitter_username }} Twitter's account. \nAny issues please\
    \ contact @XXX \n \n "
- twitter_username: XXX
  pleroma_username: XXX
  pleroma_token: XXX
  signature: false
  include_rts: true
  include_replies: false
  include_quotes: true
  visibility: "unlisted"
  avoid_duplicates: true
  media_upload: true
  twitter_bio: true
  bio_text: "\U0001F916 BEEP BOOP \U0001F916 \nI'm a bot that mirrors\
    \ {{ twitter_username }} Twitter's account. \nAny issues please\
    \ contact @XXX \n \n "
```

Any assistance you could provide is appreciated.

Best, Thomas

robertoszek commented 1 year ago

Hi!

Hmm, I'm thinking perhaps there's a tweet for that user with "referenced_tweets" that has no text field somehow?

Does trying with this version make any difference?: pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.2.1rc4

I'm curious about what the error message was when you tried it with include_rts: false. Was it the same one?

Oh, and running the bot in verbose mode could maybe help us pin down what the issue is:

$ pleroma-bot -v

robertoszek commented 1 year ago

Never mind my last comment about a missing text field. Reading the traceback again, it seems like the id field in referenced_tweets may be a list instead.

Could you check whether that's the issue by running 1.2.1rc5?: pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.2.1rc5

tomakun commented 1 year ago

Thanks @robertoszek for looking into this! I ran the bot using verbose - A LOT of things got printed. The DEBUG lines I got before the error are as follows:

Error log

```shell
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1553362097135706115?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 200 1100
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): t.co:443
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1549790616996831232?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
```

The rest of the error, printed two times this time.

```
Processing tweets... : 0%| | 0/2427 [01:09
```

Seems like I'm getting a 429 response at some point before the error?

Will try on 1.2.1rc5 and get back to you.

tomakun commented 1 year ago

@robertoszek 1.2.1rc5 gives me the same error:

Error log

```shell
1.2.1rc5
mastodon@ip-XXX:~/pleroma-bot$ ../.local/bin/pleroma-bot --forceDate problematic_user
[pleroma-bot ASCII-art startup banner]
ℹ 2023-01-21 16:12:12,427 - pleroma_bot - INFO - config path: /home/mastodon/pleroma-bot/config.yml
ℹ 2023-01-21 16:12:12,427 - pleroma_bot - INFO - tweets temp folder: /home/mastodon/pleroma-bot/tweets
ℹ 2023-01-21 16:12:12,433 - pleroma_bot - INFO - ======================================
ℹ 2023-01-21 16:12:12,433 - pleroma_bot - INFO - Processing user: user1 (up and running)
✖ 2023-01-21 16:12:13,492 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:719)
Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 578, in main
    raise Exception(
Exception: Invalid forceDate format, use "YYYY-mm-dd"
ℹ 2023-01-21 16:12:13,492 - pleroma_bot - INFO - ======================================
ℹ 2023-01-21 16:12:13,492 - pleroma_bot - INFO - Processing user: user2 (up and running)
✖ 2023-01-21 16:12:14,205 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:719)
Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 578, in main
    raise Exception(
Exception: Invalid forceDate format, use "YYYY-mm-dd"
ℹ 2023-01-21 16:12:14,206 - pleroma_bot - INFO - ======================================
ℹ 2023-01-21 16:12:14,206 - pleroma_bot - INFO - Processing user: problematic new user
ℹ 2023-01-21 16:12:16,206 - pleroma_bot - INFO - How far back should we retrieve tweets from the Twitter account?
ℹ 2023-01-21 16:12:16,206 - pleroma_bot - INFO - Enter a date (YYYY-MM-DD): [Leave it empty to retrieve *ALL* tweets or enter 'continue' if you want the bot to execute as normal (checking date of last post in the Fediverse account)] 2022-07-01
⚠ 2023-01-21 16:12:24,287 - pleroma_bot - WARNING - Raising max_tweets to the maximum allowed value (_utils.py:606)
Gathering tweets... 2427
ℹ 2023-01-21 16:12:55,219 - pleroma_bot - INFO - tweets gathered: 2427
Processing tweets... : 0%| | 0/2427 [01:19
```

Changing the config mapping for that user to include_rts: false:

Error log

```shell
✖ 2023-01-21 16:18:04,778 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:719)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_processing.py", line 110, in process_tweets
    tweet["text"] = _get_rt_text(self, tweet)
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_processing.py", line 292, in _get_rt_text
    tweet_ref = self._get_tweets("v2", tweet_ref_id)
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_twitter.py", line 548, in _get_tweets
    tweets_v2 = self._get_tweets_v2(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_twitter.py", line 662, in _get_tweets_v2
    response = self.twitter_api_request(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_twitter.py", line 83, in twitter_api_request
    logger.info(_(
TypeError: 'list' object is not callable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 655, in main
    tweets_to_post = process_parallel(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_utils.py", line 120, in process_parallel
    for idx, res in enumerate(
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
TypeError: 'list' object is not callable
```

tomakun commented 1 year ago

@robertoszek On another note, the very first time I tried to run the bot with that new user I got an additional WARNING before the error message, like so:

WARNING - Media possibly geoblocked? ...

Although the time range was also different here. I figured retrieving fewer tweets would fix the issue, but it didn't...

Error log

```shell
Enter a date (YYYY-MM-DD): [Leave it empty to retrieve *ALL* tweets or enter 'continue' if you want the bot to execute as normal (checking date of last post in the Fediverse account)] 2020-01-01
⚠ 2023-01-21 14:04:49,704 - pleroma_bot - WARNING - Raising max_tweets to the maximum allowed value (_utils.py:606)
Gathering tweets... 3209
ℹ 2023-01-21 14:05:34,145 - pleroma_bot - INFO - tweets gathered: 3209
Processing tweets... : 0%| | 0/3209 [00:00
```

I appreciate your patience and help on this case.

robertoszek commented 1 year ago

Ah, thank you for the verbose output. Looks like I was looking at completely the wrong place.

It seems you're hitting a rate limit (429), but then the logger crashes while trying to display the message telling you when it will reset (and waiting until then). I'm thinking the headers for the Twitter rate limits may have changed their format or perhaps their content? The headers in question are X-Rate-Limit-Remaining, X-Rate-Limit-Reset and X-Rate-Limit-Limit.
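For readers following along: the traceback above ends at `logger.info(_(` with `TypeError: 'list' object is not callable`, which is what Python raises when a name that is being called happens to be bound to a list. The thread does not show the offending code, so the following is only a minimal sketch of one way this can happen, assuming `_` is a gettext-style translation alias that gets rebound inside a function. Every name in it is hypothetical.

```python
def _(message):
    """Stand-in for a gettext-style translation alias (hypothetical)."""
    return message


def log_rate_limit(headers):
    # Hypothetical bug: reusing "_" as a variable name rebinds it for the
    # whole function, so the later call hits a list, not the function.
    _ = [name for name in headers if name.startswith("x-rate-limit")]
    return _("Rate limit exceeded")


def log_rate_limit_fixed(headers):
    # Using a distinct name leaves the module-level "_" callable.
    limit_headers = [name for name in headers if name.startswith("x-rate-limit")]
    return _("Rate limit exceeded") + f" ({len(limit_headers)} headers)"
```

Calling `log_rate_limit` raises the same `TypeError` as in the traceback, while `log_rate_limit_fixed` returns the message normally.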

They have been doing sweeping changes on their APIs haphazardly lately, so I wouldn't put it past them.

I've added some extra debug statements to 1.2.1rc7 (and changed a few lines hoping for a more meaningful error message): pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.2.1rc7

Could you try again with that version and report back with the traceback/verbose output?

tomakun commented 1 year ago

Hi @robertoszek, I ran 1.2.1rc7 in verbose mode and I can see your new debug statements:

Error log

```shell
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1542840797925900288?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:pleroma_bot:x-rate-limit-remaining: 0
x-rate-limit-reset: 1674358688
x-rate-limit-limit: 300
reset_time: 2023-01-22 03:38:08
```

Here's the full output

Error log

```shell
ℹ 2023-01-22 03:25:14,172 - pleroma_bot - INFO - tweets gathered: 2440
INFO:pleroma_bot:tweets gathered: 2440
Processing tweets... : 0%| | 0/2440 [00:00
```

@robertoszek Why do you think I would be reaching a rate limit in that case? Is there anything that can be done?

robertoszek commented 1 year ago

I see, I think we're close to cracking the root cause of the bug. Can you test with 1.2.1rc8?:

 pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.2.1rc8

tomakun commented 1 year ago

@robertoszek I looked into the rate limit issue that I was running into.

DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1542840797925900288?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:pleroma_bot:x-rate-limit-remaining: 0
x-rate-limit-reset: 1674358688
x-rate-limit-limit: 300

Conclusion first: I was able to get past that rate limit after commenting out the twitter_token line in my config.yml. Here's my reasoning for trying without the twitter_token:

  1. In the verbose output of my failing run I noticed the v2 tweets endpoint is being called exactly 301 times, with a 429 response on the 301st call.

  2. I looked into the Twitter API rate limit docs for v2 and it said:

    Tweet lookup: Per app 300 | Per user 900

Since the twitter_token in config.yml is the app token, I figured I might be hitting the per-app cap, and that my only recourse was to test as a guest, without the Twitter app token. And indeed, this worked, probably because, as a guest from the API's point of view, my new rate limit was 900.
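The call count from step 1 can be reproduced from the verbose output rather than counted by hand. This is a rough sketch that assumes the urllib3 debug line format shown in the logs in this thread; the function name and regular expression are illustrative and not part of pleroma-bot.

```python
import re

# Matches urllib3 debug lines for the v2 tweet lookup, for example:
# ... "GET /2/tweets/1542840797925900288?poll.fields=... HTTP/1.1" 429 94
LOOKUP_RE = re.compile(r'"GET /2/tweets/(\d+)\S* HTTP/1\.1" (\d{3})')


def first_rate_limited_call(log_lines):
    """Return (call_number, tweet_id) for the first 429, or None if there is none."""
    calls = 0
    for line in log_lines:
        match = LOOKUP_RE.search(line)
        if match is None:
            continue  # not a tweet lookup line
        calls += 1
        if match.group(2) == "429":
            return calls, match.group(1)
    return None
```

Feeding it the lines of a saved verbose log (for example `open("verbose.log")`, a hypothetical file name) should report call number 301 if the per-app limit of 300 lookups per window is what is being hit.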

Now, this is a bit confusing to me. What is the point of using a twitter_token if we are going to hit rate limits faster than with a guest token? Is there really a bug going on here, or is it just a rate limit issue?

> I see, I think we're close to cracking the root cause of the bug. Can you test with 1.2.1rc8?:

I will retry with this version and put the twitter_token back in so that we get the same test. I have a process running right now; I'll get back to you as soon as it's done.

robertoszek commented 1 year ago

Right, you're indeed correct, the 300 per app lookup limit is listed under the "Requests per 15-minute window".

There's a bug in the sense that the bot should not crash and burn when a rate limit is hit using your Twitter token.

The expected behavior is for the bot to gracefully note when the rate limit will reset (within 15 minutes), wait until then, and resume where it left off.
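As a rough illustration of that expected behavior (not the bot's actual implementation), the wait can be derived from the `x-rate-limit-reset` header seen in the debug output, which holds a Unix timestamp. The function names, the `send` callable and the one-second safety margin are assumptions made for this sketch.

```python
import time


def seconds_until_reset(headers, now=None, margin=1.0):
    """Seconds to sleep until the rate-limit window resets (never negative)."""
    if now is None:
        now = time.time()
    # Unix timestamp, as printed in the debug output (lowercase key assumed).
    reset_epoch = int(headers["x-rate-limit-reset"])
    return max(0.0, reset_epoch - now + margin)


def request_with_rate_limit(send, sleep=time.sleep):
    """Call send() until it stops answering with HTTP 429.

    send is any callable returning (status_code, headers, body).
    """
    while True:
        status, headers, body = send()
        if status != 429:
            return status, body
        # Wait for the window to reset, then retry the same request.
        sleep(seconds_until_reset(headers))
```

Passing `sleep` in as a parameter keeps the sketch testable without actually waiting 15 minutes.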

tomakun commented 1 year ago

That would actually be fantastic if it could do that yeah. I'm assuming this was an already known issue, apologies for bothering you with that. I will post the results of 1.2.1rc8 soon.

tomakun commented 1 year ago

Ran the 1.2.1rc8 bot with the same settings, using my twitter_token in the config.yml. The bot behaved according to your comment above:

Debug log

```shell
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/... HTTP/1.1" 429 94
DEBUG:pleroma_bot:x-rate-limit-remaining: 0
x-rate-limit-reset: 1674438829
x-rate-limit-limit: 300
reset_time: 2023-01-23 01:53:49
ℹ 2023-01-23 01:39:50,989 - pleroma_bot - INFO - Rate limit exceeded. 0 out of 300 requests remaining until 2023-01-23 01:53:49 UTC
INFO:pleroma_bot:Rate limit exceeded. 0 out of 300 requests remaining until 2023-01-23 01:53:49 UTC
ℹ 2023-01-23 01:39:50,990 - pleroma_bot - INFO - Sleeping for 840s...
INFO:pleroma_bot:Sleeping for 840s...
```

I did several runs and I think the sleeping behavior is working well. 👍

However, I ran into a new issue a couple of times already: The bot appears to be crashing when getting a 504 response from the media endpoint:

Error log

```shell
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): t.co:443
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FX1qTjBaUAA3aB8.jpg HTTP/1.1" 200 133790
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FXxQ_9-acAAY3Lq.jpg HTTP/1.1" 504 357
Processing tweets... : 0%| | 0/2485 [06:06
```

Since I will be doing quite a few imports like this, I guess we will be able to clear up a few edge cases like this one. Do you think you can take a look? I'll continue running tests with any new version you provide, if that helps you as well.

robertoszek commented 1 year ago

Interesting, perhaps we can mitigate the 504's when downloading media by reusing the custom session adapter we create for Twitter's API queries (which includes additional retries and handling other error codes as well).
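The bot's session adapter is not shown in this thread, so the following is only a library-agnostic sketch of the retry idea: transient server errors such as 504 are retried with exponential backoff instead of aborting the run. The function name, the `fetch` callable and the default values are assumptions.

```python
import time


def get_with_retries(fetch, url, retries=3, retry_statuses=(500, 502, 503, 504),
                     backoff=1.0, sleep=time.sleep):
    """Call fetch(url), retrying transient server errors with exponential backoff.

    fetch is any callable returning (status_code, content).
    """
    for attempt in range(retries + 1):
        status, content = fetch(url)
        # Return on success, on a non-retryable status, or when out of retries.
        if status not in retry_statuses or attempt == retries:
            return status, content
        sleep(backoff * (2 ** attempt))  # 1s, 2s, 4s, ... with backoff=1.0
```

With the `requests` library, a similar effect is commonly achieved by mounting an `HTTPAdapter` configured with a urllib3 `Retry` object on the session, which is presumably close to what the adapter mentioned above does.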

Would you mind giving 1.2.1rc11 a try and see if you notice any improvement?:

 pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.2.1rc11

tomakun commented 1 year ago

Hi @robertoszek, It looks like I am not getting the 504 anymore with 1.2.1rc11, I'll keep watch to see if this happens again, so that may be improved with this release already! Thank you.

Meanwhile, I have been getting another error, which seems to occur while posting tweets: a GET request to the Mastodon media API returns a 404, and the bot then fails with a read timeout.

Error log

```shell
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): myinstanceurl:443
DEBUG:urllib3.connectionpool:https://myinstanceurl:443 "GET /api/v1/media HTTP/1.1" 404 None
✖ 2023-01-24 06:24:10,803 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:722)
```

The bot crashes here with the following traceback:

Error log

```shell
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='myinstanceurl', port=443): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 689, in main
    post_id = user.post(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_utils.py", line 792, in post
    post_id = self.post_pleroma(tweet, poll, sensitive, media, cw=cw)
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_pleroma.py", line 267, in post_pleroma
    response = pleroma_api_request(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_pleroma.py", line 40, in pleroma_api_request
    response = session.request(
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='myinstanceurl', port=443): Read timed out. (read timeout=30)
ERROR:pleroma_bot:Exception occurred for user, skipping...
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='myinstanceurl', port=443): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 689, in main
    post_id = user.post(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_utils.py", line 792, in post
    post_id = self.post_pleroma(tweet, poll, sensitive, media, cw=cw)
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_pleroma.py", line 267, in post_pleroma
    response = pleroma_api_request(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_pleroma.py", line 40, in pleroma_api_request
    response = session.request(
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='myinstanceurl', port=443): Read timed out. (read timeout=30)
```

Do you have any ideas? Again, I appreciate your time and support on this; it helps a lot. Thank you.

tomakun commented 1 year ago

Following up on my comment above, I think there's something up with this version. The 504 doesn't occur anymore, but it takes forever to process the tweets (although that's probably expected with 3200 tweets: 3200/300 ≈ 11 batches with a 15 min wait in between, so about 150 min of dead time T_T).
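The dead-time estimate can be written out as a small calculation. It assumes the worst case of one tweet lookup per gathered tweet (in the tracebacks the lookup happens while resolving referenced tweets, so the real number may be lower), together with the 300 lookups per 15-minute window discussed earlier. The function name is illustrative.

```python
import math


def dead_time_minutes(lookups, limit=300, window_minutes=15):
    """Return (batches, minutes spent waiting) for a number of lookups."""
    batches = math.ceil(lookups / limit)
    # The wait only happens between batches, not after the last one.
    waits = max(batches - 1, 0)
    return batches, waits * window_minutes
```

For 3200 lookups this gives 11 batches and 10 waits, which is 150 minutes of waiting.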

The [00:00<?, ?it/s] counter isn't moving like it does on the stable release, and the sleeping info message is duplicated or tripled. And when it's time to post the tweets, I'm getting the read error mentioned above. It hurts lol.

In my use case I expect to bring quite a few users with 3200 tweets, I don't mind the Twitter API wait time but if it crashes after the processing it's not ideal I guess.

Error log

```shell
2022-01-01
⚠ 2023-01-24 07:19:16,167 - pleroma_bot - WARNING - Raising max_tweets to the maximum allowed value (_utils.py:615)
Gathering tweets... 3209
ℹ 2023-01-24 07:20:01,858 - pleroma_bot - INFO - tweets gathered: 3209
Processing tweets... : 0%| | 0/3209 [00:00
```

robertoszek commented 1 year ago

Regarding the multiple lines of "Rate limit exceeded"/"Sleeping": By default the bot splits the work among as many logical threads as half of your processor cores.

    cores / 2 if cores > 4 else 4

If you have more than 4 cores, take half. If not, split it into 4 threads by default.

That way the bot can work in parallel making requests and processing tweets (regex matching, substituting, etc) and take only a portion of the time it would have taken if it did it single-threaded instead. If you hit a rate limit and have, let's say 4 parallel threads making requests, they all will fail at roughly the same time and have to wait until the reset time. Hence the multiple messages at once (I may try to make this more friendly/elegant, I haven't figured out yet exactly how).
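The rule for the default number of workers can be sketched as follows. The integer division and the `os.cpu_count()` lookup are assumptions made so that the result is a whole number; the bot's actual code may round differently.

```python
import os


def default_workers(cores=None):
    """Half the cores when there are more than 4, otherwise 4 workers."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    return cores // 2 if cores > 4 else 4
```

With 8 cores this yields 4 workers, and with 2 cores it still yields 4, so a rate limit would typically be reported by about four workers at once.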

Ah, I think I see what may have happened here: DEBUG:urllib3.connectionpool:https://myinstanceurl:443 "GET /api/v1/media HTTP/1.1" 404 None

A rogue debug GET statement wasn't commented out, most likely.

Any better luck with 1.2.1rc17?: pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.2.1rc17

tomakun commented 1 year ago

Hi @robertoszek, I hope you are doing well. I took a break after learning about the API changes brought by Twitter; however, I am noticing that the bot seems to be working fine in most cases, with a few hundred imports.

However, I am still running into some crashes with bigger batches. I have not encountered the 404 issue mentioned above since then, but I will let you know if I do.

Here's the log for my most recent crash; I would greatly appreciate it if you could look into it. This is on a batch of 2000 tweets, during the Processing step, at about 50% progress:

Context:

DEBUG:pleroma_bot:1637850100155420673
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pbs.twimg.com:443
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FrrP-3XaEAMQayU.jpg HTTP/1.1" 200 201479
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1640223930597400576?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 200 615
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pbs.twimg.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pbs.twimg.com:443
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FsM--CMaUAA1dpe.jpg HTTP/1.1" 200 207210
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FrrP-3XaQAAiJFy.jpg HTTP/1.1" 200 209466
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pbs.twimg.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pbs.twimg.com:443
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FsM--CIaQAAoWd5.jpg HTTP/1.1" 200 184088
DEBUG:pleroma_bot:1640733812996071424
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:https://pbs.twimg.com:443 "GET /media/FrrP-3WaYAIT6bp.jpg HTTP/1.1" 200 180186
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): pbs.twimg.com:443
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1640730666861223937?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 200 203
Processing tweets... :  50%|██████████████████████████                          | 1164/2327 [2:16:14<2:16:07,  7.02s/it]
✖ 2023-03-29 03:06:47,570 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:721)

Error log

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_processing.py", line 110, in process_tweets
    tweet["text"] = _get_rt_text(self, tweet)
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_processing.py", line 298, in _get_rt_text
    text = f"{prefix} {tweet_ref['data']['text']}"
KeyError: 'data'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 657, in main
    tweets_to_post = process_parallel(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_utils.py", line 120, in process_parallel
    for idx, res in enumerate(
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
KeyError: 'data'
ERROR:pleroma_bot:Exception occurred for user, skipping...
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_processing.py", line 110, in process_tweets
    tweet["text"] = _get_rt_text(self, tweet)
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_processing.py", line 298, in _get_rt_text
    text = f"{prefix} {tweet_ref['data']['text']}"
KeyError: 'data'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/cli.py", line 657, in main
    tweets_to_post = process_parallel(
  File "/home/mastodon/.local/lib/python3.10/site-packages/pleroma_bot/_utils.py", line 120, in process_parallel
    for idx, res in enumerate(
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
KeyError: 'data'
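
For context, `KeyError: 'data'` means the looked-up response had no top-level `data` key. One plausible cause, which this thread does not confirm, is a referenced tweet that has been deleted or is otherwise unavailable, so that the reply carries no tweet object. A defensive version of the failing expression could look like the sketch below; the function name and the sample payloads are hypothetical.

```python
def referenced_text(prefix, tweet_ref):
    """Build the retweet text, or return None when the lookup has no tweet data."""
    data = tweet_ref.get("data")
    if not data or "text" not in data:
        # Nothing usable came back; let the caller decide whether to skip the tweet.
        return None
    return f"{prefix} {data['text']}"
```

A caller could then skip or log tweets for which the function returns `None` instead of letting the whole batch fail.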

Thank you in advance for your help.

tomakun commented 1 year ago

Hi @robertoszek, hope you are doing well. I hope you can follow up whenever you get a chance!