robertoszek / pleroma-bot

Bot for mirroring one or multiple Twitter accounts in Pleroma/Mastodon/Misskey.
https://robertoszek.github.io/pleroma-bot
MIT License
104 stars 18 forks source link

Gets stuck on "Processing tweets..." #73

Closed nemobis closed 2 years ago

nemobis commented 2 years ago

Sometimes this happens:

ℹ 2022-04-04 10:14:59,842 - pleroma_bot - INFO - ====================================== 
ℹ 2022-04-04 10:14:59,842 - pleroma_bot - INFO - Processing user:       108070211019592095 
ℹ 2022-04-04 10:14:59,842 - pleroma_bot - INFO - It seems like pleroma-bot is running for the first time for this Twitter user: linagalvezmunoz 
ℹ 2022-04-04 10:15:00,903 - pleroma_bot - INFO - How far back should we retrieve tweets from the Twitter account? 
ℹ 2022-04-04 10:15:00,903 - pleroma_bot - INFO - 
Enter a date (YYYY-MM-DD):
[Leave it empty to retrieve *ALL* tweets or enter 'continue'
if you want the bot to execute as normal (checking date of 
last post in the Fediverse account)]  
⚠ 2022-04-04 10:15:01,106 - pleroma_bot - WARNING - No posts were found in the target Fediverse account (_pleroma.py:43) 
ℹ 2022-04-04 10:15:01,106 - pleroma_bot - INFO - How far back should we retrieve tweets from the Twitter account? 
ℹ 2022-04-04 10:15:01,106 - pleroma_bot - INFO - 
Enter a date (YYYY-MM-DD):
[Leave it empty to retrieve *ALL* tweets or enter 'continue'
if you want the bot to execute as normal (checking date of 
last post in the Fediverse account)]  
ℹ 2022-04-04 10:15:02,614 - pleroma_bot - INFO - tweets gathered:        50 
Processing tweets...  \^C

This was stuck for 45 min there. Maybe there's a need of a hard timeout somewhere.

The error seems to be somehow user-dependent, in that I've seen it more than once for some users.

nemobis commented 2 years ago

I think it has something to do with the threading library. It's waiting for something that never comes.

Processing tweets...  |^CTraceback (most recent call last):
  File "/home/7/federico/mastodon/lib/python3.7/site-packages/pleroma_bot/cli.py", line 515, in main
    tweets_to_post = process_parallel(tweets, user, threads)
  File "/home/7/federico/mastodon/lib/python3.7/site-packages/pleroma_bot/_utils.py", line 111, in process_parallel
    ret = mp.wait()  # get all results
  File "/home/7/federico/mastodon/lib/python3.7/site-packages/pleroma_bot/_utils.py", line 83, in wrapper
    spinner_thread.join()
  File "/usr/lib/python3.7/threading.py", line 1032, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/7/federico/mastodon/bin/pleroma-bot", line 10, in <module>
    sys.exit(init())
  File "/home/7/federico/mastodon/lib/python3.7/site-packages/pleroma_bot/cli.py", line 594, in init
    sys.exit(main())
  File "/home/7/federico/mastodon/lib/python3.7/site-packages/pleroma_bot/cli.py", line 515, in main
    tweets_to_post = process_parallel(tweets, user, threads)
KeyboardInterrupt
nemobis commented 2 years ago

Now run in verbose mode... wth?

DEBUG:urllib3.connectionpool:https://twitter.com:443 "HEAD /Mariia_Zolkina/status/1510323171815264258/photo/1 HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:https://twitter.com:443 "HEAD /DefenceU/status/1510372750665433089 HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://twitter.com:443 "HEAD /IAPonomarenko/status/1510379221679542273/photo/1 HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:https://twitter.com:443 "HEAD /JulianRoepcke/status/1510345706250416136/photo/1 HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:https://twitter.com:443 "HEAD /GissiSim/status/1510381911302098944/video/1 HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:https://twitter.com:443 "HEAD /visegrad24/status/1510621810936721415 HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510344076838715399?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510339436617342984?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510370057855156229?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510372750665433089?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510323171815264258?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510319120935440390?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510659351261892614?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510379221679542273?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510345706250416136?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510373338778066946?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510651739942363138?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 429 94
nemobis commented 2 years ago

I guess it doesn't help that my machine has 48 cores, given:

cli.py
514:                    threads = round(cores / 2 if cores > 4 else 4)
515:                    tweets_to_post = process_parallel(tweets, user, threads)

Perhaps some limit here would be in order. Or parallelism could be reduced when hitting HTTP 429?

robertoszek commented 2 years ago

Yeah, I've only tested the bot on more modest machines so far 😅 Should we close this one as a duplicate of #72 ? The root cause of this basically is the unhandled HTTP 429 responses, if those are handled, the higher number of cores or additional parallel requests should not be an issue.

nemobis commented 2 years ago

Il 05/04/22 19:54, robertoszek ha scritto:

Should we close this one as a duplicate of #72 ?

Yes, I think that's reasonable. I managed to work around this by tweaking other configs to reduce the number of posts.

Sometimes it seems to get stuck on SSL handshake or something, but I'm not sure:

INFO:pleroma_bot:tweets gathered: 6 DEBUG:pleroma_bot:1510541472306712579 Processing tweets... |DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443 DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /1.1/statuses/show.json?id=1510541472306712579 HTTP/1.1" 200 1117 DEBUG:pleroma_bot:1510603710950522885 DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443 DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1510518074319024129?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 200 1300 DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): t.co:443

robertoszek commented 2 years ago

Ok, closing for now as a duplicate of #72 If after handling 429's still gets stuck I'll investigate further and reopen this issue. Thanks!

nemobis commented 2 years ago

Yes. If this still happens I think an easy first step is to change this line https://github.com/robertoszek/pleroma-bot/blob/a26b5cfa567f92bab1aa0f63cb985c63f0dfe72d/pleroma_bot/_processing.py#L434 from session.head(group, allow_redirects=True) to something like session.head(group, allow_redirects=True, timeout=(7, 10)) to see if a stricter timeout helps with cases like this:

ℹ 2022-04-05 21:15:54,606 - pleroma_bot - INFO - tweets gathered:        2 
INFO:pleroma_bot:tweets gathered:        2
DEBUG:pleroma_bot:1511251315871166468
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.twitter.com:443
Processing tweets...  |DEBUG:urllib3.connectionpool:https://api.twitter.com:443 "GET /2/tweets/1511246333511938048?poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Csource%2Ctext%2Cwithheld HTTP/1.1" 200 1125
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): t.co:443
DEBUG:urllib3.connectionpool:https://t.co:443 "HEAD /GtotlL2HJX HTTP/1.1" 301 0

For mysterious reasons the tweet https://nitter.eu/WorkersEESC/status/1511246333511938048#m failed multiple times, seemingly on the expansion of https://t.co/GtotlL2HJX to http://eutoday.net/news/security-defence/2022/extreme-political-groups-endangering-the-health-of-our-democracy-says-jos%C3%A9-antonio-moreno-d%C3%ADaz

robertoszek commented 2 years ago

For mysterious reasons the tweet https://nitter.eu/WorkersEESC/status/1511246333511938048#m failed multiple times, seemingly on the expansion of https://t.co/GtotlL2HJX to http://eutoday.net/news/security-defence/2022/extreme-political-groups-endangering-the-health-of-our-democracy-says-jos%C3%A9-antonio-moreno-d%C3%ADaz

@nemobis Looks like it gets a bit confused about the encoding: https://github.com/psf/requests/blob/2d5517682b3b38547634d153cea43d48fbc8cdb5/requests/sessions.py#L95

I implemented a best effort exception handler on 1.0.3rc10 in case it runs into latin1-encoded redirect headers with non-ASCII characters : 00e75d4b124aca686fed51b5e362b176d6477e90

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.0.3rc10

At least it seems to work with the example you provided, let me know if you run into any others.

ltning commented 1 year ago

I'm playing with this and on 1.1.0 this still seems to be an issue :( This is my first attempt at using pleroma-bot, and after getting all the keys etc. installed, it gets to "processing tweets" and then just stops. In verbose mode, I see the same 429 response from Twitter as others in this thread. By the time it stops it has downloaded about 140 or so media files. I left it overnight...

Then I saw the comments about thread count, and manually set the threads to 4 in _processing.py. After having done this, everything worked fine and all is hunky-dory.

Maybe set the threads thing to a low number by default? At least easier than trying to debug on a 64-way machine like the one I'm running this on, I presume.. :)

nemobis commented 1 year ago

Il 05/11/22 01:34, Eirik Øverby ha scritto:

Maybe set the threads thing to a low number by default?

Yes, I think this is a reasonable suggestion. I also still get stuck all the time, my workaround is to use timeout to kill the process and restart it.