mahrtayyab / tweety

Twitter Scraper

Uncaught exception fetching locked tweet #180

Closed half-duplex closed 7 months ago

half-duplex commented 7 months ago

Attempting to load a tweet from a locked account (that the bot account doesn't have access to) causes an uncaught exception.

>>> from tweety import Twitter
>>> app = Twitter("test")
>>> app.sign_in(TWITTER_USER, TWITTER_PASSWORD)
>>> tweet = app.tweet_detail(LOCKED_TWEET_LINK)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "env/lib/python3.11/site-packages/tweety/bot.py", line 820, in tweet_detail
    tweet = Tweet(self, entry, response)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "env/lib/python3.11/site-packages/tweety/types/twDataTypes.py", line 42, in new_init
    init(self, *_args, **_kwargs) # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "env/lib/python3.11/site-packages/tweety/types/twDataTypes.py", line 166, in __init__
    self._format_tweet()
  File "env/lib/python3.11/site-packages/tweety/types/twDataTypes.py", line 261, in _format_tweet
    if self._tweet.get('tweet'):
       ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

This happens reliably with a fresh login, while it works as expected (raises ProtectedTweet) when logged out or when the session has been loaded from the tw_session file.

Tested with tweety 1.1.2 and a7e5be39b75e14b81c9093bffd647f011cb73f35

Problem _raw:

{
    'entryId': 'tweet-XXXXX',
    'sortIndex': 'XXXXX',
    'content': {
        'entryType': 'TimelineTimelineItem',
        '__typename': 'TimelineTimelineItem',
        'itemContent': {
            'itemType': 'TimelineTweet',
            '__typename': 'TimelineTweet',
            'tweet_results': {
                'result': {
                    '__typename': 'TweetTombstone',
                    'tombstone': {
                        '__typename': 'TextTombstone',
                        'text': {
                            'rtl': False,
                            'text': 'You’re unable to view this Post because this account owner limits who can view their Posts. Learn more',
                            'entities': [
                                {
                                    'fromIndex': 92,
                                    'toIndex': 102,
                                    'ref': {
                                        'type': 'TimelineUrl',
                                        'url': 'https://help.twitter.com/rules-and-policies/notices-on-twitter',
                                        'urlType': 'ExternalUrl',
                                    },
                                }
                            ],
                        },
                    },
                }
            },
            'tweetDisplayType': 'Tweet',
            'hasModeratedReplies': False,
        },
    },
}
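
For reference, the tombstone in the dump above can be detected before any Tweet object is built. This is only a minimal sketch working on plain dicts shaped like that _raw entry; `_dig` and `tombstone_text` are hypothetical helper names, not part of tweety, and the library's eventual fix may look quite different.

```python
from typing import Any, Optional


def _dig(data: Any, *keys: str) -> Any:
    """Follow a chain of dict keys, returning None as soon as a level is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def tombstone_text(entry: dict) -> Optional[str]:
    """Return the tombstone notice if the entry is a TweetTombstone, else None."""
    result = _dig(entry, "content", "itemContent", "tweet_results", "result")
    if not isinstance(result, dict) or result.get("__typename") != "TweetTombstone":
        return None
    # In the dump above the human-readable notice sits at tombstone.text.text.
    return _dig(result, "tombstone", "text", "text") or ""


# Trimmed-down version of the problem entry, keeping only the keys that matter here.
sample_entry = {
    "content": {
        "itemContent": {
            "tweet_results": {
                "result": {
                    "__typename": "TweetTombstone",
                    "tombstone": {"text": {"text": "You're unable to view this Post"}},
                }
            }
        }
    }
}
print(tombstone_text(sample_entry))
```

A caller could check `tombstone_text(entry) is not None` and raise a protected/unavailable error at that point instead of running into the AttributeError later.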

Sopel issue: https://github.com/sopel-irc/sopel-twitter/issues/62

half-duplex commented 7 months ago

Actually, it looks like the session isn't being loaded properly. From a fresh workspace (rm test.tw_session):

>>> from tweety import Twitter
>>> app = Twitter("test")
>>> app.session.logged_in
False
>>> tweet = app.tweet_detail(LOCKED_TWEET_LINK)
[...]
tweety.exceptions_.ProtectedTweet: Tweet is private/protected
>>> app.sign_in(TWITTER_USER, TWITTER_PASSWORD)
>>> tweet = app.tweet_detail(LOCKED_TWEET_LINK)
[...]
AttributeError: 'NoneType' object has no attribute 'get'

I exit the REPL, and from the tweety account (TWITTER_USER) I request to follow the locked account, then from the locked account I accept.

>>> from tweety import Twitter
>>> app = Twitter("test")
>>> app.session.logged_in
True
>>> tweet = app.tweet_detail(LOCKED_TWEET_LINK) # should succeed
[...]
tweety.exceptions_.ProtectedTweet: Tweet is private/protected
>>> app.sign_in(TWITTER_USER, TWITTER_PASSWORD)
>>> tweet = app.tweet_detail(LOCKED_TWEET_LINK)
>>> 

So the loaded session has logged_in=True but can't load the locked tweet until I re-auth with user/pass. Not sure if this is user error or library weirdness.

dgw commented 7 months ago

> So the loaded session has logged_in=True but can't load the locked tweet until I re-auth with user/pass. Not sure if this is user error or library weirdness.

You are supposed to call .connect() on the tweety instance after it loads the session, but I think it could be both that (user error) and library weirdness. A loaded session should .connect() itself, ideally, and also not report logged_in = True before it's fully set up.

The wrinkle here is that the plugin code already does .connect() but still gets the AttributeError instead of the expected ProtectedTweet.

half-duplex commented 7 months ago

Aha, thanks. After app.connect()ing with a saved session in REPL it has the same AttributeError.
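
Until the library handles this case, calling code could translate the stray AttributeError into something more meaningful. A rough sketch, assuming the failure always carries the message shown in the traceback above; `TweetUnavailable` and `fetch_tweet` are hypothetical names made up for illustration, and the wrapper takes any callable so it does not depend on tweety internals.

```python
from typing import Any, Callable


class TweetUnavailable(Exception):
    """Hypothetical stand-in for a 'tweet is protected or unavailable' error."""


def fetch_tweet(fetch: Callable[[str], Any], link: str) -> Any:
    """Call fetch(link), turning the NoneType AttributeError into TweetUnavailable."""
    try:
        return fetch(link)
    except AttributeError as exc:
        # Translate only the specific failure from the traceback; re-raise anything else.
        if "'NoneType' object has no attribute 'get'" in str(exc):
            raise TweetUnavailable(f"Tweet is unavailable or protected: {link}") from exc
        raise
```

With a connected client this would be used as `fetch_tweet(app.tweet_detail, LOCKED_TWEET_LINK)`. Matching on the message text is brittle, so this is a stopgap rather than a fix.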

mahrtayyab commented 7 months ago

Previously, UserProtected was raised when UserUnavailable was found in the response object. A recent Twitter update has removed it, and the response now contains an empty object. The issue will be addressed in the next release.

mahrtayyab commented 7 months ago

> So the loaded session has logged_in=True but can't load the locked tweet until I re-auth with user/pass. Not sure if this is user error or library weirdness.

app.session.logged_in doesn't really mean the user is logged in; it is an internal attribute used solely by the library to determine whether the session has been loaded. To confirm you are logged in successfully, use app._is_connected or simply call app.me, which returns the logged-in user.
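
As a rough illustration of that check, a caller could rely on app.me rather than app.session.logged_in. This sketch assumes `me` is readable as a plain attribute or property and is None (or missing, or raises) when nobody is logged in; `logged_in_user` is a hypothetical helper name, not something tweety provides.

```python
from typing import Any, Optional


def logged_in_user(app: Any) -> Optional[Any]:
    """Return whatever app.me reports, or None if it is missing or raises."""
    try:
        # getattr's default covers a missing attribute; the except covers a
        # property that raises while trying to resolve the user.
        return getattr(app, "me", None)
    except Exception:
        # Assumption: any failure here is treated as "not logged in". This also
        # hides network errors, which real code may want to surface instead.
        return None
```

Something like `if logged_in_user(app) is None: app.sign_in(TWITTER_USER, TWITTER_PASSWORD)` would then decide whether a fresh login is needed.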

mahrtayyab commented 7 months ago

I will add a proper is_user_authorized attribute to determine whether we are logged in or not.

mahrtayyab commented 7 months ago

is_user_authorized has been added.