iSarabjitDhiman / TweeterPy

TweeterPy is a Python library for extracting data from Twitter. The TweeterPy API lets you scrape data from a user's profile, such as username, user ID, bio, followers/followings list, profile media, tweets, etc.
MIT License
138 stars, 20 forks

end_cursor value #7

nballen-tx closed this issue 1 year ago

nballen-tx commented 1 year ago

Hi,

In the get_user_tweets function, I see that I can pass an end_cursor value to make it start from a certain tweet.

I've tried using the cursor_endpoint value from the function's returned data, but it doesn't work.

It looks something like this: 'cursor_endpoint': 'DAABCgABF1OKqJc__-gKAAIXRIgW-doQAAgAAwAAAAIAAA'

I also tried 'entryId': 'tweet-1680771569491271681', as well as the bare ID 1680771569491271681.

None of these seem to work.

Could you please advise which value I should use?

Thanks,

iSarabjitDhiman commented 1 year ago

Hey, that's strange. I just tested it and it's working fine for me. Here is how I used it:

```python
from tweeterpy import TweeterPy

twitter = TweeterPy()
# load or log into a session

# get initial data
tweets = twitter.get_user_tweets("elonmusk")

# I interrupted it with Ctrl+C, then grabbed the cursor from tweets['cursor_endpoint']
# and used it in the next request.
cursor = tweets['cursor_endpoint']

# resume where you left off
more_tweets = twitter.get_user_tweets("elonmusk", end_cursor=cursor)

# you can test it with the cursor I got;
# below was the actual cursor for me:
# data = twitter.get_user_tweets("elonmusk", end_cursor="DAABCgABF1OTTe4__80KAAIXS0buydYgAQgAAwAAAAIAAA")
```
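The resume pattern above is ordinary cursor pagination: each response carries a cursor that tells the server where the next page begins, so passing it back continues from that point instead of restarting. Below is a minimal, self-contained sketch of that idea, assuming a hypothetical fetch_page() in place of get_user_tweets and an in-memory list in place of Twitter. The page shape mimics the 'data' and 'cursor_endpoint' keys used in this thread; the has_next_page flag and the numeric cursor format are assumptions made for the sketch, not how real Twitter cursors work.

```python
from typing import Optional

# Hypothetical in-memory "timeline" standing in for Twitter's API.
# The cursor here is just a stringified index, invented for this sketch.
TIMELINE = [f"tweet-{n}" for n in range(1, 11)]
PAGE_SIZE = 3


def fetch_page(cursor: Optional[str] = None) -> dict:
    """Return one page of entries plus the cursor for the next page (hypothetical)."""
    start = int(cursor) if cursor is not None else 0
    end = start + PAGE_SIZE
    return {
        "data": TIMELINE[start:end],
        "cursor_endpoint": str(end),
        "has_next_page": end < len(TIMELINE),
    }


def fetch_all(cursor: Optional[str] = None) -> list:
    """Follow cursor_endpoint from page to page until no pages remain."""
    items = []
    while True:
        page = fetch_page(cursor)
        items.extend(page["data"])
        if not page["has_next_page"]:
            return items
        cursor = page["cursor_endpoint"]
```

Calling fetch_page() returns the first three entries; passing its cursor_endpoint back in, as in fetch_page(first["cursor_endpoint"]), returns entries 4 to 6 rather than starting over from the top.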

nballen-tx commented 1 year ago

Thanks for taking a look.

I tried the same thing on my end:

```python
from tweeterpy import TweeterPy

twitter = TweeterPy()
# session_path is the path to a previously saved session file
twitter.load_session(session_file_path=session_path)

# get initial data
tweets = twitter.get_user_tweets("elonmusk", total=3)

# grab the cursor_endpoint from the first request and use it in the next one
cursor = tweets['cursor_endpoint']

# resume where you left off
more_tweets = twitter.get_user_tweets("elonmusk", end_cursor=cursor, total=3)
```

The "tweets" result is fine, since it contains the top 3 tweets, but "more_tweets" is problematic: it returned some old tweets from 2009 instead of tweets #4-6.

Thanks,

iSarabjitDhiman commented 1 year ago

Hey @nballen-tx, I understand what you're saying. Here's why there are some old tweets in there:

By default, Twitter includes profile conversations and promoted tweets in the timeline.

Here's what I usually do to deal with it:

If you take a look at your data, you will notice that each item has an "entryId" that starts with either "tweet-(some_unique_id)", "profile-conversation-(some_unique_id)" or "promoted-(some_unique_id)".

So I iterate through the whole dataset and keep only the items whose entryId starts with "tweet-".

```python
# keep only regular timeline tweets; items without an entryId are skipped
data = [item for item in tweets["data"] if item.get("entryId", "").startswith("tweet-")]
```
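To see what the filter above is dealing with, here is a small sketch that classifies each item by its entryId prefix before filtering, and pulls the numeric ID out of a "tweet-" entry. The sample entries are made up for illustration (only the tweet ID comes from this thread), and the helper names entry_kind, only_tweets and tweet_id are hypothetical, not part of TweeterPy.

```python
from collections import Counter

# Prefixes described in the comment above; anything else is counted as "other".
KNOWN_PREFIXES = ("tweet-", "profile-conversation-", "promoted-")


def entry_kind(item: dict) -> str:
    """Return the entryId prefix without its trailing dash, or 'other'."""
    entry_id = item.get("entryId", "")
    for prefix in KNOWN_PREFIXES:
        if entry_id.startswith(prefix):
            return prefix.rstrip("-")
    return "other"


def only_tweets(items: list) -> list:
    """Keep plain timeline tweets; drop profile conversations and promoted entries."""
    return [item for item in items if entry_kind(item) == "tweet"]


def tweet_id(item: dict) -> str:
    """Extract the numeric ID from an entryId such as 'tweet-1680771569491271681'."""
    return item["entryId"].partition("-")[2]


# Made-up sample items (hypothetical, for illustration only).
sample = [
    {"entryId": "tweet-1680771569491271681"},
    {"entryId": "profile-conversation-123"},
    {"entryId": "promoted-456"},
    {"entryId": "cursor-bottom-789"},
    {},
]

print(Counter(entry_kind(item) for item in sample))
print([tweet_id(item) for item in only_tweets(sample)])
```

Counting the kinds first makes it easy to see how much of a page was conversations or promoted content before you throw it away.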
nballen-tx commented 1 year ago

Thanks for the explanation!