thewh1teagle opened 2 days ago
hi @thewh1teagle! saw you contributed `UserTweetsAndReplies`. thank you so much! i'll add tests and merge it tomorrow.
i suppose you already found out how to paginate.
this library hasn't yet implemented getting the current user from a cookie. you can get `screen_name` / `user_id` by making a GET request to https://api.twitter.com/1.1/account/multi/list.json with an empty body.
though if your cookie has multiple accounts logged in (has the `auth_multi` cookie), this will return data for all accounts without flagging which one is currently active. to get the currently active `screen_name` you can make a GET request to https://api.twitter.com/1.1/account/settings.json
Thanks, now we have account endpoints for getting `screen_name` :)
I tried to use `FetchTweetsAndRepliesByUserID`, iterating over it and sleeping 10 seconds between each iteration, but got this error:

```
panic: response status 429 Too Many Requests: Rate limit exceeded
```
```go
func run() {
	creds, err := auth.GetCredentials()
	if err != nil {
		panic(err)
	}

	scraper := twitterscraper.New()
	authToken := twitterscraper.AuthToken{Token: creds.AuthToken, CSRFToken: creds.Ct0}
	scraper.SetAuthToken(authToken)
	if !scraper.IsLoggedIn() {
		panic("Invalid AuthToken")
	}

	settings, err := scraper.GetAccountSettings()
	if err != nil {
		panic(err)
	}
	log.Println("Logged in as: ", settings.ScreenName)

	userId, err := scraper.GetUserIDByScreenName(settings.ScreenName)
	if err != nil {
		panic(err)
	}

	// Load cursors for posts and replies
	cursorPosts, err := storage.LoadCursor(".cursor_posts")
	log.Println("Current Cursor:", cursorPosts)
	if err != nil {
		log.Println("No cursor file found for posts, starting from the beginning.")
		cursorPosts = ""
	}

	// Counter for the number of fetched tweets
	fetchedCount := 0

	// First loop to fetch and save posts
	for {
		// Fetch tweets using the cursor
		tweets, newCursorPosts, err := scraper.FetchTweetsAndRepliesByUserID(userId, 20, cursorPosts)
		if err != nil {
			panic(err)
		}

		// If no new tweets are fetched, exit the loop
		if len(tweets) == 0 {
			log.Println("No new posts found. Exiting...")
			break
		}

		// Increment the fetched count by the number of newly fetched tweets
		fetchedCount += len(tweets)
		log.Printf("Fetched %d new tweets. Total fetched: %d\n", len(tweets), fetchedCount)

		// Save the new cursor state for posts
		if err := storage.SaveCursor(".cursor_posts", newCursorPosts); err != nil {
			panic(err)
		}

		// Save each tweet in JSONL format
		if err := storage.SaveTweetJSONL("posts.jsonl", tweets); err != nil {
			panic(err)
		}

		// Update cursor for the next iteration
		cursorPosts = newCursorPosts

		// Optional: Delay to avoid hitting rate limits
		time.Sleep(sleepBetweenRequests)
	}

	log.Printf("Total tweets fetched: %d\n", fetchedCount)
}
```
The default `sleepBetweenRequests` is `10*time.Second`.
Did I make the requests too quickly? How many tweets does it fetch by default? I noticed that in the Twitter UI it loads 20 at each scroll.
@thewh1teagle i was doing the same task recently and a 15-second delay was enough. each request usually returns 20 tweets, but sometimes can return 15 to 90. this lib has a method `scraper.WithDelay(15)` which you can use instead of your `sleepBetweenRequests`.
@imperatrona Thanks! good to hear that you did it recently, though note that I'm using the new endpoint from https://github.com/imperatrona/twitter-scraper/pull/20. It's almost the same as `GetTweets`, except that it returns basically everything the user posted: tweets / replies / reposts / quotes, etc.
I changed it to use `WithDelay` instead of sleeping and increased the timeout. I'll check later; hope it will work without this error.
Hey!
Thanks for creating such a great library!
I'm trying to retrieve all of my tweets and replies (I have thousands), but I couldn't find any mention of pagination to fetch beyond the maximum limit. Does the library support this feature?
Also, I don't see an option to get my own username or user ID after authentication. Could you clarify how to achieve that?