imperatrona / twitter-scraper

Scrape the Twitter frontend API without authentication with Golang.
MIT License

Auth error (366): flow name LoginFlow is currently not accessible #14

Open jonathanrstern opened 1 month ago

jonathanrstern commented 1 month ago

Running into this error on login() ... thoughts?

Auth error (366): flow name LoginFlow is currently not accessible

It happens right after the Account duplication check:

data = {
    "flow_token": flow_token,
    "subtask_inputs": [
        {
            "subtask_id": "AccountDuplicationCheck",
            "check_logged_in_account": {"link": "AccountDuplicationCheck_false"},
        }
    ],
}

flow_token = get_flow_token(scraper, data)

from typing import Any, Dict

def get_flow_token(scraper: Dict[str, Any], data: Dict[str, Any]) -> str:
    info = get_flow(scraper, data)
    # Surface the first API error, if the response carries any.
    if "errors" in info and len(info["errors"]) > 0:
        error = info["errors"][0]
        raise Exception(f"Auth error ({error.get('code', 'unknown')}): {error.get('message', 'Unknown error')}")
    return info.get("flow_token")

(I'm using python)

jonathanrstern commented 1 month ago

Changed my User-Agent to Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 and it worked!

One question @imperatrona - let's say I'm using the library's search feature... do you recommend I log in every time I want to do a new search?

At what point will Twitter flag my account for too many logins?

imperatrona commented 1 month ago

@jonathanrstern You shouldn't log in often. After you log in, save the cookies Twitter returns in the Set-Cookie header and reuse them for as long as they are valid instead of logging in every time.
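A minimal sketch of the save-and-reuse idea, in Python since that is what the question uses. The function names, the JSON file layout, and the `max_age_seconds` cutoff are assumptions for illustration, not part of the library. The age check is only a local heuristic, because the server decides whether cookies are still valid.

```python
import json
import time
from pathlib import Path
from typing import Dict, Optional


def save_cookies(cookies: Dict[str, str], path: Path) -> None:
    """Write cookie name/value pairs to disk along with the time they were saved."""
    payload = {"saved_at": time.time(), "cookies": cookies}
    path.write_text(json.dumps(payload))


def load_cookies(path: Path, max_age_seconds: float) -> Optional[Dict[str, str]]:
    """Return the saved cookies, or None if the file is missing or older than the cutoff."""
    if not path.exists():
        return None
    payload = json.loads(path.read_text())
    if time.time() - payload["saved_at"] > max_age_seconds:
        return None  # too old by our own (hypothetical) cutoff: log in again
    return payload["cookies"]
```

A caller would try `load_cookies` first and fall back to a full login, followed by `save_cookies`, only when it returns `None` or the server rejects the stored session.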

jonathanrstern commented 1 month ago

Got it. That’s what I sort of thought, but glad to have confirmation. Thank you @imperatrona :)

Two follow up Qs:

  1. I’ve noticed User-Agent making a pretty big difference regarding whether I’m able to log in successfully. For example, TwitterAndroid/99 is not working but regular Mozilla Chrome is. Have you observed Twitter/X updating what is acceptable for user agents?

  2. For a single user, how much scraping is too much? I.e., at what point does one risk suspension? Is 1 search per minute on the low end or the high end?

imperatrona commented 1 month ago

@jonathanrstern sorry for not getting back to you sooner!

  1. From my experience, for login I use a User-Agent I've previously used with that account. And since it's using the web API, you should use browser User-Agents.
  2. 1 per minute is on the low end; mine works with 1 per 5 seconds. Search is rate limited at 150 requests per 15 minutes. I recommend making fewer than 150 if you want your accounts to live longer. There may also be other limits, so if you get an error you should stop and wait at least 15 minutes.
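The advice above (stay under the window limit, and pause for 15 minutes after an error) could be sketched as a small sliding-window limiter. The class name, its defaults, and the injectable `clock` are assumptions for illustration; the defaults use ~100 requests per 15 minutes as a conservative reading of the 150-request search limit.

```python
import time
from collections import deque
from typing import Callable, Deque


class WindowLimiter:
    """Allow at most max_requests per sliding window, with a cooldown after errors."""

    def __init__(
        self,
        max_requests: int = 100,          # deliberately below the 150 limit
        window_seconds: float = 900.0,    # 15 minutes
        cooldown_seconds: float = 900.0,  # wait at least 15 minutes after an error
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()
        self._blocked_until = float("-inf")

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Forget requests that have fallen out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        wait = max(0.0, self._blocked_until - now)
        if len(self._sent) >= self.max_requests:
            wait = max(wait, self._sent[0] + self.window_seconds - now)
        return wait

    def record_request(self) -> None:
        """Call right after sending a request."""
        self._sent.append(self._clock())

    def record_error(self) -> None:
        """Call when the API returns an error; blocks requests for the cooldown."""
        self._blocked_until = self._clock() + self.cooldown_seconds
```

In a scraping loop, one would `time.sleep(limiter.wait_time())` before each request, then call `record_request()`, and call `record_error()` whenever the response carries an error.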
jonathanrstern commented 1 month ago

Thanks @imperatrona

  1. "you should use browsers user-agents" -- that's in line with my experience as well. Thanks for confirming.

  2. I'm using the SearchTimeline feature. A few questions:

Appreciate all the help :)

imperatrona commented 1 month ago

@jonathanrstern

  1. Some endpoints are a little buggy :) For example, SearchTimeline can return different results for different users with the same request. Or GetTweetRetweeters can sometimes return empty results with only a cursor.

  2. The tweet count isn't rate limited, only the request count. So if one request returns 300+ tweets, it still counts as one request.

  3. I haven't encountered any restrictions specific to new accounts. A couple of my new accounts were suspended, but that was 3 out of a batch of 300. Either way, be safe: if your account gets a rate limit error, wait long enough before sending more requests.

Happy to help!

wade-liwei commented 3 weeks ago

@imperatrona

I am back. Some APIs are too expensive, so we want to use your project to make a single API request.

wade-liwei commented 3 weeks ago

response status 403 Forbidden: {"errors":[{"code":64,"message":"Your account is suspended and is not permitted to access this feature."}]}

How many source public IP addresses do you use? I guess there are IP restrictions.

imperatrona commented 3 weeks ago

@wade-liwei I haven't encountered any IP restrictions with hundreds of accounts on one IP. I'm using a single IP. You should try making fewer requests than the rate limit allows, for example ~100 instead of 150 requests per 15 minutes.

wade-liwei commented 3 weeks ago

Could you please share the request rate for one account?

Right now, I am testing 1 request per 10 minutes per account.

~100 instead of 150 requests per 15 minutes, for 20 accounts?

imperatrona commented 3 weeks ago

@wade-liwei Rate limiting is per endpoint, not per account. In readme.md I list the rate limits for each endpoint.

wade-liwei commented 3 weeks ago

50 requests / 15 minutes = 1 request / 18 seconds

How many accounts should I use?

Twitter keeps locking my account, so I think I should find out why it gets locked.

imperatrona commented 3 weeks ago

@wade-liwei It depends. How much data do you need to get?

wade-liwei commented 3 weeks ago

Only a little data; we just need to get the new followers.

It's not a lot of data, but I need it in near real time (about a 5-minute delay).

imperatrona commented 3 weeks ago

@wade-liwei

  1. Use the publicly available GetProfileByID to check whether the followers count has changed. It doesn't require auth, and after reaching the limit you can generate a new guest session with GetGuestToken. Then, if the followers count changed, use an authorized client to fetch the new followers. This way you'll avoid unnecessary FetchFollowers requests. For the public method I still recommend using a reasonable delay, or your IP may get banned.
  2. As I said before, avoid reaching the limits. FetchFollowers is limited to 50 requests, but you should stop somewhere between 30 and 40. Constantly reaching the limits will lead to account suspension.
  3. The number of accounts you'll need for scraping depends on how fast the target accounts gain followers and how many of them you need to track. Take the rauchg account, for example: he has 200k followers and gains 100-300 followers every day. One request can return up to 20 followers, so at maximum efficiency you'd need 15 requests per day to get his new followers, but in reality you'll need a lot more. With one scraping account you should be able to scrape new followers from 10-30 accounts like rauchg, and more if they have fewer followers.
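The strategy above can be sketched as follows: poll the cheap public follower count, and spend an authenticated request only when it changes. `FollowerWatcher`, `get_count`, and `fetch_followers` are hypothetical names; in practice the two callables would wrap GetProfileByID and FetchFollowers, whose exact signatures are not shown here. The helper at the top reproduces the 15-requests-per-day estimate.

```python
import math
from typing import Callable, Dict, List, Set


def min_requests_per_day(new_followers_per_day: int, page_size: int = 20) -> int:
    """Best-case number of follower requests per day, e.g. 300 / 20 = 15."""
    return math.ceil(new_followers_per_day / page_size)


class FollowerWatcher:
    """Fetch followers only when the public follower count has changed."""

    def __init__(
        self,
        get_count: Callable[[str], int],              # cheap, unauthenticated lookup
        fetch_followers: Callable[[str], List[str]],  # authenticated, rate limited
    ) -> None:
        self._get_count = get_count
        self._fetch = fetch_followers
        self._last_count: Dict[str, int] = {}
        self._seen: Dict[str, Set[str]] = {}

    def poll(self, user_id: str) -> List[str]:
        """Return followers not seen before; the first poll only records a baseline."""
        count = self._get_count(user_id)
        previous = self._last_count.get(user_id)
        self._last_count[user_id] = count
        if previous is not None and count == previous:
            return []  # count unchanged: skip the authenticated request
        latest = self._fetch(user_id)
        seen = self._seen.setdefault(user_id, set())
        fresh = [f for f in latest if f not in seen]
        seen.update(latest)
        return [] if previous is None else fresh
```

One known gap in this sketch: if one account unfollows and another follows between two polls, the count stays the same and the new follower is missed until the next change.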
wade-liwei commented 3 weeks ago

Thank you. I will implement it tomorrow. It's bedtime here in China right now.

wade-liwei commented 3 weeks ago

@imperatrona good solution.

Right now, if my account can't log in (DenyLoginSubtask), I replace the username and password login with cookies exported from the browser. I must not log out in the browser, and the cookies' expirationDate is a year away :)

wade-liwei commented 3 weeks ago

@imperatrona

You said one request can return up to 20 followers, and I see the parameter (maxUsersNbr) is 20 too.

But I get 69 or 70 followers in the response to every request. So is the available count really 20?

imperatrona commented 3 weeks ago

@wade-liwei When making requests we set count to 20, as the Twitter web client does, but the server often returns more, and sometimes it can return 0. It's just the way Twitter is coded now 🤡
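Since page sizes vary and a page can come back empty with only a cursor (as mentioned earlier in the thread), a caller can loop on the cursor rather than trust the requested count. This is a minimal sketch: `fetch_page` is a hypothetical callable that takes a cursor and returns `(items, next_cursor)`, and the request and empty-page caps are assumed safety limits, not values from the library.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (items, next cursor or None)


def collect(
    fetch_page: Callable[[Optional[str]], Page],
    wanted: int,
    max_requests: int = 10,    # hard cap so the loop cannot burn through the rate limit
    max_empty_pages: int = 2,  # give up after this many consecutive empty pages
) -> List[str]:
    """Follow cursors until enough items arrive, tolerating uneven page sizes."""
    items: List[str] = []
    cursor: Optional[str] = None
    empty_streak = 0
    for _ in range(max_requests):
        page, cursor = fetch_page(cursor)
        if page:
            empty_streak = 0
            items.extend(page)
        else:
            empty_streak += 1  # empty page with a cursor: try the next one
        if len(items) >= wanted or cursor is None or empty_streak >= max_empty_pages:
            break
    return items[:wanted]
```

Every call to `fetch_page` still counts as one request against the endpoint's limit, however many items it returns.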