n0madic / twitter-scraper

Scrape the Twitter frontend API without authentication with Golang.
MIT License
882 stars, 179 forks

Scrapper not working #79

Closed: l33t7here closed this issue 1 year ago

l33t7here commented 1 year ago

None of the scraper's modes are working.

lucasg1 commented 1 year ago

I've tried many requests with different parameters and headers, and I only get "Forbidden" as the response. If anyone knows a solution, please share...

lucasg1 commented 1 year ago

The search function is no longer accessible if you are not logged in; that's probably where the error comes from.

codilau commented 1 year ago

Is there any way to log in programmatically in order to use the "latest" search mode? Later edit: assuming there is no browser on the machine to grab cookies from.

lidakaml commented 1 year ago

I started testing how to bypass this without using Selenium, since the original WithCookie and WithXCsrfToken didn't work for me.

If you log in from a browser and copy the following cookies: [auth_token, _twitter_sess, ct0, gt, twid, lang, kdt] (I don't know which ones are crucial)

and add them like this https://github.com/lidakaml/twitter-scraper-auth/blob/master/api.go#L43 it should work; it worked for me. I also set the bearer token from my browser.

Maybe someone could work out a more permanent and more readable solution. I hope this helps!

P.S. Some of the cookies have lifetimes ranging from 4 to 24 hours; maybe there is a way to update them automatically.

lucasg1 commented 1 year ago

> I started testing how to bypass this without using Selenium, since the original WithCookie and WithXCsrfToken didn't work for me.

It didn't work for me either...

> If you log in from a browser and copy the following cookies: [auth_token, _twitter_sess, ct0, gt, twid, lang, kdt] (I don't know which ones are crucial)

The only ones that I had to set using a regular GET request were auth_token and ct0. I also had to add two headers: authorization with the bearer value, and x-csrf-token, which has the same value as the ct0 cookie.

> and add them like this https://github.com/lidakaml/twitter-scraper-auth/blob/master/api.go#L43 it should work; it worked for me. I also set the bearer token from my browser.
>
> Maybe someone could work out a more permanent and more readable solution. I hope this helps!
>
> P.S. Some of the cookies have lifetimes ranging from 4 to 24 hours; maybe there is a way to update them automatically.

I'm new to Go; can you show here how you added them to the []*http.Cookie array, @lidakaml? I had to do it like this (which is working now, thank you!!):

    cookie := &http.Cookie{
        Name:  "auth_token",
        Value: "my_auth_token_cookie_value",
    }
    cookie2 := &http.Cookie{
        Name:  "ct0",
        Value: "my_ct0_cookie_value",
    }
    req.AddCookie(cookie)
    req.AddCookie(cookie2)

I also had to call scraper.WithXCsrfToken("ct0_cookie_value") so that the x-csrf-token is added to the request headers.
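For anyone who wants to see the two cookies and two headers in one place, here is a minimal standalone sketch using only net/http. It is not the scraper's own code: the helper name newAuthedRequest, the example.com URL, and the "Bearer " prefix on the authorization value are assumptions for illustration, and all values are placeholders to be copied from a logged-in browser session.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthedRequest (hypothetical helper) builds a GET request that carries
// the auth_token and ct0 cookies plus the authorization and x-csrf-token
// headers described in this thread.
func newAuthedRequest(url, authToken, ct0, bearer string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: "auth_token", Value: authToken})
	req.AddCookie(&http.Cookie{Name: "ct0", Value: ct0})
	// Assumption: the bearer value is sent with the standard "Bearer " scheme.
	req.Header.Set("Authorization", "Bearer "+bearer)
	// The x-csrf-token header carries the same value as the ct0 cookie.
	req.Header.Set("X-Csrf-Token", ct0)
	return req, nil
}

func main() {
	// Placeholder URL and values; replace them with real ones.
	req, err := newAuthedRequest(
		"https://example.com/search",
		"my_auth_token_cookie_value",
		"my_ct0_cookie_value",
		"my_bearer_token",
	)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println("Cookie:", req.Header.Get("Cookie"))
	fmt.Println("x-csrf-token:", req.Header.Get("X-Csrf-Token"))
}
```

The request is only built and printed here, not sent, so the sketch can be run without touching Twitter at all.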

lidakaml commented 1 year ago

    import (
        "net/http"

        twitterscraper "github.com/lidakaml/twitter-scraper-auth"
    )

    ....

    cookie1 := &http.Cookie{Name: "auth_token", Value: "my_auth_token_cookie_value"}
    twitterscraper.Cookies = append(twitterscraper.Cookies, cookie1)
    cookie2 := &http.Cookie{Name: "ct0", Value: "your_ct0_token"}
    twitterscraper.Cookies = append(twitterscraper.Cookies, cookie2)

This is how I did it, but there are much cleaner ways to do it; this was just for testing what works.

n0madic commented 1 year ago

I've added a full authentication flow with a password, but search doesn't work yet: Twitter always returns a scroll: cursor...

lidakaml commented 1 year ago

Did it work with just the cookies? Login is a nice feature but I can see Twitter captcha being a problem. Usually, Facebook, etc. scrapers/managers rely on using cookies of an already logged-in session, but if you can bypass the captcha, this would be a nice feature.

Also, it looks like this could be the problem: https://github.com/n0madic/twitter-scraper/commit/6d13e319a39d942f0f4b9c1ba6ad0b89f328680a#r110220147
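As a rough illustration of the "reuse cookies from an already logged-in session" approach, the sketch below saves cookies as JSON and drops the ones that have expired when loading them again. It uses only the standard library; the helper names (encodeCookies, decodeCookies, dropExpired, demo) are hypothetical, the cookie values are placeholders, and nothing here is part of the scraper itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// encodeCookies serializes session cookies so they can be written to disk.
func encodeCookies(cookies []*http.Cookie) ([]byte, error) {
	return json.Marshal(cookies)
}

// decodeCookies restores cookies previously produced by encodeCookies.
func decodeCookies(data []byte) ([]*http.Cookie, error) {
	var cookies []*http.Cookie
	if err := json.Unmarshal(data, &cookies); err != nil {
		return nil, err
	}
	return cookies, nil
}

// dropExpired keeps cookies that have no expiry set or that expire after now.
func dropExpired(cookies []*http.Cookie, now time.Time) []*http.Cookie {
	var live []*http.Cookie
	for _, c := range cookies {
		if c.Expires.IsZero() || c.Expires.After(now) {
			live = append(live, c)
		}
	}
	return live
}

// demo round-trips two placeholder cookies, one of them already expired,
// and reports how many are still usable.
func demo() (live, total int, err error) {
	now := time.Now()
	saved := []*http.Cookie{
		{Name: "auth_token", Value: "my_auth_token_cookie_value", Expires: now.Add(24 * time.Hour)},
		{Name: "ct0", Value: "my_ct0_cookie_value", Expires: now.Add(-time.Hour)},
	}
	data, err := encodeCookies(saved)
	if err != nil {
		return 0, 0, err
	}
	restored, err := decodeCookies(data)
	if err != nil {
		return 0, 0, err
	}
	return len(dropExpired(restored, now)), len(restored), nil
}

func main() {
	live, total, err := demo()
	if err != nil {
		fmt.Println("cookie round trip failed:", err)
		return
	}
	fmt.Printf("%d of %d cookies still usable\n", live, total)
}
```

When a required cookie turns out to be expired, the session has to be refreshed some other way (a new browser login or a login flow); this sketch only detects that situation.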

n0madic commented 1 year ago

Now the problem is not in authorization, but in full-fledged search.

talesricr commented 1 year ago

scraper.Login("username","password") works for me. But the "latest" search mode doesn't seem to change anything: it always returns tweets from the "Top" section. Does anyone know anything about this?

lucasg1 commented 1 year ago

Search is also working for me using scraper.Login

@talesricr the "latest" search mode wasn't working even before this authentication problem. It's mentioned in #75, but I guess no one opened an issue specifically for that.

mahbubshaun commented 1 year ago

> I started testing how to bypass this without using Selenium, since the original WithCookie and WithXCsrfToken didn't work for me.
>
> If you log in from a browser and copy the following cookies: [auth_token, _twitter_sess, ct0, gt, twid, lang, kdt] (I don't know which ones are crucial)
>
> and add them like this https://github.com/lidakaml/twitter-scraper-auth/blob/master/api.go#L43 it should work; it worked for me. I also set the bearer token from my browser.
>
> Maybe someone could work out a more permanent and more readable solution. I hope this helps!
>
> P.S. Some of the cookies have lifetimes ranging from 4 to 24 hours; maybe there is a way to update them automatically.

Hey, mate. Could you please explain how you were able to collect cookies from a Selenium browser programmatically? I see that the driver does not return the ct0, auth, or guest token cookies.

talesricr commented 1 year ago

Now I'm getting this even with authentication :(


```
{
  "tweets": [
    {
      "text": "",
      "username": "",
      "uuid": "",
      "likes": 0,
      "replies": 0,
      "retweets": 0,
      "timestamp": "0001-01-01T00:00:00Z"
    }
  ]
}
```

stevestavropoulos commented 1 year ago

> Now I'm getting this even with authentication :(

Maybe you are blocked by Twitter. Try a normal login via a browser and see what it tells you.
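The all-zero tweet in the output quoted above (empty strings, zero counts, and the timestamp "0001-01-01T00:00:00Z", which is Go's zero time.Time) can be detected in code rather than by eye. The following is a minimal sketch: the struct fields mirror the JSON keys shown in this thread, while the type and function names (tweet, response, countEmpty) are hypothetical and not the scraper's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// tweet mirrors the JSON keys from the output posted in this thread.
type tweet struct {
	Text      string    `json:"text"`
	Username  string    `json:"username"`
	UUID      string    `json:"uuid"`
	Likes     int       `json:"likes"`
	Replies   int       `json:"replies"`
	Retweets  int       `json:"retweets"`
	Timestamp time.Time `json:"timestamp"`
}

type response struct {
	Tweets []tweet `json:"tweets"`
}

// isEmpty reports whether the tweet carries no data at all, which suggests
// the scraper emitted a zero value instead of a real result.
func (t tweet) isEmpty() bool {
	return t.Text == "" && t.Username == "" && t.UUID == "" && t.Timestamp.IsZero()
}

// countEmpty decodes a response and counts how many tweets are empty.
func countEmpty(data []byte) (empty, total int, err error) {
	var r response
	if err := json.Unmarshal(data, &r); err != nil {
		return 0, 0, err
	}
	for _, t := range r.Tweets {
		if t.isEmpty() {
			empty++
		}
	}
	return empty, len(r.Tweets), nil
}

func main() {
	sample := []byte(`{"tweets":[{"text":"","username":"","uuid":"","likes":0,
		"replies":0,"retweets":0,"timestamp":"0001-01-01T00:00:00Z"}]}`)
	empty, total, err := countEmpty(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d of %d tweets are empty\n", empty, total)
}
```

A result where every tweet is empty points at a failed or blocked request rather than at a search that genuinely matched nothing.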

n0madic commented 1 year ago

check this fix