twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.66k stars, 2.72k forks

There are certain accounts that are unscrapable #14

Closed DonaldTsang closed 6 years ago

DonaldTsang commented 6 years ago

I tried https://twitter.com/kimdotcom to test the installation, and it certainly works.
However, https://twitter.com/a_fellow_white does not (flagged as "sensitive content").
When I search for his tweets, none of them show up unless I am logged in.

haccer commented 6 years ago

I guess the only way to mitigate this would be to let people log in.

grepsedawk commented 6 years ago

Yeah, which would require building a Twitter session to avoid hitting the Twitter API.

DonaldTsang commented 6 years ago

While we are on the subject of "logging in", scraping followers and following could be done as well.
Or is that "feature creep", for lack of a better word?

grepsedawk commented 6 years ago

I'm not sure that comment coincides with the purpose of this issue.

DonaldTsang commented 6 years ago

@pachonk In a sense they are tangentially related, since both require a user account to access the data.
Or should I open another issue?

haccer commented 6 years ago

You can't access a person's followers or following via Twitter's search operators. I looked into scraping them a long time ago; the only way to do it involves some extensive Selenium work, which I don't want to do.

DonaldTsang commented 6 years ago

@haccer I would like to ask then, how does Tweep handle scrolling through the search results?

haccer commented 6 years ago

It's not technically scrolling; Tweep just iterates Tweet IDs.
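
For readers unfamiliar with the idea, here is a minimal sketch of one way "iterating Tweet IDs" can replace scrolling. It is not Tweep's actual code. The assumption is that each request returns the newest Tweets at or below an ID cursor, and that the cursor then moves just below the oldest ID seen. `fetch_page` is a hypothetical stand-in that filters an in-memory list instead of calling Twitter's search.

```python
def fetch_page(tweets, max_id=None, page_size=3):
    """Hypothetical stand-in for one search request: newest Tweets with ID <= max_id."""
    eligible = [t for t in tweets if max_id is None or t["id"] <= max_id]
    return sorted(eligible, key=lambda t: t["id"], reverse=True)[:page_size]


def iterate_tweets(tweets, page_size=3):
    """Yield every Tweet, newest first, by moving the ID cursor after each page."""
    max_id = None
    while True:
        page = fetch_page(tweets, max_id, page_size)
        if not page:
            return
        yield from page
        # The next page starts just below the oldest ID already seen.
        max_id = page[-1]["id"] - 1


# Made-up sample data; real Tweet IDs are much larger integers.
sample = [{"id": i, "text": f"tweet {i}"} for i in (101, 105, 110, 120, 133, 140, 152)]
collected_ids = [t["id"] for t in iterate_tweets(sample)]
print(collected_ids)
```

Because the cursor is an ID rather than a scroll position, no browser state is needed between requests.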

haccer commented 6 years ago

Thank you for providing an example of this, Donald.

Currently, Tweep cannot scrape Tweets belonging to accounts that require you to be logged in for their Tweets to be visible from Twitter's search (i.e., private accounts and accounts restricted by Twitter for potentially containing sensitive content).

After careful thought, I've come to the conclusion that the only possible way to scrape Tweets belonging to these accounts would be to make authenticated requests. Unauthenticated, the only way to view these Tweets is by visiting the user's profile and clicking the link; however, this would not work with the program since you cannot iterate Tweets from the user's profile page like you can with Twitter's search.

I've already experimented with scraping directly from a user's profile. I won't do any work that involves using Selenium to emulate a browser, because that hurts performance and usually requires a browser to be open.

Maybe I will introduce authentication in the future, but for now I don't really plan on adding it. Introducing authentication invites more performance issues and comes with potential authentication problems that can't be mitigated (e.g., Twitter locking accounts, two-factor authentication). If I did add it, I would probably do it by passing cookies.
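
As an illustration of the cookie approach, here is a minimal sketch using only Python's standard library. It is not an implemented Tweep feature. The cookie names, the placeholder values, and the search URL are hypothetical; in practice the cookies would have to be copied from a browser session that is already logged in.

```python
from urllib.request import Request


def build_authenticated_request(url, cookies, user_agent="Mozilla/5.0"):
    """Return a Request that carries the given cookies in a Cookie header."""
    # Join name=value pairs the way a browser sends them.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header, "User-Agent": user_agent})


# Hypothetical cookie names and values, not Twitter's real ones.
session_cookies = {"session_id": "PLACEHOLDER", "csrf_token": "PLACEHOLDER"}

# Illustrative URL only; the query parameters are an assumption.
request = build_authenticated_request(
    "https://twitter.com/search?q=from%3Aexample", session_cookies
)
print(request.get_header("Cookie"))
```

The request is only built here, not sent. Passing it to `urllib.request.urlopen` would send it, and the account-locking and two-factor concerns mentioned above would still apply.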

Thank you for the many suggestions, Donald.

I'm going to close this because this isn't an issue with the program.