taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

0 tweets #296

Open etemiz opened 4 years ago

etemiz commented 4 years ago

INFO: Retrying... (Attempts left: 1)
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q=bitcoin&l=
INFO: Using proxy 181.211.38.62:47911
INFO: Got 0 tweets for bitcoin.

Parsing may be the issue. Both twitterscraper 0.9.3 and 1.4.0 are failing.

mickyscreggs commented 4 years ago

Have also been facing this issue. Queries that were returning tweets yesterday are not returning tweets today.

ravishankarramakrishnan commented 4 years ago

I'm also facing the same issue! Yesterday it was parsing well, but today it returns 0 tweets.

xtr32 commented 4 years ago

same here 0 tweets

yiw0104 commented 4 years ago

same here 0 tweets

tengfei7890 commented 4 years ago

Seems Twitter has restricted the connection so that all requests return a page with "We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?"
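A minimal sketch of how one might check whether a response body is this "JavaScript is disabled" page rather than real search results. The function name looks_like_js_wall and the sample HTML strings are hypothetical; only the marker text comes from the message quoted above, and the real page may encode it differently.

```python
# Part of the notice quoted above; the apostrophe in "We've" is left out
# on purpose because it may be HTML-escaped in the real page.
JS_WALL_MARKER = "JavaScript is disabled in your browser"


def looks_like_js_wall(html: str) -> bool:
    """Return True if the page body contains the 'JavaScript is disabled' notice."""
    return JS_WALL_MARKER.lower() in html.lower()


# Made-up sample bodies, for illustration only.
sample_blocked = (
    "<html><body><p>We've detected that JavaScript is disabled in your "
    "browser. Would you like to proceed to legacy Twitter?</p></body></html>"
)
sample_ok = '<html><body><div class="tweet">hello</div></body></html>'

print(looks_like_js_wall(sample_blocked))
print(looks_like_js_wall(sample_ok))
```

If a check like this returns True, the "Got 0 tweets" log line most likely means the parser was handed the notice page, not that the query has no results.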

panoptikum commented 4 years ago

+1. That's bad.

hakanyusufoglu commented 4 years ago

I also built a project, and its main part depends on this. How can we fix this problem?

hakanyusufoglu commented 4 years ago

I need help

toscanopedro commented 4 years ago

Same here... does anyone have a clue what's going on?

hakanyusufoglu commented 4 years ago

Not yet. I used it for a university project. What will I do during the presentation?

rubengoeminne commented 4 years ago

Indeed, as @tengfei7890 noted, every request now comes back with the "JavaScript is disabled" page. This can be fixed by modifying the header dictionary in query.py from

HEADER = {'User-Agent': random.choice(HEADERS_LIST)}

to

HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'}

That should fix the issue.
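As a rough illustration of what the modified dictionary contains, here is a self-contained sketch. build_header is a hypothetical helper, not part of twitterscraper, and the one-entry HEADERS_LIST is only a stand-in for the list of user-agent strings that query.py keeps.

```python
import random

# Stand-in for the user-agent list in query.py; this single entry is an
# example value, not the library's actual list.
HEADERS_LIST = [
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201",
]


def build_header(user_agents):
    """Hypothetical helper: random User-Agent plus the XMLHttpRequest marker."""
    return {
        "User-Agent": random.choice(user_agents),
        # Header conventionally sent by JavaScript libraries on AJAX requests.
        "X-Requested-With": "XMLHttpRequest",
    }


HEADER = build_header(HEADERS_LIST)
print(HEADER)
```

Presumably the extra header makes the request look like one issued by script in a browser, which is why the notice page is no longer returned; that is an inference from the behavior reported here, not something Twitter documents.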

locchipinti commented 4 years ago

It works for me! Thanks @rubengoeminne, genius!

hakanyusufoglu commented 4 years ago

Thank you very much. It works.

xtr32 commented 4 years ago

It works, thanks.

toscanopedro commented 4 years ago

Hi guys, I'm kind of a noob and don't have a HEADER in my code... can someone tell me how to implement it?

GivenToFlyCoder commented 4 years ago

Thanks a lot, my friend! This worked for me! You are a genius! Let me buy you a beer, @rubengoeminne! Paulaner German beer? Or Negra Modelo Mexican beer?

GivenToFlyCoder commented 4 years ago

@toscanopedro The header dictionary HEADER = {'User-Agent': random.choice(HEADERS_LIST)} is not in your own code; it is a line inside the file query.py of the installed twitterscraper package.

Just open the file in a text editor and change the line as @rubengoeminne described. You can search for the file on your PC; it may be at a path like C:\ProgramData\Anaconda3\Lib\site-packages\twitterscraper
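Instead of searching the disk by hand, the location can also be asked of Python itself. This is a minimal sketch using only the standard library; find_module_file is a hypothetical helper name, and it prints None if twitterscraper is not installed in the interpreter that runs it.

```python
import importlib.util
from pathlib import Path


def find_module_file(module_name: str, filename: str):
    """Return the path of `filename` inside an installed package, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    # Not installed, or a namespace/built-in module without a file location.
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None


# Path to the query.py that the current interpreter would import.
print(find_module_file("twitterscraper", "query.py"))
```

Running it with the same interpreter (or virtual environment) as the scraping script matters, since each environment has its own site-packages directory.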

toscanopedro commented 4 years ago

THX MAN!!!!

yiw0104 commented 4 years ago

The header modification no longer works for query_user_info. I changed the header dictionary in query.py and still got no information for my list of users.

AlexBietrix commented 4 years ago

I faced the same issue. Retrieving tweets seems to work now. However, I get this error when I request user info with query_user_info: local variable 'user_info' referenced before assignment

mardiaz353 commented 4 years ago

Yeah, it is not working for me. I changed that line in query.py and the same issue occurs.

wal-iston commented 4 years ago

Hi. I implemented the modification suggested by pumpkinw and the algorithm made progress. It was not scraping anything before the modification. After it, the scraper runs but does not return everything: it seems to scrape only the last few hours. For example, when I ran:

twitterscraper fascismo --lang pt -p 1 -bd 2020-05-31 -ed 2020-06-01 -o file_name.json

I received tweets corresponding only to hours from 20 up to 23 of day 2020-05-31:

In [12]: df.groupby(df['timestamp'].dt.hour).count()
Out[12]:
           has_media  hashtags  img_urls  is_replied  ...  tweet_url  user_id  username  video_url
timestamp                                             ...
20               956       956       956         956  ...        956      956       956        956
21              2384      2384      2384        2384  ...       2384     2384      2384       2384
22              2100      2100      2100        2100  ...       2100     2100      2100       2100
23              2147      2147      2147        2147  ...       2147     2147      2147       2147

[4 rows x 21 columns]


Does anybody know what is going on?
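The same coverage check can be sketched without pandas, assuming the tweet timestamps are available as datetime objects. The helper names tweets_per_hour and missing_hours are hypothetical and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count timestamps per hour of day; returns {hour: count} sorted by hour."""
    counts = Counter(ts.hour for ts in timestamps)
    return dict(sorted(counts.items()))


def missing_hours(timestamps):
    """Hours of the day (0-23) for which no tweet was collected."""
    seen = {ts.hour for ts in timestamps}
    return [hour for hour in range(24) if hour not in seen]


# Made-up timestamps, for illustration only.
sample = [
    datetime(2020, 5, 31, 20, 15),
    datetime(2020, 5, 31, 21, 5),
    datetime(2020, 5, 31, 21, 40),
    datetime(2020, 5, 31, 23, 59),
]

print(tweets_per_hour(sample))    # {20: 1, 21: 2, 23: 1}
print(missing_hours(sample)[:3])  # [0, 1, 2]
```

A long run of missing hours at the start of the requested day, as in the output above, would point to the scrape stopping early rather than to a quiet period on Twitter.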

Frickson commented 4 years ago

I already changed the header from HEADER = {'User-Agent': random.choice(HEADERS_LIST)} to HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'}, but I still get the same error: 'NoneType' object has no attribute 'user'.

javad94 commented 4 years ago

I don't like modifying a module's files directly. Instead, based on @rubengoeminne's great answer, you can fix this issue by adding these lines of code to the top of your Python script:

import twitterscraper
import random
HEADERS_LIST = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13',
    'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
    'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre'
]
twitterscraper.query.HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'}

And do your stuff normally:

from twitterscraper import query_tweets
query_tweets("github", 100)

Marlowe97 commented 4 years ago

The X-Requested-With header fix no longer seems to work for me.

javad94 commented 4 years ago

Yeah, unfortunately they closed it down.

toscanopedro commented 4 years ago

Guys, are you sure you replaced the correct file? This is still working for me.

AlexBietrix commented 4 years ago

@toscanopedro Where did you make the change, please? I made it in the query.py file, and it's not working... Thanks!

toscanopedro commented 4 years ago

First, run pip show twitterscraper to find the location of the twitterscraper directory. Mine was in "c:\users\pedro\appdata\local\programs\python\python38-32\lib\site-packages". There is a folder called twitterscraper there, containing the query.py file. You just have to change it. The path may differ depending on which IDE you are using, but it is always inside a "lib\site-packages" directory.

AlexBietrix commented 4 years ago

@toscanopedro I am working on GCP, and I changed the file manually as shown in the screenshot. Is that sufficient? [Screenshot: Capture d’écran 2020-06-03 à 21 14 04]

Thanks

AllanSCosta commented 4 years ago

@toscanopedro it doesn't work on my end, unfortunately. I would imagine that you're making your requests to some server that wasn't updated yet, maybe? I'll play with VPNs and check

toscanopedro commented 4 years ago

Yes... this is what happened... The problem is back here... my code does not work anymore... it's very sad.

hakanyusufoglu commented 4 years ago

It doesn't work for me anymore. How to fix?

hakanyusufoglu commented 4 years ago

What will we do? My whole project depends on it...

toscanopedro commented 4 years ago

I don't know... my project depends on it too.

romit-actuarial commented 4 years ago

Hi everyone. The header fix was working earlier, but it stopped working today. I think Twitter is also following this thread :P

Help me please! Like everyone above, my project also depends on it.

Marlowe97 commented 4 years ago

It seems that Twitter has had enough! The company is shutting down the legacy theme version of its original site on the 1st of June 2020. Twitter has issued a warning to all users who have been using user-agent switching hacks and unsupported browsers to enable the legacy theme. Since this package relies on the legacy theme and the user-agent, I am not sure a one-line solution exists.

toscanopedro commented 4 years ago

Yes, it's true... shit... I think this will take months.

hakanyusufoglu commented 4 years ago

I was using this library because my Twitter API application was rejected. Now it's too late for everything. I hope it gets better soon.

sagefuentes commented 4 years ago

I have no idea how useful this is, but I know that Get Old Tweets 3 is still largely working, as of June 4th at 5:18 p.m. PST. It does not have a way to grab videos or images (which is why I am interested in twitterscraper). Hopefully this is useful to someone who doesn't need images and videos, or to someone who can reverse engineer a solution (I am trying to figure it out, but my chops are not at that level yet).

Frickson commented 4 years ago

@sagefuentes Is there any way GetOldTweets3 can retrieve the total number of retweeted posts from a specific user?

lapp0 commented 4 years ago

Please try this and let me know how it works for you https://github.com/taspinar/twitterscraper/pull/302

datablogger-ml commented 4 years ago

It was not working yesterday, even after I changed the HEADER in query.py. But today, all of a sudden, it's working :)

hakanyusufoglu commented 4 years ago

Yes. very interesting :)

toscanopedro commented 4 years ago

Same here, guys. It's working again... let's see tomorrow.

Kai292-tech commented 4 years ago

It works now.

Frickson commented 4 years ago

I changed the header in query.py, but the code below raises the error "AttributeError: 'NoneType' object has no attribute 'user'". Can anyone help, please?

from multiprocessing import Pool
import time

import pandas as pd
from IPython.display import display
from twitterscraper.query import query_user_info

# Collected user dictionaries, filled in by main().
twitter_user_info = []


def get_user_info(twitter_user):
    """
    An example of using the query_user_info method
    :param twitter_user: the twitter user to capture user data
    :return: twitter_user_data: returns a dictionary of twitter user data
    """
    user_info = query_user_info(user=twitter_user)
    twitter_user_data = {}
    # The AttributeError is raised here when user_info is None.
    twitter_user_data["user"] = user_info.user
    twitter_user_data["fullname"] = user_info.full_name
    twitter_user_data["location"] = user_info.location
    twitter_user_data["blog"] = user_info.blog
    twitter_user_data["date_joined"] = user_info.date_joined
    twitter_user_data["id"] = user_info.id
    twitter_user_data["num_tweets"] = user_info.tweets
    twitter_user_data["following"] = user_info.following
    twitter_user_data["followers"] = user_info.followers
    twitter_user_data["likes"] = user_info.likes
    twitter_user_data["lists"] = user_info.lists

    return twitter_user_data


def main():
    start = time.time()
    users = ['Carlos_F_Enguix', 'mmtung', 'dremio', 'MongoDB', 'JenWike', 'timberners_lee', 'ataspinar2', 'realDonaldTrump', 'BarackObama', 'elonmusk', 'BillGates', 'BillClinton', 'katyperry', 'KimKardashian']

    # Fetch the user profiles in parallel with 8 worker processes.
    pool = Pool(8)
    for user in pool.map(get_user_info, users):
        twitter_user_info.append(user)

    cols = ['id', 'fullname', 'date_joined', 'location', 'blog', 'num_tweets', 'following', 'followers', 'likes', 'lists']
    data_frame = pd.DataFrame(twitter_user_info, index=users, columns=cols)
    data_frame.index.name = "Users"
    data_frame.sort_values(by="followers", ascending=False, inplace=True, kind='quicksort', na_position='last')
    elapsed = time.time() - start
    print(f"Elapsed time: {elapsed}")
    display(data_frame)


if __name__ == '__main__':
    main()
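The error message suggests that the user-info lookup returned None for at least one user. Below is a minimal sketch of guarding against that so one failed profile does not abort the whole run. The helpers user_info_to_dict and collect are hypothetical, and fake_fetch is only a stand-in for query_user_info so that the example runs offline.

```python
from types import SimpleNamespace


def user_info_to_dict(user_info):
    """Convert a user-info object to a dict, or return None if the lookup failed."""
    if user_info is None:
        return None
    return {
        "user": user_info.user,
        "fullname": user_info.full_name,
        "followers": user_info.followers,
    }


def collect(users, fetch):
    """Call fetch(user) for each user; return (successful dicts, failed names)."""
    results, failed = [], []
    for name in users:
        data = user_info_to_dict(fetch(name))
        if data is None:
            failed.append(name)
        else:
            results.append(data)
    return results, failed


def fake_fetch(name):
    """Stand-in for query_user_info: returns None for one made-up user."""
    if name == "missing_user":
        return None
    return SimpleNamespace(user=name, full_name=name.title(), followers=0)


results, failed = collect(["alice", "missing_user"], fake_fetch)
print(results)
print(failed)
```

This only hides the symptom; if every user ends up in the failed list, the underlying scraping problem discussed in this thread is still present.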

javad94 commented 4 years ago

Great, it's working again. But don't hold your breath on this. Find an alternative before it's too late.

lapp0 commented 4 years ago

Leaving this open because it appears to work on and off. I'll update #302 so that JS is optional, because legacy appears to work sometimes.

lapp0 commented 4 years ago

Merged in https://github.com/taspinar/twitterscraper/pull/304. origin/master should work now. Please create a new thread if this issue comes up again.