taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.39k stars, 579 forks

0 tweets #296

Open etemiz opened 4 years ago

etemiz commented 4 years ago

INFO: Retrying... (Attempts left: 1)
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q=bitcoin&l=
INFO: Using proxy 181.211.38.62:47911
INFO: Got 0 tweets for bitcoin.

Parsing may be the issue. Both twitterscraper 0.9.3 and 1.4.0 are failing.

GivenToFlyCoder commented 4 years ago

At this moment, TS is working again!


lapp0 commented 4 years ago

https://github.com/taspinar/twitterscraper/pull/304#issuecomment-639691821

Frickson commented 4 years ago

cmd> twitterscraper jack -l 50 --user -o jack.json

still returns these errors, why... need help please

INFO: Retrying... (Attempts left: 1)
INFO: Scraping tweets from https://twitter.com/jack
INFO: Using proxy 128.199.202.122:3128
INFO: Got 0 tweets from username jack

Python version: 3.7.7, twitterscraper version: 1.4.0

lapp0 commented 4 years ago

@Frickson Cannot reproduce with python 3.7.6

Could you paste the beginning of your output, before all the retries? Mine is:

twitterscraper jack -l 50 --user -o jacka.json
INFO: {'User-Agent': 'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko', 'X-Requested-With': 'XMLHttpRequest'}
INFO: Scraping tweets from https://twitter.com/jack
INFO: Using proxy 185.85.219.74:61068
INFO: Scraping tweets from https://twitter.com/i/profiles/show/jack/timeline/tweets?include_available_features=1&include_entities=1&max_position=1268318672319401989&reset_error_state=false
INFO: Using proxy 194.116.162.188:3128
INFO: Scraping tweets from https://twitter.com/i/profiles/show/jack/timeline/tweets?include_available_features=1&include_entities=1&max_position=1268234058263232514&reset_error_state=false
INFO: Using proxy 84.53.247.204:53281
INFO: Scraping tweets from https://twitter.com/i/profiles/show/jack/timeline/tweets?include_available_features=1&include_entities=1&max_position=1267617140292509696&reset_error_state=false
INFO: Using proxy 103.105.77.19:8080
INFO: Got 54 tweets from username jack

Could you also run pip3 freeze and share the results?

Frickson commented 4 years ago

twitterscraper jack -l 50 --user -o jack.json
INFO: {'User-Agent': 'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16'}
INFO: Scraping tweets from https://twitter.com/jack
INFO: Using proxy 102.130.133.102:53281
INFO: Retrying... (Attempts left: 50)

pip3 freeze:

chardet==3.0.4
Click==7.0
coala-utils==0.5.1
colorama==0.4.3
configparser==5.0.0
decorator==4.4.2
Django==3.0.7
et-xmlfile==1.0.1
idna==2.9
ipython==7.13.0
ipython-genutils==0.2.0
jdcal==1.4.1
jedi==0.16.0
lxml==4.5.0
numpy==1.18.2
oauthlib==3.1.0
openpyxl==3.0.3
pandas==1.0.3
parso==0.6.2
pickleshare==0.7.5
prompt-toolkit==3.0.5
pycodestyle==2.5.0
Pygments==2.6.1
PySocks==1.7.1
python-dateutil==2.8.1
pytz==2019.3
requests==2.23.0
requests-oauthlib==1.3.0
six==1.14.0
soupsieve==2.0
sqlparse==0.3.1
toml==0.10.0
traitlets==4.3.3
TwitterAPI==2.5.10
twitterscraper==1.4.0
urllib3==1.25.8
wcwidth==0.1.9

lapp0 commented 4 years ago

@Frickson could you try with a fresh virtualenv?

python3 -m venv .venv
source .venv/bin/activate
python3 setup.py install
twitterscraper jack -l 50 --user -o jack.json

I'm not sure whether one of your additional packages is causing an issue.

If that doesn't work, I recommend changing https://github.com/taspinar/twitterscraper/blob/master/twitterscraper/query.py#L116 to say

except Exception as e:
    logger.exception(e)

(rather than just pass), and then share the output.
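The difference can be sketched like this (a standalone sketch; `fetch_page` and the simulated failure are illustrative stand-ins, not the project's actual code around query.py line 116):

```python
import logging

logger = logging.getLogger("twitterscraper")
logging.basicConfig(level=logging.INFO)

def fetch_page(url):
    """Stand-in for the request logic that currently swallows errors."""
    try:
        raise ConnectionError("proxy refused the connection")  # simulated failure
    except Exception as e:
        # With a bare `pass` here the cause is invisible and all you see is
        # "Retrying...". logger.exception(e) records the full traceback.
        logger.exception(e)
        return None

result = fetch_page("https://twitter.com/jack")
```

With the bare `pass`, every failure mode (blocked proxy, changed markup, network error) looks identical from the outside; logging the exception tells us which one you're hitting.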

Frickson commented 4 years ago

[screenshot] I followed all the steps, but I still get the same result. :C

lapp0 commented 4 years ago

Sorry, but I cannot reproduce. Given the nature of the proxies, I wouldn't expect it to work for me and fail for you...

You could try https://github.com/taspinar/twitterscraper/pull/302 which uses selenium to fetch tweets.

Frickson commented 4 years ago

@lapp0, I already found the error. Thank you so much.

lapp0 commented 4 years ago

@Frickson could you share the issue and your fix in case others come across this problem?

lapp0 commented 4 years ago

Just an update for anyone coming across this issue: the recent #304 PR isn't on PyPI yet (@taspinar could you help us out here when you get a chance?).

However, you can resolve this issue by running

git clone https://github.com/taspinar/twitterscraper.git
cd twitterscraper

# create virtualenv (optional)
python -m venv .venv
source .venv/bin/activate

# install and run
python3 setup.py install
twitterscraper --user "realDonaldTrump"  --output trump.json

johnnycho0127 commented 4 years ago

same here 0 tweets

me too, same 0 tweets

lapp0 commented 4 years ago

@yeonheecho Have you followed these instructions? https://github.com/taspinar/twitterscraper/issues/296#issuecomment-641084107

What command are you running?

yousuffarhan commented 4 years ago

Seems Twitter has restricted the connection so that all requests return a page with "We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?"

Indeed, this can be fixed by modifying the header dictionary in query.py from HEADER = {'User-Agent': random.choice(HEADERS_LIST)} to HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'}. That should fix the issue.
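In code, the change amounts to one extra key in the header dict built in query.py (HEADERS_LIST abbreviated here for illustration):

```python
import random

# Abbreviated copy of the user-agent list from query.py
HEADERS_LIST = [
    'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
]

# Before: HEADER = {'User-Agent': random.choice(HEADERS_LIST)}
# After: the extra field makes Twitter serve the legacy (scrapable) markup
# instead of the "JavaScript is disabled" interstitial page.
HEADER = {
    'User-Agent': random.choice(HEADERS_LIST),
    'X-Requested-With': 'XMLHttpRequest',
}
```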

Worked for me. Thank you.

zhicheng0501 commented 4 years ago

Just an update for anyone coming across this issue: We don't have the recent #304 PR onto pypi yet (@taspinar could you help us out here when you get a chance?).

However, you can resolve this issue by running

git clone https://github.com/taspinar/twitterscraper.git
cd twitterscraper

# create virtualenv (optional)
python -m venv .venv
source .venv/bin/activate

# install and run
python3 setup.py install
twitterscraper --user "realDonaldTrump"  --output trump.json

It worked fine days ago, but since about 10 days ago it has failed to pull data; it returns "Retrying..." all the time.

anushkmittal commented 4 years ago

getting 0 tweets as well

lapp0 commented 4 years ago

@zhicheng0501 @anushkmittal please try the -j option after checking out the branch from https://github.com/taspinar/twitterscraper/pull/302.

Please note that this will be much more expensive in terms of memory and about half as fast: it uses a browser instance and runs all of Twitter's JavaScript in the browser.

Also, FWIW, I can get many tweets on that branch; I'd try without -j on that branch first:

INFO: Got 878 tweets for realDonaldTrump%20since%3A2018-04-26%20until%3A2019-01-11.
INFO: Got 14301 tweets (878 new).

krismuhi commented 4 years ago

Thanks, this fixed it for me: HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'}

INFO: Got 372 tweets (0 new).

OmarxData commented 4 years ago

[screenshot]

I've already edited query.py; not sure what I'm doing wrong. Can anyone help me with this? I still get 0 tweets.

lapp0 commented 4 years ago

@OmarxData are you using origin/master? PyPI doesn't have the latest.

Frickson commented 4 years ago

AttributeError: 'NoneType' object has no attribute 'user' -- I get this error. Is it because twitterscraper cannot scrape private accounts?

zhicheng0501 commented 4 years ago

@lapp0

Hi, this is what I am facing now. You told me to try the selenium method, but it would change the structure of the pulled data, which would be a problem because I have already pulled millions of records and need to keep the same structure. That is why I am trying to make this method work.

[screenshot]

lapp0 commented 4 years ago

@zhicheng0501 Just to be clear, you have tried origin/master first, correct? I'm getting tweets for origin/master's version without having to use selenium or #302.

If you have indeed tried origin/master, you can try #302 and transform the results using the twitterscraper provided datastructure Tweet.

Please let me know if origin/master (not the latest version from pip!) is failing for you.

zhicheng0501 commented 4 years ago

@zhicheng0501 Just to be clear, you have tried origin/master first, correct? I'm getting tweets for origin/master's version without having to use selenium or #302.

If you have indeed tried origin/master, you can try #302 and transform the results using the twitterscraper provided datastructure Tweet.

Please let me know if origin/master (not the latest version from pip!) is failing for you.

May I ask how to use origin/master? I have been using pip to install and run the program, and it seems pip installed a version that does not work well.

Is the following the right way to do it (is this the origin/master way you mentioned)?

I use git clone https://github.com/taspinar/twitterscraper.git first, go into the twitterscraper folder to run python3 setup.py install, and then try to run twitterscraper, but it still fails with this error:

[screenshot]

jassena commented 4 years ago

Guys, please help. [screenshots]

jassena commented 4 years ago

I'm using Spyder 4.1.3.

lapp0 commented 4 years ago

@jassena what is the command you're running and full output? (please text, no screenshot)

lapp0 commented 4 years ago

@zhicheng0501 perhaps you have a conflicting dependency version? Try

python3 -m venv .venv
source .venv/bin/activate
python3 setup.py install

zhicheng0501 commented 4 years ago

@zhicheng0501 perhaps you have a conflicting dependency version? Try

python3 -m venv .venv
source .venv/bin/activate
python3 setup.py install

This is what it shows; it seems I tried but still failed. May I ask how to use the selenium method? I followed the #302 steps and installed selenium, geckodriver and Firefox. What should I do next? Would you please show a sample of the code you use to run selenium in this case? [screenshot]

zhicheng0501 commented 4 years ago

@zhicheng0501 perhaps you have a conflicting dependency version? Try


python3 -m venv .venv
source .venv/bin/activate
python3 setup.py install

Thanks, man, that really works. Now I know what problem I ran into these days: Twitter blocked the keyword nCoV. That's why I kept getting the retrying feedback; at least on my end, that is what I saw. I do not know why, but it is what occurred and confused me these days. I am writing to Twitter to ask why they block it. Once I get a clue, I will let others know, which should help improve your project.

More than that, would you please tell me how to run the JavaScript method in selenium? I created a new environment using the method you mentioned above and tried the following command, but it failed: twitterscraper trump --javascript -bd 2020-04-01 -ed 2020-04-02 -o trump.json

lapp0 commented 4 years ago

@Frickson please share your command and output

jassena commented 4 years ago

@jassena what is the command you're running and full output? (please text, no screenshot)

Hey, I just ran the command that you mention above... actually I also got 0 tweets... then I changed the query.py header as you said, and at that point I got this error: [screenshot] This is my code.

lapp0 commented 4 years ago

@jassena What is "this error"? Please share your code and the full error here: http://gist.github.com/

Toby-masuku commented 4 years ago

Tried everything, still getting 0 tweets.

lapp0 commented 4 years ago

@Toby-masuku are you using origin/master? The latest version on pypi doesn't work.

Toby-masuku commented 4 years ago

@lapp0 Yes, I'm using origin/master.

zhicheng0501 commented 4 years ago

Tried everything, still getting 0

What is the keyword of your query? I am searching for Trump and it works fine, but it fails when I search for nCoV and Wuhancoronavirus.

Toby-masuku commented 4 years ago

@zhicheng0501 The keyword is "climate change".

lapp0 commented 4 years ago

what is your exact command and what is your output? Please paste it.

zhicheng0501 commented 4 years ago

@Toby-masuku I used pip install twitterscraper and ran this command searching "climate change". It works just fine; just letting you know it is okay on my end.

Last login: Fri Feb 28 18:20:18 on ttys000
bogon:~ zhaoningning$ twitterscraper "climate change" --lang de --limit 10000000000000000 -bd 2020-04-27 -ed 2020-04-28 -o wuhan04270428.json
INFO: {'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre', 'X-Requested-With': 'XMLHttpRequest'}
INFO: queries: ['climate change since:2020-04-27 until:2020-04-28']
INFO: Querying climate change since:2020-04-27 until:2020-04-28
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 103.102.15.90:10714
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-1254687772931129344-1254876447442964480&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 113.11.156.42:31935
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLv9ydfF6SIWgMC87d7em-oiEjUAFQAlAFUAFQAA&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 118.174.196.112:36314
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaCwL3t67Kf6SIWgMC87d7em-oiEjUAFQAlAFUAFQAA&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 41.63.170.142:8080
INFO: Twitter returned : 'has_more_items'
INFO: Got 38 tweets for climate%20change%20since%3A2020-04-27%20until%3A2020-04-28.
INFO: Got 38 tweets (38 new).

chanhee-kang commented 4 years ago

@zhicheng0501 Hi, I tried with "covid19" as the search query but it failed. Do you know the reason?

zhicheng0501 commented 4 years ago

@zhicheng0501 Hi, I tried with "covid19" as the search query but it failed. Do you know the reason?

@chanhee-kang It seems covid19 search works fine on my end at this moment. Could you please try ncov with a search date ranging from 5-31 to 6-1 and tell me if it works on your end? It fails here for me. When I first searched ncov it worked well, but it failed after I had pulled a lot of data. I guess it might trigger some mechanism of Twitter's.

This is the command and result I get for covid19:

bogon:ncov zhaoningning$ twitterscraper COVID-19 --lang de --limit 100000000 -bd 2020-05-31 -ed 2020-06-01 -o wuhan05310601.json
INFO: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201', 'X-Requested-With': 'XMLHttpRequest'}
INFO: queries: ['COVID-19 since:2020-05-31 until:2020-06-01']
INFO: Querying COVID-19 since:2020-05-31 until:2020-06-01
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q=COVID-19%20since%3A2020-05-31%20until%3A2020-06-01&l=de
INFO: Using proxy 118.175.93.148:55169
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-1267229970235043842-1267243855260205056&q=COVID-19%20since%3A2020-05-31%20until%3A2020-06-01&l=de
INFO: Using proxy 201.55.160.133:3128
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaEwLWJtLyNliMWgICmyc_kk5YjEjUAFQAlAFUAFQAA&q=COVID-19%20since%3A2020-05-31%20until%3A2020-06-01&l=de

Toby-masuku commented 4 years ago

@Toby-masuku I used pip install twitterscraper and run this code searching "climate change". It works just fine. Just let you know. It is okay on my end.

Last login: Fri Feb 28 18:20:18 on ttys000
bogon:~ zhaoningning$ twitterscraper "climate change" --lang de --limit 10000000000000000 -bd 2020-04-27 -ed 2020-04-28 -o wuhan04270428.json
INFO: {'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre', 'X-Requested-With': 'XMLHttpRequest'}
INFO: queries: ['climate change since:2020-04-27 until:2020-04-28']
INFO: Querying climate change since:2020-04-27 until:2020-04-28
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 103.102.15.90:10714
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-1254687772931129344-1254876447442964480&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 113.11.156.42:31935
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLv9ydfF6SIWgMC87d7em-oiEjUAFQAlAFUAFQAA&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 118.174.196.112:36314
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaCwL3t67Kf6SIWgMC87d7em-oiEjUAFQAlAFUAFQAA&q=climate%20change%20since%3A2020-04-27%20until%3A2020-04-28&l=de
INFO: Using proxy 41.63.170.142:8080
INFO: Twitter returned : 'has_more_items'
INFO: Got 38 tweets for climate%20change%20since%3A2020-04-27%20until%3A2020-04-28.
INFO: Got 38 tweets (38 new).

Can you please share your code? Maybe I made a mistake.

lapp0 commented 4 years ago

I used pip install twitterscraper and run this code searching "climate change".

@zhicheng0501 the pip version worked for you? It doesn't have the headers fix in it. How did you get it to work?

lapp0 commented 4 years ago

@Toby-masuku looks like you got 38 results. What is the problem?

Toby-masuku commented 4 years ago

@lapp0 That's not me; I got 0.

[screenshot]

Frickson commented 4 years ago

@Frickson please share your command and output

Hi lapp0, I ran get_twitter_user_data.py from origin/master and just changed the list of names.

Here is my code:

start = time.time()
users = ['Ms_MeiChing']

pool = Pool(8)
for user in pool.map(get_user_info, users):
    twitter_user_info.append(user)
Traceback (most recent call last):
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\site-packages\twitterscraper-1.4.0-py3.7.egg\twitterscraper\query.py", line 323, in query_user_info       
    user_info = query_user_page(INIT_URL_USER.format(u=user))
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\site-packages\twitterscraper-1.4.0-py3.7.egg\twitterscraper\query.py", line 292, in query_user_page       
    user_info = User.from_html(html)
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\site-packages\twitterscraper-1.4.0-py3.7.egg\twitterscraper\user.py", line 101, in from_html
    return self.from_soup(user_profile_header, user_profile_canopy)
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\site-packages\twitterscraper-1.4.0-py3.7.egg\twitterscraper\user.py", line 57, in from_soup
    tweets = tag_prof_nav.find('span', {'class':"ProfileNav-value"})['data-count']
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\element.py", line 1321, in __getitem__
    return self.attrs[key]
KeyError: 'data-count'
INFO: Got user information from username Ms_MeiChing
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\Users\Asus\Desktop\twitterscraper\examples\get_twitter_user_data.py", line 20, in get_user_info
    twitter_user_data["user"] = user_info.user
AttributeError: 'NoneType' object has no attribute 'user'
"""
user_data.py", line 53, in <module>
    main()
  File "c:/Users/Asus/Desktop/twitterscraper/examples/get_twitter_user_data.py", line 40, in main
    for user in pool.map(get_user_info,users):
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'user'
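The KeyError above comes from user.py indexing the tag's 'data-count' attribute directly; the profile markup for some accounts apparently lacks it. A defensive pattern (a hypothetical sketch with a plain dict standing in for the BeautifulSoup tag's attrs, not the project's actual fix) would fall back to a default instead of raising:

```python
# Simulated attribute dicts for a ProfileNav-value <span>:
with_count = {'class': 'ProfileNav-value', 'data-count': '54'}
without_count = {'class': 'ProfileNav-value'}  # triggers the KeyError path

def read_count(attrs):
    # .get() avoids KeyError: 'data-count' when the attribute is missing
    return int(attrs.get('data-count', 0))

print(read_count(with_count))     # 54
print(read_count(without_count))  # 0
```

The later AttributeError ('NoneType' object has no attribute 'user') is then a consequence: query_user_info returns None once parsing fails, and the example script dereferences it without checking.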

erb13020 commented 4 years ago

@javad94

What exactly does this header list do?

HEADERS_LIST = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13',
    'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
    'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre',
]

zhicheng0501 commented 4 years ago

@javad94

What exactly does this header list do?

HEADERS_LIST = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13',
    'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
    'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre',
]

It creates a random header for each request sent to the server. It makes the server see requests as coming from different clients, which helps avoid being blocked.
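Concretely, the rotation amounts to picking a fresh User-Agent per request (list abbreviated; `fresh_header` is an illustrative name, not the library's):

```python
import random

HEADERS_LIST = [  # abbreviated copy of the list from query.py
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
]

def fresh_header():
    # Each call picks a (random) browser identity, so successive requests
    # do not present a single, easily fingerprinted User-Agent.
    return {'User-Agent': random.choice(HEADERS_LIST)}

headers = [fresh_header() for _ in range(5)]
```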

MaximAbdulatif commented 4 years ago

Just did that, same error 😑

Seems Twitter has restricted the connection so that all requests return a page with "We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?"

Indeed, this can be fixed by modifying the header dictionary in query.py from HEADER = {'User-Agent': random.choice(HEADERS_LIST)} to HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'} that should fix the issue.