taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.39k stars 581 forks

twitter scraper error #367

Open mahajnay opened 2 years ago

mahajnay commented 2 years ago

Hi all,

While using twitterscraper,

I have this code:

    from twitterscraper import query_tweets
    import datetime as dt
    import pandas as pd

    begin_date = dt.date(2020, 3, 1)
    end_date = dt.date(2021, 11, 1)

    limit = 100
    lang = 'english'

    tweets = query_tweets('vaccinesideeffects', begindate=begin_date,
                          enddate=end_date, limit=limit, lang=lang)
    df = pd.DataFrame(t.__dict__ for t in tweets)
    df = df['text']
    df

I'm getting the error below:


    AttributeError                            Traceback (most recent call last)
    <ipython-input> in <module>
    ----> 1 from twitterscraper import query_tweets
          2 import datetime as dt
          3 import pandas as pd
          4
          5 begin_date = dt.date(2020,3,1)

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/__init__.py in <module>
         11
         12
    ---> 13 from twitterscraper.query import query_tweets
         14 from twitterscraper.query import query_tweets_from_user
         15 from twitterscraper.query import query_user_info

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/query.py in <module>
         74         yield start + h * i
         75
    ---> 76 proxies = get_proxies()
         77 proxy_pool = cycle(proxies)
         78

    ~/opt/anaconda3/lib/python3.8/site-packages/twitterscraper/query.py in get_proxies()
         47     soup = BeautifulSoup(response.text, 'lxml')
         48     table = soup.find('table',id='proxylisttable')
    ---> 49     list_tr = table.find_all('tr')
         50     list_td = [elem.find_all('td') for elem in list_tr]
         51     list_td = list(filter(None, list_td))

    AttributeError: 'NoneType' object has no attribute 'find_all'

Suizer commented 2 years ago

Same for me

expl0r3rgu1 commented 2 years ago

Same issue here

KamilsobC commented 2 years ago

It tries to grab the table from https://free-proxy-list.net with id='proxylisttable', but that id no longer exists on the page, so `soup.find()` returns `None`. You need to change line 48 from `table = soup.find('table', id='proxylisttable')` to `table = soup.find('table')`.
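A minimal sketch of that one-line patch, using a hypothetical inline HTML snippet in place of the live free-proxy-list.net response (the column names and values here are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the free-proxy-list.net response body:
# a proxy table with no id attribute, as the page serves it today.
SAMPLE_HTML = """
<table>
  <tr><th>IP Address</th><th>Port</th></tr>
  <tr><td>1.2.3.4</td><td>8080</td></tr>
  <tr><td>5.6.7.8</td><td>3128</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = soup.find('table')              # was: soup.find('table', id='proxylisttable')
list_tr = table.find_all('tr')
list_td = [tr.find_all('td') for tr in list_tr]
list_td = list(filter(None, list_td))   # drop the header row (<th> cells only)
proxies = [f"{ip.text}:{port.text}" for ip, port in list_td]
print(proxies)  # ['1.2.3.4:8080', '5.6.7.8:3128']
```

The rest of `get_proxies()` is unchanged; only the `find()` call loses its `id` filter, so whichever table the page serves first is parsed.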

barniker commented 2 years ago

I fixed this error using Pandas:

    import pandas as pd
    import requests
    ...
    def get_proxies():
        resp = requests.get(PROXY_URL)
        df = pd.read_html(resp.text)[0]
        list_ip = list(df['IP Address'].values)
        list_ports = list(df['Port'].values.astype(str))
        list_proxies = [':'.join(elem) for elem in zip(list_ip, list_ports)]
        return list_proxies

however, this still does not work.
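For reference, a minimal sketch of what the `pd.read_html()` step yields, using a hypothetical inline table in place of the live free-proxy-list.net page:

```python
import io
import pandas as pd

# Hypothetical stand-in for the free-proxy-list.net response body.
SAMPLE_HTML = """
<table>
  <tr><th>IP Address</th><th>Port</th></tr>
  <tr><td>1.2.3.4</td><td>8080</td></tr>
  <tr><td>5.6.7.8</td><td>3128</td></tr>
</table>
"""

# read_html() returns a list of DataFrames, one per <table>; the <th> row
# becomes the header, so the 'IP Address' and 'Port' columns come out named.
df = pd.read_html(io.StringIO(SAMPLE_HTML))[0]
proxies = [':'.join(p) for p in zip(df['IP Address'], df['Port'].astype(str))]
print(proxies)  # ['1.2.3.4:8080', '5.6.7.8:3128']
```

This sidesteps BeautifulSoup entirely, which is why the `proxylisttable` id no longer matters; the remaining crash below happens later, inside the query itself.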

`list_of_tweets = query_tweets("Trump OR Clinton", 10)` returns:

    Exception: Traceback (most recent call last):
      File "/Users/rmartin/Desktop/Envs/crypto_env/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
        raise WorkerLostError(
    billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV) Job: 0.

AdrienMau commented 2 years ago

Same error here on Python 3.9

NafiGit commented 2 years ago

> It tries to grab the table from https://free-proxy-list.net with id='proxylisttable', but it doesn't exist. You need to change line 48 from `table = soup.find('table', id='proxylisttable')` to `table = soup.find('table')`

thanks, it solved my problem

vedanta28 commented 1 year ago

@NafiGit How did you edit the code in their repository?