shaikhsajid1111 / twitter-scraper-selenium

A Python package to scrape Twitter's front end easily
https://pypi.org/project/twitter-scraper-selenium
MIT License

Incomprehensible Error #46

Closed: Dragonizedpizza closed this issue 1 year ago

Dragonizedpizza commented 1 year ago
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 128, in scrap
    self.start_driver()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 52, in start_driver
    self.browser, self.headless, self.proxy, self.browser_profile).init()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 104, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/driver_initialization.py", line 97, in set_driver_for_browser
    return webdriver.Firefox(service=FirefoxService(executable_path=GeckoDriverManager().install()), options=self.set_properties(browser_option))
  File "/usr/local/lib/python3.10/dist-packages/seleniumwire/webdriver.py", line 178, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/firefox/webdriver.py", line 177, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Failed to decode response from marionette

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/Bots/DiscordBots/TwitterTopics/MTC/src/util/scrape.py", line 6, in <module>
    scrape_topic(filename="tweets", url='https://twitter.com/i/topics/1468157909318045697',browser="firefox", tweets_count=25)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/topic.py", line 53, in scrape_topic
    data = keyword_bot.scrap()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 140, in scrap
    self.close_driver()
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/keyword.py", line 55, in close_driver
    self.driver.close()
AttributeError: 'str' object has no attribute 'close'
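
The secondary AttributeError here suggests that when driver initialization fails, self.driver ends up holding an error string rather than a WebDriver, so the cleanup call self.driver.close() itself raises. A minimal defensive sketch of such a guard (hypothetical helper, not the library's actual code):

from selenium.webdriver.remote.webdriver import WebDriver

def safe_close(driver):
    # Hypothetical guard: only close when we actually hold a WebDriver.
    # The traceback above shows driver can be a plain string after a
    # failed start, which makes driver.close() raise AttributeError.
    if isinstance(driver, WebDriver):
        driver.close()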
shaikhsajid1111 commented 1 year ago

I don't think this issue is related to the library; it's probably due to some external factor. I'm unable to reproduce it. What is your environment? I found related issues here
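
To rule out the library, you can try starting Firefox through Selenium directly, the same way the library does; a minimal sketch, assuming selenium 4 and webdriver-manager are installed:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service as FirefoxService
from webdriver_manager.firefox import GeckoDriverManager

# Start Firefox outside the library to see whether it launches at all.
options = Options()
options.add_argument("-headless")  # the scraper runs headless by default

driver = webdriver.Firefox(
    service=FirefoxService(executable_path=GeckoDriverManager().install()),
    options=options,
)
print(driver.capabilities.get("browserVersion"))  # only reached if Firefox started
driver.quit()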

Just a side note: I would suggest using scrape_topic_with_api() for faster and more accurate results if you're scraping topics
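
A minimal sketch of that call, using the same parameters that appear later in this thread:

from twitter_scraper_selenium import scrape_topic_with_api

# Scrape the topic through Twitter's API endpoints instead of a browser.
scrape_topic_with_api(
    URL='https://twitter.com/i/topics/1468157909318045697',
    output_filename='tweets',
    tweets_count=10,
)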

Dragonizedpizza commented 1 year ago

I tried scrape_topic_with_api() and got this:

2022-10-29 09:23:32,697 - twitter_scraper_selenium.element_finder - ERROR - Failed to find key!
NoneType: None
2022-10-29 09:23:32,794 - twitter_scraper_selenium.element_finder - WARNING - Error at find_graphql_link : 'NoneType' object has no attribute 'split'
2022-10-29 09:27:50,535 - WARNING - Failed to make request!
Traceback (most recent call last):
  File "/root/Bots/DiscordBots/TwitterTopics/MTC/src/util/scrape.py", line 6, in <module>
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/topic_api.py", line 156, in scrape_topic_with_api
    data.update(content)
TypeError: 'NoneType' object is not iterable
shaikhsajid1111 commented 1 year ago

The issue is probably that Firefox itself is not starting on your system. What is your environment?

Dragonizedpizza commented 1 year ago

> The issue is probably that Firefox itself is not starting on your system. What is your environment?

As in? I'm on Ubuntu 20.04 with Python 3.10

shaikhsajid1111 commented 1 year ago

Okay, I don't have much insight into this issue. People all over the internet run into it for different reasons, which can be found here. I've also added a wait before accessing the DOM. Please consider passing the argument headless=False to scrape_topic(); it will show you what is happening with your browser
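
A minimal sketch, reusing the scrape_topic() call from the first traceback with headless=False added:

from twitter_scraper_selenium import scrape_topic

# Same call as in the first traceback, but with a visible browser window
# so you can watch where it fails.
scrape_topic(
    filename="tweets",
    url='https://twitter.com/i/topics/1468157909318045697',
    browser="firefox",
    tweets_count=25,
    headless=False,
)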

Dragonizedpizza commented 1 year ago
2022-10-29 12:47:24,770 - WARNING - Error at scrap : Message: Process unexpectedly closed with status 1

Traceback (most recent call last):
  File "/root/Bots/DiscordBots/TwitterTopics/MTC/src/util/scrape.py", line 6, in <module>
    scrape_topic_with_api(URL='https://twitter.com/i/topics/1468157909318045697', output_filename='tweets', tweets_count=10, headless=False)
  File "/usr/local/lib/python3.10/dist-packages/twitter_scraper_selenium/topic_api.py", line 156, in scrape_topic_with_api
    data.update(content)
AttributeError: 'NoneType' object has no attribute 'update'
shaikhsajid1111 commented 1 year ago

No, I meant that you can watch the browser open yourself, which can show you what is happening on your system while scraping. You might have seen the browser opening automatically and so on; that's what I was talking about

Dragonizedpizza commented 1 year ago

> No, I meant that you can watch the browser open yourself, which can show you what is happening on your system while scraping. You might have seen the browser opening automatically and so on; that's what I was talking about

I tested it; for some reason it works on my laptop, but not on the VPS

shaikhsajid1111 commented 1 year ago

I can't say why it isn't running on the VPS 🤔

Dragonizedpizza commented 1 year ago

Never mind, turns out a reboot fixed it :joy: thank you for the help