shaikhsajid1111 / twitter-scraper-selenium

Python's package to scrap Twitter's front-end easily
https://pypi.org/project/twitter-scraper-selenium
MIT License

example for twitter topic #29

Closed: rachmadaniHaryono closed this issue 1 year ago

rachmadaniHaryono commented 1 year ago

I just tried this program to get a Twitter topic.

Here is the final result:

import json

from twitter_scraper_selenium.keyword import Keyword

URL = 'https://twitter.com/i/topics/1415728297065861123'
headless = False
keyword = 'steamdeck'
browser = 'firefox'

keyword_bot = Keyword(keyword, browser=browser, url=URL, headless=headless, proxy=None, tweets_count=1000)
data = keyword_bot.scrap()  # returns a JSON string

# note: 'w' mode overwrites any existing file
with open('steamdeck.json', 'w') as f:
    json.dump(json.loads(data), f, indent=2)

# print result
import textwrap

width = 120
for item in sorted(json.loads(data).values(), key=lambda x: x['posted_time']):
    # wrap long tweet text so it stays readable in the terminal
    wrap_text = '\n'.join(textwrap.wrap(item['content'], width=width))
    print(f"{item['posted_time']} {item['tweet_url']}\n{wrap_text}")
    print('-' * width)

Some notes on this:

Example 1:

try:
    # assume an error happens on this line because importing webdriver failed
    from inspect import currentframe
except Exception as ex:
    print(ex)

# the error happens again because currentframe was never imported
frameinfo = currentframe()
shaikhsajid1111 commented 1 year ago

Thanks for the review @rachmadaniHaryono.

This is a little bit confusing because all import errors are caught with a general exception (see also example 1). If possible, just let the error happen and end the program.

Yeah, moving the imports outside of the try/except will help catch the bug.
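
A minimal sketch of that pattern, assuming nothing about the library's internals: keep the import at module level so a missing dependency fails immediately, and only wrap the runtime work in try/except.

# module-level import: if this fails, the program stops with a clear ImportError
from inspect import currentframe

try:
    # only wrap the work that can reasonably fail at runtime
    frameinfo = currentframe()
except Exception as ex:
    print(ex)
    raise  # re-raise so the program ends instead of continuing in a bad state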


Saving the JSON will replace old data, so be careful.

The behaviour of writing data in write mode is definitely misleading; this caused a misunderstanding for a user of my other library (here's the issue). I was even thinking of implementing a check for whether the file already exists and switching the writing mode accordingly.
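
For reference, a rough sketch of that idea: check whether the file already exists and merge into it instead of overwriting. The save_tweets helper below is hypothetical, not part of the library.

import json
import os


def save_tweets(path, data):
    # hypothetical helper: merge new tweets into an existing JSON file instead of overwriting it
    new_tweets = json.loads(data) if isinstance(data, str) else data
    existing = {}
    if os.path.exists(path):
        with open(path) as f:
            existing = json.load(f)
    existing.update(new_tweets)  # tweets are keyed by ID, so reruns just refresh duplicates
    with open(path, 'w') as f:
        json.dump(existing, f, indent=2)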


Selenium can use a custom profile folder; currently I have to edit either set_properties or set_driver_for_browser in driver_initialization.Initializer.

Yeah, Selenium can use a custom profile, but I don't think it is going to help much as long as you're scraping in an unauthenticated way. Is it going to help in any way?
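
For context, this is roughly how Selenium itself can be pointed at a custom Firefox profile; the profile path is only an example, and wiring it into driver_initialization.Initializer would still need a change in the library.

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
# point geckodriver at an existing profile directory (path is illustrative)
options.add_argument('-profile')
options.add_argument('/home/user/.mozilla/firefox/scraper.default')

driver = webdriver.Firefox(options=options)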


Any reason why Keyword.scrap has to return a JSON string? Why not just return it as a dict? When saving the data as CSV, it has to be decoded back to a dict.

The output will be invalid JSON.
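
For illustration, this is the round trip the JSON-string return value forces when saving to CSV; it reuses the data variable and the field names from the example above, and is a plain-stdlib sketch rather than the library's own CSV code.

import csv
import json

tweets = json.loads(data)  # decode the JSON string back into a dict

with open('steamdeck.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['posted_time', 'tweet_url', 'content'])
    writer.writeheader()
    for tweet in tweets.values():
        writer.writerow({key: tweet[key] for key in writer.fieldnames})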

rachmadaniHaryono commented 1 year ago

Yeah, Selenium can use a custom profile, but I don't think it is going to help much as long as you're scraping in an unauthenticated way. Is it going to help in any way?

I'm thinking of scraping my Twitter front page with this and using some Firefox extensions while scraping the data.


Maybe I will create a PR to scrape Twitter topics later.

shaikhsajid1111 commented 1 year ago

Oh, Okay. Thanks