shaikhsajid1111 / twitter-scraper-selenium

Python package to scrape Twitter's front-end easily
https://pypi.org/project/twitter-scraper-selenium
MIT License
323 stars, 50 forks

More Examples and Documentation Needed #64

Open richlysakowski opened 1 year ago

richlysakowski commented 1 year ago

More Examples and Documentation are needed...

I am new to using this package. I see a lot of functionality, but not enough examples of how to use it. It would be nice if there was a ReadTheDocs.io website for it.

I am interested in pulling images and metadata associated with tweets. The element_finder.py file appears to pull images, but it is unclear how to call it separately, or whether that is the best way to use it in this package.

@staticmethod
def find_images(tweet) -> Union[list, None]:
    """finds all images of the tweet"""
    try:
        image_element = tweet.find_elements(By.CSS_SELECTOR,
                                            'div[data-testid="tweetPhoto"]')
        images = []
        for image_div in image_element:
            href = image_div.find_element(By.TAG_NAME,
                                          "img").get_attribute("src")
            images.append(href)
        return images
    except Exception as ex:
        logger.exception("Error at method find_images : {}".format(ex))
        return []

Many of the functions do not have docstrings. I am going through them now to try to add descriptions for my own benefit.

I also want to add things to outputs to automate the data capture. For example, I want to time-stamp output files, because during testing I will have to do multiple runs to make sure that I got all information I was looking for.
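A minimal sketch of the time-stamping idea, using only the standard library. The helper name `timestamped_path` and the `tweets` base name are made up for illustration and are not part of the package; where to hook this into the package's own output code is still an open question.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_path(base_name: str, extension: str, directory: str = ".",
                     now: Optional[datetime] = None) -> Path:
    """Build a path such as ./tweets_20240131_154502.json (hypothetical helper)."""
    # Allow the caller to inject a fixed time, which makes the helper testable.
    now = now or datetime.now()
    # Year-month-day_hour-minute-second sorts chronologically in a file listing.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(directory) / f"{base_name}_{stamp}.{extension}"


# Each run gets its own file name, so repeated test runs do not overwrite each other.
print(timestamped_path("tweets", "json"))
```

Because the stamp goes down to the second, two runs started within the same second would still collide; adding `%f` (microseconds) to the format string is one way around that.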

For JSON files, I would like to pretty-print them before saving them, because they come out in a one-liner format in the file.
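One way to get readable JSON, sketched under the assumption that the scraper's result is available as a compact JSON string before it is written to disk. The function names and the sample data below are invented for illustration; only the standard `json` module is used.

```python
import json


def pretty_json(raw_json: str, indent: int = 4) -> str:
    """Re-serialize a compact JSON string with indentation (hypothetical helper)."""
    data = json.loads(raw_json)
    # ensure_ascii=False keeps emoji and non-Latin text readable in the file.
    return json.dumps(data, indent=indent, ensure_ascii=False)


def save_pretty_json(raw_json: str, path: str) -> None:
    """Write the indented JSON to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(pretty_json(raw_json))


# Made-up sample standing in for the one-line output described above.
compact = '{"123": {"content": "hello", "images": []}}'
print(pretty_json(compact))
```

If the data is already a Python dict rather than a string, the `json.loads` step can be skipped and `json.dumps(data, indent=4)` called directly.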

I am not sure where to insert these functions. I will keep browsing the package structure to see if I can figure out these things, and add helpful docstrings where they are missing.

shaikhsajid1111 commented 1 year ago

Yes, I agree that this needs better documentation. Unfortunately, I don't have enough time to do so right now. Those suggestions are great, and I will implement them eventually over the next few releases. Thanks!

Yes, you can call find_images directly from outside the package, as long as the tweet argument is an element returned by driver.find_element().