superryeti / Hands-on-WebScraping

This repo is part of a blog series on web scraping projects, in which we explore scraping techniques for crawling data from sites ranging from simple websites to websites using advanced protection.
MIT License
82 stars 74 forks

Pulling All Tweets #3

Open kozakalec opened 4 years ago

kozakalec commented 4 years ago

Hey, quick question. When I ran this with the hashtag BigData, it pulled all tweets containing the words "data" or "big data". Why is it not pulling only tweets with the hashtag BigData?

superryeti commented 4 years ago

Please pull again and try; it should now search by hashtag.

Earlier the crawler was searching for the keyword when it should have been searching for the hashtag.
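
To illustrate the keyword-versus-hashtag distinction, here is a minimal sketch, not the repo's actual code. `SEARCH_URL` and `build_search_url` are hypothetical names made up for this example. The point is only that a hashtag search needs the `#` prefix in the query, and that `#` must be percent-encoded as `%23` or it will be treated as a URL fragment and dropped.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
SEARCH_URL = "https://example.com/search"


def build_search_url(term, hashtag=True):
    """Build a search URL for a hashtag (default) or a plain keyword."""
    term = term.lstrip("#")  # accept both "BigData" and "#BigData"
    query = f"#{term}" if hashtag else term
    # urlencode percent-encodes "#" as "%23" so it survives in the query string.
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


print(build_search_url("BigData"))                 # hashtag search
print(build_search_url("BigData", hashtag=False))  # keyword search
```

With this sketch the hashtag search produces `q=%23BigData`, while the keyword search produces `q=BigData`, which would match any tweet containing the word.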

kozakalec commented 4 years ago

Interestingly, my results are similar. The output reports 0 hashtags for a tweet when in fact there are multiple hashtags associated with that tweet.
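
As a quick way to cross-check the reported hashtag count against the tweet text itself, something like the following could be used. This is a rough sketch with an assumed helper name (`extract_hashtags`) and a simple regex; it is not how the crawler extracts hashtags, and it does not cover every character Twitter allows in a hashtag.

```python
import re

# "#" not preceded by a word character, followed by one or more word characters.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text):
    """Return the hashtags found in text, without the leading '#'."""
    return HASHTAG_RE.findall(text)


tags = extract_hashtags("Loving #BigData and #AI!")
print(len(tags), tags)
```

If the scraped row reports 0 hashtags but this returns a non-empty list for the same text, the hashtag extraction step is the likely culprit.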

kozakalec commented 4 years ago

Additionally, the "tweet_text" column is not showing the full text of the tweet. Often, only part of the text is shown.
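
One common cause of partial text in scrapers, which may or may not be what is happening here, is reading only the first text node of an element whose content is split across nested tags (links, hashtags, bold spans). The sketch below uses only the standard library and an invented HTML fragment to show the idea of joining every text node; the markup and the names `TextCollector` and `full_text` are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node in a fragment, including nested ones."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def full_text(html_fragment):
    """Return all text in the fragment, joined in document order."""
    collector = TextCollector()
    collector.feed(html_fragment)
    collector.close()  # flush any buffered trailing text
    return "".join(collector.parts).strip()


# Invented markup: the tweet text is split across several nested tags.
sample = (
    '<p class="tweet-text">Learning '
    '<a href="/hashtag/BigData"><s>#</s><b>BigData</b></a>'
    " with <strong>Spark</strong> today</p>"
)
print(full_text(sample))
```

Taking only the first text node of `sample` would yield just "Learning", which matches the symptom of truncated tweets.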

superryeti commented 4 years ago

Thank you, much appreciated. Please tag me if you find other bugs.

I will fix both tomorrow. For now, I will revert the latest commit.

kozakalec commented 4 years ago

Thanks very much. Feel free to comment back/ping me if you believe the bugs are fixed. Thanks again.