jakobzhao / geog458

Advanced Digital Geographies @ UW-Seattle
GNU Lesser General Public License v3.0

Lab 2 - Tweepy stream.filter Returning Unrelated Tweets Despite Track Parameter #26

Closed Reina-Orikasa closed 2 years ago

Reina-Orikasa commented 2 years ago

Here is the code block I used:

    stream_listener = StreamListener(time_limit=200, file=output_file)
    stream = tweepy.Stream(auth=myauth, listener=stream_listener)
    stream.filter(locations=LOCATIONS, languages=['en'], encoding="utf-8", track=['Russia'])

When searching the results, only 2 of the 623 tweets contain the word 'Russia'. The rest are irrelevant to Russia. The results are similar if I use track=['Ukraine'].

I noticed similar issues here: https://github.com/jakobzhao/geog458/issues/13 and here: https://github.com/jakobzhao/geog458/issues/10 but there was no follow-up in either.

Reina-Orikasa commented 2 years ago

Twitter's documentation states that:

Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any Tweets containing the term Twitter (even non-geo Tweets) OR coming from the San Francisco area.

So, if I am understanding it correctly, you will get tweets that either contain the words in the track parameter OR fall within your coordinates (the United States, in this case). That explains the mass influx of tweets that are irrelevant to the track parameter.

If I remove the locations parameter, the stream returns ONLY tweets containing the keyword passed to track. However, this creates a new problem: tweets are pulled globally instead of from the United States only.
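
One possible workaround is to let the API handle only one of the two conditions and apply the other one client-side, so that the combined result behaves like AND instead of OR. The sketch below is not from the lab materials. It assumes each tweet arrives as a raw JSON string in the standard v1.1 layout (`text`, `extended_tweet.full_text`, `coordinates`, `place.bounding_box`). The names `US_BBOX`, `mentions_keyword`, `tweet_point`, `in_bbox`, and `keep_tweet` are hypothetical, and the bounding box is only a rough guess at the contiguous United States, so substitute the lab's own LOCATIONS values.

```python
import json

# Hypothetical bounding box (west, south, east, north), roughly the
# contiguous United States. Replace with the LOCATIONS used in the lab.
US_BBOX = (-125.0, 24.0, -66.0, 50.0)


def mentions_keyword(tweet, keywords):
    """Return True if any keyword appears in the tweet text (case-insensitive)."""
    # Long tweets carry their untruncated text under extended_tweet.full_text.
    extended = tweet.get("extended_tweet") or {}
    text = (extended.get("full_text") or tweet.get("text") or "").lower()
    return any(keyword.lower() in text for keyword in keywords)


def tweet_point(tweet):
    """Return (lon, lat) for a tweet, or None if it carries no location."""
    coords = tweet.get("coordinates")
    if coords:
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
        return lon, lat
    place = tweet.get("place")
    if place and place.get("bounding_box"):
        # Fall back to the average of the place polygon's corner points.
        ring = place["bounding_box"]["coordinates"][0]
        lon = sum(corner[0] for corner in ring) / len(ring)
        lat = sum(corner[1] for corner in ring) / len(ring)
        return lon, lat
    return None


def in_bbox(point, bbox):
    """Return True if the (lon, lat) point lies inside the bounding box."""
    west, south, east, north = bbox
    lon, lat = point
    return west <= lon <= east and south <= lat <= north


def keep_tweet(raw_json, keywords, bbox=US_BBOX):
    """Keep a tweet only if it is inside the box AND mentions a keyword."""
    tweet = json.loads(raw_json)
    point = tweet_point(tweet)
    if point is None:
        return False
    return in_bbox(point, bbox) and mentions_keyword(tweet, keywords)
```

With something like this, the listener could call `keep_tweet(raw_data, ['Russia'])` before writing each tweet to the output file. Streaming with `locations` only and checking the keyword locally should keep the geographic restriction, at the cost of downloading many tweets that are then discarded. Streaming with `track` only and checking the location locally would also work, but most tweets carry no location and would be dropped.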