slavaspirin / twitter_sentiment

Twitter demographics and sentiment prediction
MIT License

Location Analysis - figure out one's residential postal code #1

Open winstonll opened 4 years ago

winstonll commented 4 years ago

Look at the tweets by a person and figure out the most frequent location using some sort of density estimation.
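One rough way to sketch the "most frequent location" idea is a histogram-style density estimate: bucket a person's geotagged tweets into small grid cells and take the centre of the fullest cell. The function name `most_frequent_location`, the `cell_deg` parameter and the input shape (a list of `(lat, lon)` pairs) are assumptions made for this example, not something that exists in the repo.

```python
import math
from collections import defaultdict


def most_frequent_location(points, cell_deg=0.01):
    """Return the centroid of the densest grid cell among (lat, lon) points.

    Points are bucketed into square cells of `cell_deg` degrees; the cell
    holding the most points wins. Returns None for an empty input.
    """
    if not points:
        return None
    cells = defaultdict(list)
    for lat, lon in points:
        # Hypothetical grid key: integer cell indices along each axis.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))
    densest = max(cells.values(), key=len)
    n = len(densest)
    return (sum(p[0] for p in densest) / n, sum(p[1] for p in densest) / n)
```

A fixed grid can split a real cluster across neighbouring cells, so a kernel density estimate would likely be more robust; the grid version is only meant to show the shape of the approach.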

slavaspirin commented 4 years ago

I compared both APIs (REST and streaming) at the end of today. Only 1-4% have geolocations. Also, even when I look up every user from the search results, go to his/her page and scroll through the feed, geolocation is most likely either always on or always off.

Taking all accounts into consideration, about 70% have city-level coordinates, ~3% have polygon/point coordinates and the rest have nothing at all.

Another way of getting the geodata would be to go through one's followers and extract any geolocations present, hoping that somebody in this person's followers list has posted an exact geolocation. This approach is hit-or-miss, and I am sure I'll hit the rate limit shortly after starting.

Still working on it.

winstonll commented 4 years ago

OK, let me clarify. So you are saying that only 1-4% of the tweets have a location, right? Does that depend on the person, or is it purely random? For example, consider the following scenarios: 1) A person turns location either on or off, so if they turn it on then every tweet by this person will have a location, or 2) Location is shown randomly, so each tweet in one's feed has a 1-4% chance of carrying a location?

The former means that we can only get locations for certain people (the ones with location turned on), while the latter means we can get locations for everyone so long as we query enough historical tweets from them.
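A quick way to tell the two scenarios apart would be to compute, for each user, the fraction of their tweets that carry a location. Below is a minimal sketch, assuming the tweets have already been reduced to `(user_id, has_location)` pairs; that input shape and the names `geotag_rate_per_user` and `bucket_users` are hypothetical.

```python
from collections import defaultdict


def geotag_rate_per_user(tweets):
    """Map user_id -> fraction of that user's tweets that carry a location.

    `tweets` is an iterable of (user_id, has_location) pairs (assumed shape).
    """
    total = defaultdict(int)
    tagged = defaultdict(int)
    for user_id, has_location in tweets:
        total[user_id] += 1
        if has_location:
            tagged[user_id] += 1
    return {user: tagged[user] / total[user] for user in total}


def bucket_users(rates):
    """Count users with no geotagged tweets, some, or all of them geotagged."""
    buckets = {"none": 0, "some": 0, "all": 0}
    for rate in rates.values():
        if rate == 0:
            buckets["none"] += 1
        elif rate == 1:
            buckets["all"] += 1
        else:
            buckets["some"] += 1
    return buckets
```

If scenario 1 holds, users should pile up in "none" and "all". If scenario 2 holds, most users with enough history should land in "some" with a small rate. Users with only a handful of tweets say little either way.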

slavaspirin commented 4 years ago

So yes, if a user has user_geolocation enabled, he/she is likely to share it again, but there is no certainty when. Also, my sample sizes were only about 50-100 users, so these numbers can vary.

Do you think we can cluster tweets around the ones that have an exact geolocation?

slavaspirin commented 4 years ago

Stats for the 120k tweets I have collected so far:

geo_enabled: 40.68% of the tweets

city_level: 68.91%

box_level: 2.43%

point_level: 0.16%
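For reference, here is a sketch of how a breakdown like this could be tallied from tweet objects shaped like the v1.1 API JSON (`user.geo_enabled`, `place.place_type`, `place.bounding_box`, `coordinates`). The mapping of those fields onto city_level / box_level / point_level, the priority order (point, then city, then box) and the use of total tweets as the denominator are all assumptions for illustration, not necessarily how the numbers above were produced.

```python
def location_stats(tweets):
    """Return percentages of tweets per location category.

    Assumed mapping (hypothetical):
      point_level: tweet has a non-null `coordinates` field
      city_level:  otherwise, `place.place_type` is "city"
      box_level:   otherwise, `place` has a `bounding_box`
    """
    counts = {"geo_enabled": 0, "city_level": 0, "box_level": 0, "point_level": 0}
    total = 0
    for tweet in tweets:
        total += 1
        if (tweet.get("user") or {}).get("geo_enabled"):
            counts["geo_enabled"] += 1
        place = tweet.get("place")
        if tweet.get("coordinates"):
            counts["point_level"] += 1
        elif place and place.get("place_type") == "city":
            counts["city_level"] += 1
        elif place and place.get("bounding_box"):
            counts["box_level"] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    # Percentages are taken over all tweets (an assumption).
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}
```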