rcliao closed this issue 10 years ago
Up until now, our biggest struggle has been how to deal with tags, and we have yet to find a decent way to handle them. Instead, what if we completely disregarded tags and tried to come up with something based on what can be more easily quantified mathematically: geolocation, date_taken, date_posted, and views. A count of the photos can be an additional statistic, for 5 fields in total, which should be plenty from the photo perspective. (The locale location can be substituted for geolocation if we need it for clustering purposes.) In short, we have plenty of information as it pertains to the photos.
I believe that we are having trouble finding things to analyze because we can't find a way to use the tags. I think finding outside data would help in our analysis. For example, if we throw in data on which locations rate highly as good places to live, then we can start to make predictions based on score vs tags/locations/quantity of photos.
What do you guys think? Can we make it with just what we have or would getting additional information help?
Another idea I thought about was to look at how a large difference between date_taken and date_posted would affect views, assuming identical tags. We can remove the tags that are insignificant (those that appear fewer than a certain number of times) and take an average across all tags with the same delta. Then graph how drastic or insignificant the effect of waiting to upload a picture is on views.
This could potentially give us nothing to predict though... the graph might be completely boring.
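A rough sketch of what the delay-vs-views calculation might look like, just to make the idea concrete. It assumes each photo is a dict with `date_taken` and `date_posted` as `datetime` objects plus an integer `views`; the function name `mean_views_by_delay`, the 7-day bucket size, and the sample records are all made up for illustration, and tags are left out to keep it short.

```python
from collections import defaultdict
from datetime import datetime


def mean_views_by_delay(photos, bucket_days=7):
    """Average views per upload-delay bucket (delay = date_posted - date_taken)."""
    totals = defaultdict(lambda: [0, 0])  # bucket index -> [sum of views, photo count]
    for p in photos:
        delay = (p["date_posted"] - p["date_taken"]).days
        if delay < 0:
            continue  # skip records whose dates are inconsistent
        bucket = delay // bucket_days
        totals[bucket][0] += p["views"]
        totals[bucket][1] += 1
    # Key each bucket by its starting day so the result is easy to plot.
    return {b * bucket_days: s / n for b, (s, n) in sorted(totals.items())}


# Hypothetical sample records, not real data.
photos = [
    {"date_taken": datetime(2013, 1, 1), "date_posted": datetime(2013, 1, 2), "views": 100},
    {"date_taken": datetime(2013, 1, 1), "date_posted": datetime(2013, 1, 4), "views": 50},
    {"date_taken": datetime(2013, 1, 1), "date_posted": datetime(2013, 1, 20), "views": 10},
]
delay_result = mean_views_by_delay(photos)
```

If the resulting averages are roughly flat across buckets, that would be the "boring graph" case.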
I like this idea on how long people wait to upload but let's not get too greedy. Forget tags for this date_taken, date_posted analysis.
As to the other comment: tags are a part of the photo data. They are metadata and have intrinsic meaning, unlike the other photo data, where meaning must be derived. That said, I don't mind reducing our reliance on tags as a data point. However, why can't we still use the count of top tags and the top 10-20 tags by total view count as a couple of cool graphs?
As to whether we have enough: I'm certain it's going to be fine. But my idea is to make a map where you can type a word in a search box and, if it matches a tag in the data set, the map flashes where pictures with that tag were taken. The bigger the flash, the more views. A slider at the bottom would pass through time over the year.
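For the two graphs, something along these lines could produce the numbers. This is only a sketch: it assumes each photo is a dict with a `tags` list and an integer `views`, and the function name `top_tags` and the sample records are hypothetical.

```python
from collections import Counter


def top_tags(photos, n=10):
    """Return the top-n tags by photo count and by total views."""
    counts = Counter()
    views = Counter()
    for p in photos:
        for tag in set(p["tags"]):  # count each tag once per photo
            counts[tag] += 1
            views[tag] += p["views"]
    return counts.most_common(n), views.most_common(n)


# Hypothetical sample records, not real data.
tagged_photos = [
    {"tags": ["beach", "sunset"], "views": 120},
    {"tags": ["beach"], "views": 30},
    {"tags": ["city"], "views": 200},
]
by_count, by_views = top_tags(tagged_photos, n=2)
```

Note that the two rankings can disagree: here "beach" appears on the most photos, while "city" has the most total views.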
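The data side of that map could be as small as a filter like the one below. It is a sketch under assumptions: each photo is a dict with `lat`, `lon`, `tags`, `views`, and a `date_taken` `datetime`; the slider is assumed to step by month; and `flashes_for_tag` is a made-up name. The drawing itself is left to whatever map library we end up using.

```python
from datetime import datetime


def flashes_for_tag(photos, query, month):
    """Return (lat, lon, views) for photos tagged `query` and taken in `month` (1-12)."""
    query = query.strip().lower()
    return [
        (p["lat"], p["lon"], p["views"])  # views would drive the flash size
        for p in photos
        if p["date_taken"].month == month
        and query in (t.lower() for t in p["tags"])
    ]


# Hypothetical sample records, not real data.
geo_photos = [
    {"lat": 34.05, "lon": -118.24, "tags": ["Beach"], "views": 40,
     "date_taken": datetime(2013, 7, 4)},
    {"lat": 40.71, "lon": -74.00, "tags": ["city"], "views": 90,
     "date_taken": datetime(2013, 7, 10)},
    {"lat": 21.31, "lon": -157.86, "tags": ["beach"], "views": 75,
     "date_taken": datetime(2013, 12, 1)},
]
july_beach = flashes_for_tag(geo_photos, " beach ", 7)
```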
Jonathan Kroening | jonathankroening.com
On Mar 7, 2014, at 7:04 PM, Daniel Young notifications@github.com wrote:
Another idea I thought about was to look at how a large difference between date_taken and date_posted would affect views, assuming identical tags. We can remove the tags that are insignificant (those that appear fewer than a certain number of times) and take an average across all tags with the same delta. Then graph how drastic or insignificant the effect of waiting to upload a picture is on views.
This could potentially give us nothing to predict though... the graph might be completely boring.
It doesn't seem like we have time to do any crazy analysis at the moment, so I think we will just focus on the visualization. Therefore, I will close this issue and just pretend nothing happened here. :+1:
Let's use this issue to document what we have for the analysis part. @jkroening @surhorse
If you have any thoughts, please comment on this issue so that we can all document what we have in one place.
The data we have so far contains the following attributes