haystack / tipsy

A new project to encourage pay-what-you-want support for any web site.
http://tipsy.csail.mit.edu/
MIT License

Debouncing visits into the log #55

Open tbranyen opened 9 years ago

tbranyen commented 9 years ago

Currently any logging that results in an activity (start/stop triggers) adds a new entry into the visit log. This works great for most use cases, especially since it's nice to know you were briefly viewing documentation and paging back and forth. It lines up with usage.

... however, this breaks down when sites rely on automatic page reloads to refresh their content instead of opting for a more asynchronous approach via sockets or transparent polling. In that case, the page constantly reloads while it's open to sync new data and unintentionally floods the visit log with entries.

We should enable a debounce threshold to eliminate these occurrences. One issue is knowing how long to wait before inserting an entry. If the refresh triggers every few minutes, but you keep the page open all day, we'll need to debounce at the minute level. However, that means if you visit a page, wait a few minutes, and then go back to that tab, it will not be logged as a separate instance, even though minutes passed in between.
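To make the trade-off concrete, here is a rough sketch of debouncing applied to the log itself. It is only an illustration: the `Visit` shape, the `debounceVisits` name, and the threshold value are assumptions, not tipsy's actual schema or code. It merges consecutive entries for the same URL when the gap between them is at or below the threshold.

```typescript
// Hypothetical shape of a visit-log entry; the real log schema may differ.
interface Visit {
  url: string;
  start: number; // ms since epoch
  end: number;   // ms since epoch
}

// Merge consecutive visits to the same URL when the gap between one entry's
// end and the next entry's start is at or below thresholdMs.
// Assumes `visits` is sorted by start time. Entries for the same URL that
// are separated by a visit to a different URL are not merged.
function debounceVisits(visits: Visit[], thresholdMs: number): Visit[] {
  const merged: Visit[] = [];
  for (const visit of visits) {
    const last = merged[merged.length - 1];
    if (last && last.url === visit.url && visit.start - last.end <= thresholdMs) {
      // Close enough to the previous entry: extend it instead of adding a row.
      last.end = Math.max(last.end, visit.end);
    } else {
      // Copy the entry so the input array is left untouched.
      merged.push({ ...visit });
    }
  }
  return merged;
}

// Example: an auto-reloading page produces two entries half a second apart,
// followed by a genuine return to the page several minutes later.
const minute = 60000;
const exampleLog: Visit[] = [
  { url: "https://example.com/dashboard", start: 0, end: 1 * minute },
  { url: "https://example.com/dashboard", start: 1 * minute + 500, end: 3 * minute },
  { url: "https://example.com/dashboard", start: 10 * minute, end: 11 * minute },
];
const debounced = debounceVisits(exampleLog, 5000);
```

With a 5 second threshold the first two entries collapse into one and the later return stays separate. Raising the threshold to several minutes would also swallow that later return, which is exactly the problem described above.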

@schilippe do you have any feedback on how you would prefer to see this implemented?

schilippe commented 9 years ago

@tbranyen I will look into it. It is not only a problem with reloads: even when someone is splitting their time between two tabs, each tab switch logs a new visit and adds a new row.

Maybe we should revisit the definition of a "visit". I am not sure how useful a metric visits are the way they are defined now. What if we call a "visit" the fact that a page was opened, and end the visit when the page is closed (i.e., the URL is changed or the tab is closed), thus making a "visit" a decision the user made to visit the site? I frequently multitask between tabs, but I don't know if I would call each tab switch a unique visit. This way, we don't have too many rows in each table, and each row would really correspond to a unique choice by the user to visit a site.
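As a rough illustration of that definition, the sketch below builds one visit per open/close pair and ignores tab switches entirely. Everything here is hypothetical: the event names (`open`, `focus`, `blur`, `close`) and the `visitsFromEvents` helper are made up for the example and are not the Chrome API or tipsy's existing code. A URL change is assumed to arrive as a `close` followed by an `open` on the same tab.

```typescript
// Hypothetical event kinds for illustration; not real Chrome event names.
type TabEventKind = "open" | "focus" | "blur" | "close";

interface TabEvent {
  tabId: number;
  url: string;
  kind: TabEventKind;
  time: number; // ms since epoch
}

interface PageVisit {
  url: string;
  opened: number;
  closed: number;
}

// Produce one visit per open/close pair. Focus and blur events are ignored,
// so switching back and forth between tabs does not add rows.
// Pages that are still open when the event list ends produce no visit yet.
function visitsFromEvents(events: TabEvent[]): PageVisit[] {
  const openByTab = new Map<number, { url: string; opened: number }>();
  const visits: PageVisit[] = [];
  for (const event of events) {
    if (event.kind === "open") {
      openByTab.set(event.tabId, { url: event.url, opened: event.time });
    } else if (event.kind === "close") {
      const open = openByTab.get(event.tabId);
      if (open) {
        visits.push({ url: open.url, opened: open.opened, closed: event.time });
        openByTab.delete(event.tabId);
      }
    }
  }
  return visits;
}
```

Under this scheme a page that is opened once and switched to a dozen times still yields a single row, at the cost of no longer recording when the user was actually looking at it.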

In terms of implementation, I am trying to think how much this will break our existing setup...

karger commented 9 years ago

I agree that visit count is a relatively meaningless measure. Perhaps we shouldn't be measuring it at all, but should instead be looking at, e.g., "days with a visit". Perhaps the APIs would also let us detect "opening" a site by (i) entering an address in the address bar, (ii) clicking a bookmark, or (iii) clicking on a link; we could count those.
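A "days with a visit" count could be computed along these lines. This is a minimal sketch under assumptions: the `daysWithVisit` name is hypothetical, visits are represented only by their start timestamps, and days are taken in UTC, whereas a real version would probably want the user's local calendar day.

```typescript
// Count the distinct calendar days (in UTC) on which at least one visit started.
// `visitStarts` holds visit start times in ms since epoch.
function daysWithVisit(visitStarts: number[]): number {
  const days = new Set<string>();
  for (const start of visitStarts) {
    // The first 10 characters of an ISO timestamp are "YYYY-MM-DD".
    days.add(new Date(start).toISOString().slice(0, 10));
  }
  return days.size;
}
```

Any number of reloads or tab switches within the same day collapse into a single count, so the measure is unaffected by the flooding problem.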

schilippe commented 9 years ago

I will test out a version with a "days with a visit" column in the log table and see what people think about it. I think we once discussed how we don't want tipsy to become a browsing activity tracking tool so maybe the less extraneous information we give the better. But, I actually like seeing the info, even though it's not what tipsy is all about.

schilippe commented 9 years ago

I pushed up a branch last night with a "Days with visit" column, but I don't want to merge it with master.

schilippe commented 9 years ago

Alright, I merged PR #56, the additional column branch, into a new branch I created off master called master-daysVisited.

schilippe commented 9 years ago

So the "days visited" column seems to be all worked out now that many of the bugs (the Chrome API is unfortunately far from robust) have been fixed. I've been pushing and merging the changes on the separate branch.

But before this issue is closed, we still might want to decide exactly how many, or which, entries to show in the table when the user clicks on an individual entry. Right now it still shows all the visits. We should probably reduce that, but to what? I think the "visit" idea of when the tab is switched to might be more useful than just when the page was loaded and unloaded, since users often leave tabs open for a long time, sometimes even forgetting they are there. A visit means they actually gave it some attention. But there are just too many visits, even without the automatically refreshing pages. Maybe bucketing them by some parameter would be best, but what should this parameter be? I think a day would be good, showing just the last visit of the day. Any objections and/or other ideas?
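For discussion, bucketing by day and keeping only the last visit of each day might look like the sketch below. The `lastVisitPerDay` name is hypothetical, visits are reduced to their start timestamps, and days are computed in UTC for simplicity; none of this reflects the code on the master-daysVisited branch.

```typescript
// Group visit start times (ms since epoch) by UTC calendar day and keep
// only the latest one in each day. Keys are "YYYY-MM-DD" strings.
function lastVisitPerDay(visitStarts: number[]): Map<string, number> {
  const latest = new Map<string, number>();
  for (const start of visitStarts) {
    const day = new Date(start).toISOString().slice(0, 10);
    const current = latest.get(day);
    if (current === undefined || start > current) {
      latest.set(day, start);
    }
  }
  return latest;
}
```

The table would then show one row per key in the returned map, so the number of rows is bounded by the number of days with a visit rather than by the number of tab switches.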