learningequality / ka-lite

KA Lite: lightweight web server for serving core Khan Academy content (videos and exercises) without needing internet connectivity
https://learningequality.org/ka-lite/

Record all usage data, even if no user is logged in / facility is created. #706

Closed bcipolli closed 8 years ago

bcipolli commented 11 years ago

In order to understand KA Lite usage worldwide, we need as much usage data as possible. However, when no users exist or are created, we simply do not save any usage data.

We can record all *Log data, as usual, when no user is logged in, or when Django users are logged in.

Possible designs:

bcipolli commented 11 years ago

In reviewing our data collection and the stories submitted to us, I'm convinced that there is in fact a lot of video usage happening while not logged in, and that we're missing a significant amount of data that is syncable, by simply not recording it.

@jamalex any interest in designing and/or implementing this? To me, this has one of the best priority-to-cost ratios of the dev items on our plate: high priority, and I think relatively low cost to get a good design and implement it.

bcipolli commented 11 years ago

Assigning to myself so I don't lose track, while I await @jamalex response.

bcipolli commented 10 years ago

Absolutely LOVE the data collection we've started on video downloads and language pack downloads. LOVE that we have the registration requirement to use online features.

This one is next, and I think there's a relatively easy way to do it:

Consequences of this design:

Will think more about whether another design might be preferable (like an AnonymousUser extension of FacilityUser, which is one-way syncable... but then the ExerciseLogs would need to be one-way syncable...)

bcipolli commented 10 years ago

or

That might actually work out better, though it would take some work on the reporting side to show admins the amount of anonymous data that's being collected.

bcipolli commented 10 years ago

Hmm, the problem with this is that we use the user to determine the zone to which the *Log object belongs. So these would globally collide.

I have the anonymous = None version implemented on my machine, but will think through further.

jamalex commented 10 years ago

Love the idea. In some ways, we even want every anonymous "session" to create a new instance of the anonymous facility user. The only downside is that it could be messy on coach reports (but I agree it's nice to show anon usage stats there). Perhaps we could do the "every new session is a new anon user" approach, but have some special (simple) logic to filter these out in places like the coach reports, only showing something aggregate. We'll probably need special logic somewhere anyway (either in JS or API) so that progress (e.g. a full streak bar) isn't shown to a non-logged-in user, to avoid confusion.

jamalex commented 10 years ago

A versioned "is_anonymous" field could be added to FacilityUser, and we could then disallow logins, showing of progress, etc, for those "users".
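
A rough sketch of how such a flag might be used, in plain Python rather than Django so it runs on its own. `FacilityUserStub`, `can_log_in`, and `coach_report_rows` are hypothetical names made up for illustration; they are not part of KA Lite, and the real FacilityUser model has many more fields.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FacilityUserStub:
    """Hypothetical stand-in for FacilityUser; not the real KA Lite model."""
    username: str
    is_anonymous: bool = False  # the proposed flag
    points: int = 0


def can_log_in(user: FacilityUserStub) -> bool:
    """Anonymous placeholder users are never allowed to authenticate."""
    return not user.is_anonymous


def coach_report_rows(
    users: Iterable[FacilityUserStub],
) -> Tuple[List[FacilityUserStub], Dict[str, int]]:
    """Return named users as individual rows, plus one aggregate for anonymous sessions."""
    named: List[FacilityUserStub] = []
    anon_sessions = 0
    anon_points = 0
    for user in users:
        if user.is_anonymous:
            # Each anonymous "user" represents one session; fold it into the aggregate.
            anon_sessions += 1
            anon_points += user.points
        else:
            named.append(user)
    return named, {"sessions": anon_sessions, "points": anon_points}
```

With this shape, the special-case logic stays in two small functions: one guarding login and one collapsing anonymous sessions into a single aggregate row for reports.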

bcipolli commented 10 years ago

Cool! Yeah, I thought about these explicit options as well; my hesitation is that I worry special-case code will need to be wide-spread.

I've done an initial demo branch to explore simply setting user=None for the *Log objects. The code is relatively constrained (though there is some localized handling of collisions so that anonymous users are per-device instead of global).
https://github.com/bcipolli/ka-lite/compare/learningequality:develop...bcipolli:706?expand=1

bcipolli commented 10 years ago

I'm also still unsure about all of the facility stuff. Could make a facility just for anonymous users, but this would again require special-case code.

A solution that requires as little special-case code as possible is strongly preferred for me, even if the data collected are sub-optimal, as the complexity of this app is already pretty high.

jamalex commented 10 years ago

Yeah... (Facility could be the default facility, maybe..).

Perhaps the cleanest would be a new AnonymousUsage model, which would store aggregate stats (total hours watched, total answers given, etc, per video/exercise)? It wouldn't need to be associated with a user or facility, and wouldn't need to duplicate the logic from the ExerciseLog and VideoLog models, since it doesn't need to track points, etc. Then, the only custom logic anywhere would be either in the JS for saving progress (probably best, so it doesn't even show live point updating, etc), or in the log API calls. Just a thought.

jamalex commented 10 years ago

(wouldn't need to be multiple models, could just be one, with "kind" and "id" fields to identify videos vs exercises)
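
A minimal sketch of what a single AnonymousUsage record with "kind" and "id" fields could look like, written as plain Python so it is runnable without Django. The field names `total_seconds` and `total_answers`, the `AnonymousUsageStore` class, and the content ids in the usage example are assumptions for illustration only; nothing here is taken from the KA Lite codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AnonymousUsage:
    """One running total per media entity; no user or facility attached."""
    kind: str                    # "video" or "exercise"
    id: str                      # content identifier
    total_seconds: float = 0.0   # hypothetical aggregate: time spent
    total_answers: int = 0       # hypothetical aggregate: answers given


class AnonymousUsageStore:
    """In-memory stand-in for a database table keyed on (kind, id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], AnonymousUsage] = {}

    def record(self, kind: str, content_id: str,
               seconds: float = 0.0, answers: int = 0) -> AnonymousUsage:
        """Add one anonymous interaction to the running total for this item."""
        key = (kind, content_id)
        row = self._rows.get(key)
        if row is None:
            row = AnonymousUsage(kind, content_id)
            self._rows[key] = row
        row.total_seconds += seconds
        row.total_answers += answers
        return row

    def rows(self) -> List[AnonymousUsage]:
        """Return all aggregate rows."""
        return list(self._rows.values())
```

For example, calling `store.record("video", "some-video-id", seconds=30)` twice for the same id updates one row rather than creating two, which is what keeps the model free of per-user or per-session state.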

aronasorman commented 10 years ago

Hmm I don't see this essential for 0.11.1. Punting to 0.11.2.

bcipolli commented 10 years ago

@jamalex I am really warming up to the AnonymousUsage model. It's clean, it could follow the logic from the UserLogSummary functionality (for grouping with a particular granularity), and I think there's currently much more value in the overall usage than in the per-exercise/per-video log data.

This would require us to implement the "one-way sync" (would be done soon anyway).

@aronasorman @rtibbles any thoughts / concerns?

For me, this is an essential piece to get out the door, to help us try and go from "installations" to "usage". I know this may wind up telling us nothing (because we'll truly never see offline installations again), but it seems like our best shot in the meantime.

jamalex commented 10 years ago

Cool! Note: I think there's still value in having AnonymousUsage be per-exercise/per-video (basically just a running total across all anon usage, per media entity), as it would be cool to be able to say what items people are watching/answering.

bcipolli commented 10 years ago

Agree; lots of possible ways to do this that are pretty generic and would generalize well to data collection in non-KA content scenarios.

rtibbles commented 10 years ago

Glad to hear you're on board for OneWaySync too! The anonymous usage also sounds helpful, and I can see the value in following the UserLogSummary model of aggregating over certain time windows.
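
To illustrate aggregating over time windows, here is a small sketch that floors each event's timestamp to the start of its window and sums per (window, kind). It does not reproduce how UserLogSummary works; `window_start`, `summarize`, and the event tuple layout are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime, window: timedelta) -> datetime:
    """Floor a naive timestamp to the start of its window, counted from the epoch."""
    whole_windows = (ts - EPOCH) // window  # timedelta // timedelta yields an int
    return EPOCH + whole_windows * window


def summarize(
    events: Iterable[Tuple[datetime, str, float]],
    window: timedelta = timedelta(days=1),
) -> Dict[Tuple[datetime, str], float]:
    """Sum seconds of usage per (window start, kind).

    Each event is (timestamp, kind, seconds); this layout is hypothetical.
    """
    totals: Dict[Tuple[datetime, str], float] = defaultdict(float)
    for ts, kind, seconds in events:
        totals[(window_start(ts, window), kind)] += seconds
    return dict(totals)
```

A coarser window (say, `timedelta(days=7)`) yields fewer rows to sync at the cost of resolution, which is the trade-off a granularity setting would control.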

bcipolli commented 10 years ago

Tentatively assigning to myself; this is a ~1-2 hour project, once one-way sync (required) is done. Lower priority item.

bcipolli commented 10 years ago

Booting to others. Once OneWaySync is done, this should be really easy. I suggest moving this, along with the UserLog functionality, into a kalite.stats app. This would avoid the funky inter-app dependencies that come from shoving stats collection into other apps (like main or facility); instead, most apps can simply import kalite.stats, which makes a lot of sense.

rtibbles commented 10 years ago

Sounds like a good design.

aronasorman commented 10 years ago

0.13.

aronasorman commented 9 years ago

I'm not up to date on where we are with regards to our data collection system. @rtibbles how close are we to implementing this in the current develop?

rtibbles commented 9 years ago

Way off.

rtibbles commented 8 years ago

Not going to happen within the scope of KA Lite.