lookit / lookit-api

Codebase for Lookit v2 and Experimenter v2. Includes an API. Docs: http://lookit.readthedocs.io/
https://lookit.mit.edu/
MIT License

Show summary stats of users/participants on Experimenter #134

Closed kimberscott closed 5 years ago

kimberscott commented 6 years ago

Pain point: Upon launch, we will need to work to increase the Lookit userbase via outreach and advertising, but we don't currently have a way to evaluate such efforts (e.g., to see how many people registered in the past week).

Acceptance criteria: The family outreach specialist can easily monitor and evaluate advertising efforts by answering questions like the following, using the Lookit admin or experimenter interface without doing any programming:

Implementation notes/Suggestions: This can possibly be part of either the experimenter or the admin apps in Django. It seems like it might build on existing functionality in admin, except that we don't want usage to be limited to people who are actually admins (able to see/manipulate all data).

We've discussed building a dashboard and essentially fetching a bunch of data, then allowing filtering down from that using sliders/etc. (e.g. for age range, demographics). It could show things like new participants registered per week, a bar chart of the age distribution, tables of demographic form responses, and a plot of # unique study participants / week (one line for total unique participants, lines for individual studies).
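As a rough illustration of the "new participants registered per week" series mentioned above, here is a minimal sketch in plain Python. It assumes we already have a list of account creation dates; the function names and sample dates are hypothetical and not part of lookit-api. The running total corresponds to what a cumulative registrations chart would plot.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def registrations_per_week(signup_dates):
    """Count new registrations per week, keyed by the Monday of each week."""
    counts = Counter(week_start(d) for d in signup_dates)
    return dict(sorted(counts.items()))


def cumulative(weekly_counts):
    """Turn per-week counts into a running total, in week order."""
    total = 0
    running = {}
    for week, n in sorted(weekly_counts.items()):
        total += n
        running[week] = total
    return running


# Made-up signup dates, for illustration only.
signups = [date(2019, 3, 4), date(2019, 3, 6), date(2019, 3, 12), date(2019, 3, 25)]
weekly = registrations_per_week(signups)
running_total = cumulative(weekly)
```

Weeks with no signups are simply absent from the result here; a real chart would probably want to fill those in with zeros.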

It might turn out that nothing prevents us, from an ethics/privacy standpoint, from allowing all researchers to use this (if there's no way for them to get identifying info, just composite stats we could share with them anyway), which would be great someday. But for the purposes of deciding whether we need to engineer database access for many users, the primary intended users are still a couple of people at MIT.

kimberscott commented 5 years ago

Minor addition that would be helpful under the same goal: allow experimenters to see the "date created" of the account model in addition to "last active," so that they can at least crudely tell the difference between existing participants and people who created accounts in response to their recruitment efforts.

kimberscott commented 5 years ago

Rough sketch of what this might look like. This is essentially a big wish list that I expect to be pared down!

Participant Stats

kimberscott commented 5 years ago

Permissions, per discussion just now:

kimberscott commented 5 years ago

This is an amazing tool and I'm very excited about how powerful the pivot table approach is! I'm eager to get this through to production because I actually want to use it, having never had the time to go through this sort of info in much detail.

Nitpicking as always:

Various requests for clarifying text:

Datamance commented 5 years ago

Going to leave comments as I work through these:

For item, "I'm not seeing any data in "Cumulative Registrations" locally - is that just me? Can you show how it looks with more data?"

image

That's what mine looks like - yours could be not rendering because you don't have multiple users on your local instance.

kimberscott commented 5 years ago

Hmm, it looks like I have 6 users, 8 kids registered on my local instance but only one that's really participated. Is the number of registrations pulled from the same set of responses as everything else? Is there any way to get actual total registrations here?

Datamance commented 5 years ago

Real quick just going to address this:

Is it possible to allow filtering of responses in pivot table / multi-value field breakdowns - e.g., only include people who have participated in X study / from users who have logged in within past year / who registered in certain time window?

image

Using the arrows on the sidebar variables, you can restrict values. Time filtering would require a little ~more work~ hacking to wire the date filter (or a new one) to the underlying data set.

kimberscott commented 5 years ago

Does this end up restricting values in the multi-value field breakdowns as well? If not it might be worth doing that and possibly doing the hacking to use the date filter - the use case I'm thinking of is looking at the "how did you hear about Lookit" field for active or recently-recruited participants.

Datamance commented 5 years ago

Does this end up restricting values in the multi-value field breakdowns as well? If not it might be worth doing that and possibly doing the hacking to use the date filter - the use case I'm thinking of is looking at the "how did you hear about Lookit" field for active or recently-recruited participants.

It doesn't - the multi-value field breakdowns are predicated on the children (and probably should be called as such - something like "Child characteristics"). Right now, what I have served up into the template context is this (skipping a few intermediate lines of code):

        children_queryset = Child.objects.filter(
            id__in=annotated_responses.values_list("child", flat=True).distinct()
        )
        children_pivot_data = unstack_children(children_queryset, studies_for_child)

        ...

        ctx["studies"], ctx["languages"], ctx["characteristics"] = [
            dict(counter) for counter in children_pivot_data
        ]

So the data structures are pretty much hardcoded to give a count for all responses.

I could change this much in the way that we've been changing everything else in this view - basically defer the calculations to the browser, and instead of passing three dictionaries of counts into the template context, pass in a JSON blob of children (keys would be child UUIDs and values would be JS objects containing the lists of characteristics/languages/studies for those children).

This way, whenever we produce a new set of timeseries data and unique child IDs as a byproduct, we can use those IDs to key into that "child info" object and do the counts on the fly.
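To make the proposal above concrete, here is a minimal sketch of the "child info" blob and the on-the-fly counting. In the design described, the counting would happen in browser JavaScript; it is written in Python here only to show the data shape. The UUIDs, field values, and the `count_for_children` helper are all hypothetical, not code from the view.

```python
import json
from collections import Counter

# Hypothetical "child info" blob: child UUID -> lists of values per field.
child_info = {
    "child-uuid-1": {
        "studies": ["Study A", "Study B"],
        "languages": ["en"],
        "characteristics": ["multiple_birth"],
    },
    "child-uuid-2": {
        "studies": ["Study A"],
        "languages": ["en", "es"],
        "characteristics": [],
    },
    "child-uuid-3": {
        "studies": ["Study B"],
        "languages": ["fr"],
        "characteristics": ["multiple_birth"],
    },
}


def count_for_children(info_by_child, child_ids, field):
    """Count values of `field` across only the given child IDs."""
    counts = Counter()
    for child_id in child_ids:
        info = info_by_child.get(child_id)
        if info is None:
            continue  # ID not present in the blob; skip it
        counts.update(info[field])
    return dict(counts)


# The blob would be serialized once into the template context...
blob = json.dumps(child_info)

# ...and each new filter result (a set of child IDs) keys into it.
filtered_ids = ["child-uuid-1", "child-uuid-2"]
language_counts = count_for_children(json.loads(blob), filtered_ids, "languages")
```

The trade-off is that the server sends one larger payload up front instead of three precomputed dictionaries, but the counts can then follow whatever subset of children the current filters produce.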

kimberscott commented 5 years ago

Hmm I guess I was conflating the multi-value field breakdowns and the free-response answer displays. I'm ok leaving the multi-value field breakdowns as is (as long as they're labeled so we know what's being counted where) but would like to be able to filter the "additional info" & "how did you hear about Lookit" fields.

Datamance commented 5 years ago

Point of curiosity only: what do the little arrows next to the dependent variable dropdown (e.g. "Total # responses") do?

¯\_(ツ)_/¯ it looks like they change the order?

Datamance commented 5 years ago

Indicate somewhere which families/children/responses are being included

Can you clarify this a bit? Do you mean including User and Child UUIDs somewhere?

Need a very small amount of explanatory text about the pivot table, even just that this is how you can generate various summaries of participant characteristics, broken down by fields you can choose.

Do we need another docs page for the pivot table?

I should provide some text to go at the top about the purpose of this view & some cautions about how the data can be used/shared (e.g. a reminder that demographic data may not be shared in a way that allows linking it to video, and an example of how that could happen without publishing an ID)

Sure thing - let me know when you have the language ready and I'll paste it in.

kimberscott commented 5 years ago

Indicate somewhere which families/children/responses are being included

Can you clarify this a bit? Do you mean including User and Child UUIDs somewhere?

Ah, sorry, I mean how are they being selected. So e.g. in the pivot table - only responses from studies you have read access to and where consent has been confirmed and children/accounts associated with those responses. (This may vary by superuser status?)

Need a very small amount of explanatory text about the pivot table, even just that this is how you can generate various summaries of participant characteristics, broken down by fields you can choose.

Do we need another docs page for the pivot table?

That's a good idea at some point, but for now really just 1-2 sentence statement of what this thing is.

Sure thing - let me know when you have the language ready and I'll paste it in.

Will do - can have tomorrow!

kimberscott commented 5 years ago

Proposed language for at the top:

The information on this page is provided primarily for the purposes of evaluating your recruitment efforts: how well do various approaches work? What populations do they reach? You may also find it helpful for reporting aggregate characteristics of your participants. Please note that demographic survey data may never be published such that it could be linked to an individual participant's video (see Terms of Use). Before sharing any demographic data, consider whether it might be possible to link it to individual participants: e.g., because a child's name is mentioned in a comment or because only one family speaks a particular language and that language is used in their video.

kimberscott commented 5 years ago

This is so cool I hate to do more nitpicking because really I just want it up on prod, but... here are my nitpickings from looking at it on staging:

Explanatory text:

Pivot table:

Datamance commented 5 years ago

does being a superuser affect what data is included in participation, registration, pivot table, and/or child characteristics? If so include note to that effect somewhere. (Can be shown only if user is superuser, or shown regardless.)

It does, I just had the wrong text in the wrong conditional block. You'll see that you get data for all responses (with the added restriction of consented only for the pivot table) when you're a superuser.
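The selection rule discussed in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the view's actual code: the `Response` dataclass, its field names, and `visible_responses` are hypothetical. It assumes superusers see responses from all studies, other users see only studies they have read access to, and the pivot table additionally requires confirmed consent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """Hypothetical stand-in for a study response record."""
    study: str
    child: str
    consent_confirmed: bool


def visible_responses(responses, readable_studies, is_superuser, for_pivot_table=False):
    """Select the responses a user's stats should be built from."""
    selected = []
    for r in responses:
        # Non-superusers are limited to studies they can read.
        if not is_superuser and r.study not in readable_studies:
            continue
        # The pivot table only includes responses with confirmed consent.
        if for_pivot_table and not r.consent_confirmed:
            continue
        selected.append(r)
    return selected


responses = [
    Response("study-1", "child-a", True),
    Response("study-1", "child-b", False),
    Response("study-2", "child-c", True),
]
researcher_pivot = visible_responses(responses, {"study-1"}, False, for_pivot_table=True)
superuser_all = visible_responses(responses, set(), True)
```

Whatever the exact rule ends up being, stating it in one place like this makes it easier to keep the explanatory text on the page in sync with what is actually counted.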

Datamance commented 5 years ago

The charts for "unique families", "unique children," and "child age..." aggregators would be more helpful with y axis labels

I'm seeing them on mine:

image

Are they just not rendering on yours?

Datamance commented 5 years ago

Inside global data filters (if it applies to the pivot table/child characteristics) or inside the pivot table, can we have some way to select "active users" - e.g. people who have logged in w/i past year? (We have last login timestamp on the account.) Would be helpful for evaluating overall characteristics of participant userbase - right now we're mixing in a lot of accounts moved over from the old platform where people have never logged in and likely never will.

I'm a little confused - if a participant has had a session in some time window, then they have also logged in, no? Put in terms of the contrapositive: If a user hasn't logged in, they haven't had a session, which means they wouldn't be in the dataset to begin with. Unless I'm missing something?

kimberscott commented 5 years ago

Ohhh that's fair, sorry. It's possible to have logged in but not participated, but probably uncommon and the distinction isn't too important - it would indeed work just as well to select for having participated in a study in the past year. So nevermind!
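Selecting "active" participants by recent participation, as agreed above, could look something like this minimal sketch. It assumes responses are available as (child ID, timestamp) pairs and uses a 365-day window; the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def recently_active_children(responses, now, window=timedelta(days=365)):
    """Return IDs of children with at least one response inside the window."""
    cutoff = now - window
    return {child for child, when in responses if when >= cutoff}


# Made-up (child ID, response time) pairs, for illustration only.
now = datetime(2019, 6, 1)
responses = [
    ("child-a", datetime(2019, 5, 1)),
    ("child-b", datetime(2017, 1, 1)),
    ("child-a", datetime(2016, 1, 1)),
    ("child-c", datetime(2018, 6, 1)),
]
active = recently_active_children(responses, now)
```

Accounts carried over from the old platform that never participated have no responses at all, so they drop out without needing the last-login timestamp.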

kimberscott commented 5 years ago

Sorry, I meant y axis tick labels - numbers for the horizontal lines. I do see the labels like "Unique families."

Datamance commented 5 years ago

It looks like this is a bug with google charts https://github.com/google/google-visualization-issues/issues/2693

I'm going to look for workarounds and let you know what I find.

Datamance commented 5 years ago

It does look like we'll have to revert to an old version (45) in order to have this work properly https://github.com/nicolaskruchten/pivottable/issues/1082

Let's hope it doesn't affect too much else!

Datamance commented 5 years ago

Closing this beast.