Update Nov 9

markoprodanovic commented 4 years ago

I’ve managed to create two data tables to help us answer questions about viewership in Panopto.

TABLE 1: UNIQUE VIEWERSHIP ACROSS CHUNKS

Example Output (COMM 290) => Unique Viewers: 272

Creating this table turned out to be more challenging than expected:

So that we don’t forget what we did, and to be transparent about how we calculate this, the steps are:

1. Make a call to the REST API for the session and pull out the duration.
2. Use this duration to calculate the 5% chunk size (duration / 20) and break up the timeline into chunks of that size.
3. Make a call to the SOAP API for the session viewing data.
4. Go through the data and filter down to a unique list of user ids.
5. For each unique user, go through all of their records and calculate coverage (i.e. a list of tuples with start and end times that show what parts of the timeline they’ve watched). This was the challenging part!
6. Once coverage is calculated, compare each user’s coverage to every chunk and increase that chunk’s unique_views tally if the user has watched 90% or more of it. Note this isn’t 90% of the chunk’s duration in viewing time; it’s unique coverage, i.e. 90% of that footage has been watched, so if I watch the same 5 seconds over and over again my coverage doesn’t increase.
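
The steps above could be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: it skips the REST and SOAP calls entirely and assumes the viewing data has already been flattened into hypothetical `(user_id, start_seconds, end_seconds)` rows. All function names and the sample data are made up for the example.

```python
from collections import defaultdict

def make_chunks(duration, n_chunks=20):
    """Split [0, duration] into n_chunks equal (start, end) chunks (5% each by default)."""
    size = duration / n_chunks
    return [(i * size, (i + 1) * size) for i in range(n_chunks)]

def merge_intervals(intervals):
    """Collapse overlapping (start, end) tuples so rewatched footage is counted once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def covered_seconds(coverage, chunk):
    """Seconds of the chunk that fall inside an already-merged coverage list."""
    c_start, c_end = chunk
    return sum(max(0.0, min(end, c_end) - max(start, c_start)) for start, end in coverage)

def unique_views_per_chunk(records, duration, threshold=0.9):
    """Count, per chunk, the users whose unique coverage reaches the threshold."""
    by_user = defaultdict(list)
    for user_id, start, end in records:
        by_user[user_id].append((start, end))
    chunks = make_chunks(duration)
    unique_views = [0] * len(chunks)
    for intervals in by_user.values():
        coverage = merge_intervals(intervals)
        for i, chunk in enumerate(chunks):
            if covered_seconds(coverage, chunk) >= threshold * (chunk[1] - chunk[0]):
                unique_views[i] += 1
    return unique_views

# Hypothetical data: a 200-second video, so each 5% chunk is 10 seconds long.
records = [
    ("u1", 0, 25),     # u1 watches the first 25 seconds
    ("u1", 0, 5),      # rewatching the same 5 seconds adds no coverage
    ("u2", 10, 19.5),  # u2 covers 9.5 of the 10 seconds in the second chunk
]
views = unique_views_per_chunk(records, duration=200)
```

With this sample data, `u1` completes the first two chunks but only half of the third, and `u2` completes only the second chunk.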

⚠️ Note that the date range that is shown in the data can be adjusted by narrowing the start and end time in the SOAP call. So you can answer questions like, for example, “how many students watched this video between the first and second midterm?”

I tried to play around with including dates somehow as values in the data but they didn’t really make sense to me. Even if a chunk was completed, a user could’ve watched it over multiple times/days. Users can also rewatch chunks, so which date counts?

when they first watched some of a given chunk?
when they first watched all of a given chunk?
the most recent time they watched a chunk?
what if a student watches 78% of a chunk one day, 5% another day and 7% another day? Which day did they watch it? 😰

To me it made the most sense to narrow dates by adjusting the call we make, so that the raw data we begin to work with is already filtered to those dates (although I realize this could make it difficult to adjust these via parameters in Tableau).

TABLE 2: CHUNK VIEWERSHIP PER USER

every row represents a unique user
there is a column for total_view_time - which is the total amount of time spent watching the video
there is a column for each chunk - the value of which represents the TOTAL AMOUNT OF TIME spent viewing that chunk (this is no longer unique viewership but TOTAL)
from this we can total/average chunks to get a sense of
- which chunks have been watched for the most amount of time
- what is the average amount of time spent per chunk
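
As a rough sketch of how a table like this could be built (an illustration under assumptions, not the actual script): viewing rows are again assumed to be hypothetical `(user_id, start_seconds, end_seconds)` tuples, and intervals are deliberately not merged, so rewatching adds time. Averaging over every user in the table, including those who never touched a chunk, is also an assumption.

```python
from collections import defaultdict

def chunk_view_times(records, duration, n_chunks=20):
    """Return {user_id: [seconds spent in each chunk]}; rewatches add up."""
    size = duration / n_chunks
    table = defaultdict(lambda: [0.0] * n_chunks)
    for user_id, start, end in records:
        for i in range(n_chunks):
            c_start, c_end = i * size, (i + 1) * size
            # Overlap between this viewing record and chunk i (0 if they don't touch).
            table[user_id][i] += max(0.0, min(end, c_end) - max(start, c_start))
    return dict(table)

# Hypothetical data: a 200-second video, so each chunk is 10 seconds long.
records = [("u1", 0, 25), ("u1", 0, 5), ("u2", 10, 20)]
table = chunk_view_times(records, duration=200)

# One row per user, plus a total_view_time column.
total_view_time = {user: sum(row) for user, row in table.items()}

# Column-wise totals and averages per chunk.
chunk_totals = [sum(col) for col in zip(*table.values())]
chunk_averages = [total / len(table) for total in chunk_totals]
```

Here `u1` accumulates 15 seconds in the first chunk (10 from the full pass plus 5 from the rewatch), which is the "total, not unique" behaviour described above.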

For example, in analyzing the data above (excel)...

(averages in yellow, totals in green)

As we can see, chunk 3 has notably higher total and average viewership than its neighbours. Also, looking at the first table shows that it has a lot of unique viewership as well

...and indeed going into the content of the video, this section is a walkthrough of a problem solution - so it makes sense that viewership would be more concentrated

Chunks 9-12 also look interesting: in the context of the video it's yet another walkthrough of a problem solution, followed by a steep drop-off in chunk 13. When the walkthrough is finished, all math disappears from the slides in favour of images.

Chunks 17-19 see a pretty major drop-off both in unique viewership as well as totals/averages - and in the context of the video, this is when the instructor ends their slides and the main lecture

In my limited experimenting with COMM290, I found that if a chunk has some combination of:

a higher number of unique viewers for that chunk
a higher total time a chunk has been watched
a higher average time a chunk has been watched

… it tends to be indicative of a more-engaged-with part of the video (usually the solution to an example problem). Pretty cool!

markoprodanovic commented 4 years ago

@alisonmyers

Putting this up here to show you how things are progressing, but also as a means of documenting.

Will come prepared to speak on all of this during tomorrow's meeting ☺️

markoprodanovic commented 4 years ago

Thinking back to some of Rajesh's questions about async sessions, I believe that they can be answered in full using the data in these two generated tables.

1. What percentage of the class watched the video? => We can tell you the unique number of users who've accessed the video. Compare this with your class size and you'll have a sense of what percentage of students watched it.

2. What percentage of the video has the class watched? => For every 5% chunk of the video, the data tells us how many unique users watched it (we currently define "watched" as having viewed >= 90% of it).

3. Which part of the video did students visit again and again? => For every 5% chunk we can see who watched it and how much time they spent there. If we average or sum across user activity in chunks we can get a clearer sense of which parts of the video users are watching most and spending the most time in.

alisonmyers commented 4 years ago

For Chunking by Dates, consider the following: Try to think about what the data would look like at the individual student level, and how we would want to "roll" that up. We want our data extraction to create entire sets for now, and leave the filtering to a user filter (in Tableau or other).

Consider a scenario like this

on day 1 (Nov 1) a student watches the first half of the video once, and rewatches one section, so for that session and user you could have

Edit - I just realized this was considering 10% chunks, but I think the example still stands

date, chunk_id, number_of_watches
2020-11-01, 1, 1
2020-11-01, 2, 2
2020-11-01, 3, 1
2020-11-01, 4, 1
2020-11-01, 5, 1

then on day 2 they watch the whole video

2020-11-02, 1, 1
2020-11-02, 2, 1
2020-11-02, 3, 1
2020-11-02, 4, 1
2020-11-02, 5, 1
2020-11-02, 6, 1
2020-11-02, 7, 1
2020-11-02, 8, 1
2020-11-02, 9, 1

So if this was the only student for this video, you could aggregate by chunk and count the total views and the unique users per chunk

chunk, users, n_watches
1, 1, 2
2, 1, 3
3, 1, 2
4, 1, 2
5, 1, 2
6, 1, 1
7, 1, 1
8, 1, 1
9, 1, 1
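
A minimal sketch of this roll-up, using the two days of rows from the example. The user id `"s1"`, the function name and the optional date arguments are assumptions added for illustration. The date arguments stand in for the kind of filter a Tableau user might apply after extraction.

```python
from collections import defaultdict

# (user_id, date, chunk_id, number_of_watches); the user id "s1" is hypothetical.
rows = [("s1", "2020-11-01", c, 2 if c == 2 else 1) for c in range(1, 6)]
rows += [("s1", "2020-11-02", c, 1) for c in range(1, 10)]

def aggregate_by_chunk(rows, start=None, end=None):
    """Roll daily rows up to {chunk_id: (unique_users, n_watches)}, optionally within a date range."""
    users = defaultdict(set)
    watches = defaultdict(int)
    for user_id, date, chunk_id, n in rows:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if (start and date < start) or (end and date > end):
            continue
        if n > 0:
            users[chunk_id].add(user_id)
        watches[chunk_id] += n
    return {c: (len(users[c]), watches[c]) for c in sorted(watches)}

summary = aggregate_by_chunk(rows)                           # whole data set
day_two_only = aggregate_by_chunk(rows, start="2020-11-02")  # filtered after extraction
```

Because the extracted set keeps one row per user, date and chunk, the same data can be rolled up over any date range without going back to the API.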

markoprodanovic commented 4 years ago

Hmm, here's a scenario I'd still be worried about.

Let's say a student bounces around the timeline within the first 3 chunks on Nov. 1:

They watch this much of the chunks:

0 - 80%
1 - 40%
2 - 60%

...they've completed no chunks, therefore this is their data for that day

date, chunk_index, number_of_watches
2020-11-01, 0, 0
2020-11-01, 1, 0
2020-11-01, 2, 0

The student then accesses the video again on Nov. 3 and goes back to fill the gaps in their viewing -- they watch the remaining parts of each chunk:

0 - 20%
1 - 60%
2 - 40%

Like before, on this day, they completed no chunks therefore their data looks like this:

date, chunk_index, number_of_watches
2020-11-03, 0, 0
2020-11-03, 1, 0
2020-11-03, 2, 0

Now we have a situation where a student has finished all 3 chunks but we have no record of completion, because each row is a day's worth of data.
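
To make the worry concrete, here is a small sketch for chunk 0 only, assuming a hypothetical 10-second chunk and made-up viewing intervals that match the 80% / 20% split. It applies the same >= 90% rule once per day and once across both days.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching (start, end) tuples into unique coverage."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def covered_fraction(intervals, chunk):
    """Fraction of the chunk covered by the union of the given intervals."""
    c_start, c_end = chunk
    covered = sum(
        max(0.0, min(end, c_end) - max(start, c_start))
        for start, end in merge_intervals(intervals)
    )
    return covered / (c_end - c_start)

chunk0 = (0, 10)  # hypothetical 10-second chunk
views_by_day = {
    "2020-11-01": [(0, 8)],   # 80% of chunk 0
    "2020-11-03": [(8, 10)],  # the remaining 20%
}
threshold = 0.9

# Evaluated day by day, the chunk is never "completed"...
completed_per_day = {
    day: covered_fraction(intervals, chunk0) >= threshold
    for day, intervals in views_by_day.items()
}

# ...but over both days together it is fully covered.
all_views = [iv for intervals in views_by_day.values() for iv in intervals]
completed_overall = covered_fraction(all_views, chunk0) >= threshold
```

One possible way around this (an assumption, not something decided in this thread) would be to keep the raw intervals per day and only apply the completion threshold after merging across whatever date range is selected.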

alisonmyers commented 4 years ago

I think by chunking, we can lose some of the noise around how much of a chunk was watched; otherwise we are back to caring about minute-by-minute activity, which would require a different kind of dataset. So we can decide what "watching a chunk" means.

E.g. maybe "watching a chunk" means they watched at least 10% of that chunk in one go.

(We can do some exploratory analysis to see what makes sense).
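
One way this looser definition might look in code, purely as a sketch for the exploratory analysis: each viewing record is treated as one "go", and it counts as a chunk-watch if that single record overlaps at least 10% of the chunk. The function name, the interval format and the sample data are all assumptions.

```python
def chunk_watches(records, chunk, min_fraction=0.1):
    """Count viewing records that cover at least min_fraction of the chunk in one stretch."""
    c_start, c_end = chunk
    needed = min_fraction * (c_end - c_start)
    return sum(
        1
        for start, end in records
        if min(end, c_end) - max(start, c_start) >= needed
    )

chunk = (0, 10)  # hypothetical 10-second chunk, so 10% is 1 second
records = [
    (0, 8),      # 8 seconds in one go: counts
    (8, 10),     # 2 seconds in one go: counts
    (3.0, 3.5),  # half a second: too short to count
]
n_watches = chunk_watches(records, chunk)
```

Unlike the coverage-based rule, this does not merge intervals, so a student who returns on a different day is still counted on that day.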

markoprodanovic commented 4 years ago

I'll think on this a bit more! Good thing to talk through during our meeting.

It's an interesting problem because there's some subjectivity needed, i.e. "how do we define completion of a chunk?"

And this decision has a huge impact on how the data looks.

For fun, here's what the unique view count looks like at chunk completion >= 10% for the same video as above.

Notice how much chunk 1 changes (difference of 115 viewers).
Notice how little chunk 3 changes in comparison (difference of 16 viewers).

With these more liberal criteria, the table shows me that lots of people actually did watch chunk 1. The earlier screenshot has stricter completion criteria, but maybe is more useful in the sense that it shows us that more people "meaningfully" engaged with the material at chunk 3.
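
A comparison like this can be repeated for any threshold once each user's unique coverage per chunk is known. The sketch below uses made-up coverage fractions (not the COMM 290 numbers) just to show how the count for one chunk can swing while another barely moves.

```python
# Hypothetical fraction of each chunk covered by each user (unique coverage, 0.0 to 1.0).
coverage = {
    "u1": {1: 0.15, 3: 0.95},
    "u2": {1: 0.50, 3: 0.92},
    "u3": {1: 0.95, 3: 0.05},
}

def unique_viewers(coverage, chunk_id, threshold):
    """Number of users whose coverage of chunk_id is at least the threshold."""
    return sum(
        1 for chunks in coverage.values() if chunks.get(chunk_id, 0.0) >= threshold
    )

# Unique viewers for chunks 1 and 3 under the liberal (10%) and strict (90%) criteria.
sweep = {
    chunk_id: {t: unique_viewers(coverage, chunk_id, t) for t in (0.1, 0.9)}
    for chunk_id in (1, 3)
}
```

In this toy data, chunk 1 drops from three viewers to one when the threshold is raised, while chunk 3 stays at two.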

alisonmyers commented 4 years ago

Definitely. I think if we start looking at individual patterns of activity it will tell us something more meaningful about how to define chunks. I.e., if we find students jump around a lot and watch in short bursts, we might want to be more forgiving about "watching a chunk". If we find that students watch straight through, then we don't need to worry as much.

markoprodanovic commented 3 years ago

archived

saud-learning-services / panopto-video-analytics

Update Nov 9 #4