ubc / vizit2

Course Visualization for Instructional Teams (VizIT): An R package and Shiny app for visualizing data from student engagement in edX courses.
https://ubc.github.io/vizit2/
MIT License

Filtering activity level #1

Closed AndrewLim1990 closed 7 years ago

AndrewLim1990 commented 7 years ago

The numbers when filtering activity levels do not add up to the total when "All" is selected

davidklaing commented 7 years ago

I noticed this too! I think it's because there are a bunch of NAs. We should either add NA to the menu or filter those students out from the beginning.

AndrewLim1990 commented 7 years ago

Oh I see! I think that may be part of the issue. However I noticed the following:

(Number of students less than 30 mins) + (Number of students between 30 mins and 5 hrs) + (Number of students greater than 5 hrs) > All students

I may be misunderstanding, but I think the NA problem would show up the other way around:

(Number of students less than 30 mins) + (Number of students between 30 mins and 5 hrs) + (Number of students greater than 5 hrs) < All students

Regardless, I'll take a closer look at what is going on for videos this weekend! Unfortunately, I think each dashboard calculates its activity levels with its own separate function. We may want to combine them eventually.
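To make the NA reasoning concrete, here is a minimal sketch on made-up data (the `students` table and its `user_id` column are hypothetical, not taken from the package). It relies on the fact that dplyr's `filter()` drops rows where the condition evaluates to `NA`, so students with a missing activity level are counted under "All" but under none of the three levels.

```r
library(dplyr)

# Hypothetical data: six students, two with a missing activity level.
students <- tibble(
  user_id = 1:6,
  activity_level = c("under_30_min", "30_min_to_5_hr", "over_5_hr",
                     "over_5_hr", NA, NA)
)

levels_in_menu <- c("under_30_min", "30_min_to_5_hr", "over_5_hr")

# Count the rows that survive each per-level filter.
# filter() drops rows where the comparison evaluates to NA.
per_level <- sapply(levels_in_menu, function(lvl) {
  students %>% filter(activity_level == lvl) %>% nrow()
})

total_all <- nrow(students)   # "All" applies no filter
sum_levels <- sum(per_level)  # students that fall into some level

print(per_level)
cat("sum of levels:", sum_levels, " all:", total_all, "\n")
```

With NAs present, the per-level counts can only sum to less than the "All" total, never more, which is why NAs alone cannot explain a sum that exceeds it.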

davidklaing commented 7 years ago

Oh! Ha. Yes, my NA explanation does not explain what you're observing. Just checked the code and found the problem in overview_engagement_server.R. In filter_demographics, the following code should have == rather than !=:

if (activity_level == "under_30_min") {
  filtered_df <- filtered_df %>% filter(activity_level == "under_30_min")
}
if (activity_level == "30_min_to_5_hr") {
  filtered_df <- filtered_df %>% filter(activity_level != "30_min_to_5_hr")  ### <- here
}
if (activity_level == "over_5_hr") {
  filtered_df <- filtered_df %>% filter(activity_level != "over_5_hr")  ### <- here
}
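
A small sketch of why the `!=` comparisons inflate the counts, using made-up data (the `students` table and the `buggy_count` helper are hypothetical and only mirror the snippet above). Selecting a level with `!=` returns everyone outside that level, so students get counted more than once across the three menu options.

```r
library(dplyr)

# Hypothetical data with no missing values: 3 + 2 + 1 = 6 students.
students <- tibble(
  activity_level = c(rep("under_30_min", 3),
                     rep("30_min_to_5_hr", 2),
                     "over_5_hr")
)

# Mirrors the snippet above: == for the first level, != for the other two.
buggy_count <- function(df, level) {
  if (level == "under_30_min") {
    df <- df %>% filter(activity_level == "under_30_min")
  }
  if (level == "30_min_to_5_hr") {
    df <- df %>% filter(activity_level != "30_min_to_5_hr")
  }
  if (level == "over_5_hr") {
    df <- df %>% filter(activity_level != "over_5_hr")
  }
  nrow(df)
}

buggy_counts <- sapply(c("under_30_min", "30_min_to_5_hr", "over_5_hr"),
                       buggy_count, df = students)

print(buggy_counts)
cat("sum of levels:", sum(buggy_counts), " all:", nrow(students), "\n")
```

On this toy data the three options sum to more than the number of students, matching the direction of the discrepancy reported at the top of the thread.
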
davidklaing commented 7 years ago

I'm not sure which views are using which filter functions, but I wrote a separate one for the forum view since it has categories instead of modules. Do you mind investigating?

AndrewLim1990 commented 7 years ago

I will do it now!

AndrewLim1990 commented 7 years ago

Hmmm, this doesn't seem to be a problem for me:

filter_demographics <- function(input_df, gender = "all", activity_level = "all", mode = "all") {
  filtered_df <- input_df
  if (gender == "female") {
    filtered_df <- filtered_df %>% filter(gender == "f")
  }
  if (gender == "male") {
    filtered_df <- filtered_df %>% filter(gender == "m")
  }
  if (gender == "other") {
    filtered_df <- filtered_df %>% filter(gender == "o")
  }
  if (activity_level == "under_30_min") {
    filtered_df <- filtered_df %>% filter(activity_level == "under_30_min")
  }
  if (activity_level == "30_min_to_5_hr") {
    filtered_df <- filtered_df %>% filter(activity_level == "30_min_to_5_hr")
  }
  if (activity_level == "over_5_hr") {
    filtered_df <- filtered_df %>% filter(activity_level == "over_5_hr")
  }
  if (mode == "audit") {
    filtered_df <- filtered_df %>% filter(mode == "audit")
  }
  if (mode == "verified") {
    filtered_df <- filtered_df %>% filter(mode == "verified")
  }
  return(filtered_df)
}

Will take a closer look this weekend

idoroll commented 7 years ago

Two related issues:

ido


AndrewLim1990 commented 7 years ago

Found the problem. There are two functions named filter_demographics. Both are meant to do the same thing. I'm going to delete the one in @subizhangg's overview_engagement_server.R, if that is okay.

Let me know what you think @subizhangg
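For readers wondering how two same-named functions can coexist unnoticed: in R, assigning a function to a name that already exists in the same environment silently replaces the earlier definition. The sketch below assumes both definitions end up in the same environment (whether and in what order the app's files are loaded that way is an assumption here), and uses simplified stand-ins rather than the real functions.

```r
# First definition, standing in for the copy with the != bug.
filter_demographics <- function(df, activity_level = "all") {
  if (activity_level == "over_5_hr") {
    df <- df[df$activity_level != "over_5_hr", , drop = FALSE]
  }
  df
}

# Second definition with the same name, standing in for the correct copy.
# No warning is raised; the first definition is simply replaced.
filter_demographics <- function(df, activity_level = "all") {
  if (activity_level == "over_5_hr") {
    df <- df[df$activity_level == "over_5_hr", , drop = FALSE]
  }
  df
}

# Hypothetical data: one student under 30 minutes, two over 5 hours.
students <- data.frame(
  activity_level = c("under_30_min", "over_5_hr", "over_5_hr")
)

# Only the definition assigned last is ever called.
n_over <- nrow(filter_demographics(students, "over_5_hr"))
print(n_over)
```

Swapping the order of the two definitions would make the buggy one win instead, so which version a dashboard uses depends on load order. Keeping a single definition removes that ambiguity.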

AndrewLim1990 commented 7 years ago

Regarding Ido's comment, you are right. The numbers at the top are the number of learners associated with that specific course element. For example, the video dashboard does not count learners who have never touched a video and only did problems. The same applies in reverse for the problems dashboard.
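As an illustration of why the totals differ between dashboards, here is a minimal sketch on a made-up event log (the `events` table and its `user_id` and `element_type` columns are hypothetical, not the package's actual schema). It counts distinct learners per element type and compares that with the number of distinct learners overall.

```r
library(dplyr)

# Hypothetical event log: one row per learner interaction.
events <- tibble(
  user_id = c(1, 1, 2, 3, 3, 4),
  element_type = c("video", "problem", "problem", "video", "video", "problem")
)

# Learners who touched each kind of element at least once.
learners_per_element <- events %>%
  group_by(element_type) %>%
  summarise(learners = n_distinct(user_id))

# Learners who touched anything at all.
all_learners <- n_distinct(events$user_id)

print(learners_per_element)
cat("all learners:", all_learners, "\n")
```

In this toy log, learners 2 and 4 never watched a video, so a video-only count is smaller than the course-wide count.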

To me, this is preferable to trying to make all the numbers match. For example, if an instructor has a course where most of the students don't engage with the videos and only do problems, it may be confusing for them to see a large number in the filtering panel yet low numbers in the plots.

However, I can also see why the opposite is sometimes preferable.

If you feel strongly about it, perhaps we can make it a new request in a separate issue.

idoroll commented 7 years ago

I'll reply in a new issue.

AndrewLim1990 commented 7 years ago

Merged the PR fixing the filtering bug.