openwebwork / webwork2

Course management front end for WeBWorK
http://webwork.maa.org/wiki/Main_Page

student login activity graph #2365

Closed: Alex-Jordan closed this 3 months ago.

Alex-Jordan commented 3 months ago

This implements a chart of logins when viewing a user's statistics page, like so:

[Screenshot 2024-03-19 at 1:51:37 PM: login chart on a user's statistics page]

This is meant just to get our feet wet with data visualization, and if it works out, I hope it will grow to show things like:

with appropriate enhancements, such as whether the attempt was successful, which problem it was for, etc. But that's for the future.

This uses R for the data crunching, so $pg{specialPGEnvironmentVars}{Rserve} must be set in localOverrides.conf. (Should that actually be a site.conf setting?) If you are not using R, that is no problem: when $pg{specialPGEnvironmentVars}{Rserve} is unset, the chart is skipped. Otherwise, the page will take a little longer to load while the data is crunched, and you will see a picture like the screenshot.

For this to work you also need some R packages that you might not already have. You can run R as root and then run install.packages(c('ggplot2', 'svglite', 'fasttime')) to get them. (To exit R, run q().)

Before going any further with learning enough R to do those other things, I wanted to check in about the infrastructure. In particular, this reads the log files rather than accessing login records from the database (since we don't keep those). Does that raise any red flags? Later, for answer submissions, I would probably use the course answer log too, unless there's a better way to access past answers from the database.
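To make the log-file approach concrete, here is a minimal Python sketch of pulling login records out of a course log. The line format it matches (a bracketed timestamp followed by LOGIN OK user_id=... login_type=...) is an assumption for illustration, not a confirmed description of WeBWorK's login log, and the names LOGIN_RE, Login, and parse_logins are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# ASSUMPTION: a successful login is logged on one line shaped like
#   [Tue Mar 19 13:51:37 2024] LOGIN OK user_id=jdoe login_type=normal ...
# Adjust LOGIN_RE if the course's log file uses a different layout.
LOGIN_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+LOGIN OK\s+user_id=(?P<user>\S+)"
    r"(?:.*?\blogin_type=(?P<type>\S+))?"
)


class Login(NamedTuple):
    user: str
    when: datetime
    login_type: str


def parse_logins(lines: Iterable[str], user: Optional[str] = None) -> List[Login]:
    """Return successful logins in file order, optionally for a single user."""
    logins = []
    for line in lines:
        m = LOGIN_RE.match(line)
        if not m:
            continue  # skip failed logins, logouts, and unrelated lines
        if user is not None and m["user"] != user:
            continue
        # Timestamp assumed to be in Perl's scalar(localtime) style.
        when = datetime.strptime(m["stamp"].strip(), "%a %b %d %H:%M:%S %Y")
        logins.append(Login(m["user"], when, m["type"] or "unknown"))
    return logins
```

Because this scans the whole file on every call, its cost grows with the size of the log, which is the concern raised later in the thread.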

drgrice1 commented 3 months ago

This does not look very promising to me.

First, the R packages don't seem to install very easily. The "svglite" package failed to install the way you described; it failed while trying to install its dependencies. However, Ubuntu has packages for ggplot2 and svglite: r-cran-ggplot2 and r-cran-svglite. Those will probably only work if you have installed R and Rserve via the Ubuntu packages r-base-core and r-cran-rserve. After installing those, I was able to install the fasttime package from the R command line (there isn't an Ubuntu package for it). It turns out that all of this is half a gigabyte of software. That is a huge dependency just to parse a log file and generate an image.

Next, after getting all of that installed, I went to the Stats page and sat waiting for a while. Finally the page came up with the image. I tried it again and timed it: it took 24 seconds for the page to load. This was all on my local machine, which is a relatively fast computer. That is certainly not going to work.

I think that parsing the log files is fine, but I don't think that R is the best way to achieve this, especially since it comes with a 500 MB storage price tag and significant delays in page loading.

drgrice1 commented 3 months ago

I should point out that this was for my local test course, whose log file is 5 MB, so quite large.

I tested with another local course that has a 3 KB log file. The delay was not nearly as significant, but loading the activity log image still doubles the page load time compared to not loading it.

drgrice1 commented 3 months ago

I did manage to get the "svglite" package to install from the R command line interface. I had to install the "libfontconfig1-dev" Ubuntu package. Installing the packages you listed one at a time let me see the messages for each package individually, and the svglite messages actually told me to install that Ubuntu package.

I think that for typical course log file sizes, the load times will probably be okay. But for a large course, issues could arise later in the term as the log file grows.

somiaj commented 3 months ago

A thought on load time: could all the data and images be pre-parsed and generated via the job queue, so they don't have to be generated each time the page is loaded? The instructor could configure how often the data is updated and have a way to force an update.
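The pre-parsing idea can be sketched as a cache that a periodic background job refreshes, with an instructor-configurable refresh interval and a force option. The sketch below, in Python, also reads only the bytes appended since the last refresh, so the cost does not grow with the whole log as the term goes on. The LogCache class, its fields, and the one-hour default are hypothetical; nothing here is WeBWorK's job-queue API.

```python
import os
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogCache:
    """Hypothetical cache of raw log lines, refreshed incrementally."""
    path: str
    max_age: float = 3600.0    # seconds between automatic refreshes (instructor-configurable)
    offset: int = 0            # byte offset already consumed
    lines: List[str] = field(default_factory=list)
    refreshed_at: float = 0.0  # 0 means "never refreshed", so the first call always reads

    def refresh(self, force: bool = False) -> bool:
        """Read only bytes appended since the last refresh; return True if a read happened."""
        now = time.time()
        if not force and now - self.refreshed_at < self.max_age:
            return False  # cached data is still fresh enough
        if os.path.getsize(self.path) < self.offset:
            # Log was rotated or truncated: start over from the beginning.
            self.offset, self.lines = 0, []
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
        # Leave a trailing partial line (no newline yet) for the next refresh.
        complete, sep, _partial = chunk.rpartition(b"\n")
        if sep:
            text = complete.decode("utf-8", "replace")
            self.lines.extend(line for line in text.split("\n") if line)
            self.offset += len(complete) + 1
        self.refreshed_at = now
        return True
```

A scheduled job would call refresh() and the stats page would only read the cached lines; a "force update" button would call refresh(force=True).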

Alex-Jordan commented 3 months ago

I'm sorry that way of installing packages didn't work. Where I was doing this, I actually installed them one at a time as I needed them, like install.packages('ggplot2'), and I trusted some documentation that said the combined command would work.

For me it takes about half a second to load the page. The server has a 2.8 GHz CPU and 7.8 GB of RAM. It seems odd that R would take as long as 24 seconds for something so basic, when people use R for data visualization in much more complex applications. @somiaj, did you try it out, and does it also take that long?

For one thing, fasttime is not necessary. A post somewhere claimed that it speeds up datetime processing by 100x, but I didn't test that.

Now that I'm aware of significant speed issues, some ways to combat them occur to me, and I'll make some changes. If things are still slow after those changes, maybe the page should load without the graph, with a button to generate the graph if you want it. I'm hoping that this is just the start and that more useful data visualization can come out of it. Also, I did not look hard for a pure Perl data visualization package, and maybe I should try that instead of R.

drgrice1 commented 3 months ago

I am not that concerned about the load time. I archived an active course from my production server and restored that archive on my local test server, and although it does seem to slow the page load down more than it should, the page loads relatively quickly (just a second or two).

I am not particularly impressed with the resulting image, though. Dots on a line don't seem particularly informative. For something like this I would prefer a table listing the actual login times. Then I can look at the table and see that the user logged in yesterday at 3:45 pm. Perhaps the granularity of the timeline axis could be improved some, but that is always the limitation of this kind of data visualization. I think this particular application may not be the best use for it.

Alex-Jordan commented 3 months ago

Dots on a line don't seem particularly informative.

I agree, but that's not where this ends. I just wanted to get the basic functionality of R making a graph in place before proceeding. As things progress, these will become useful 2D graphs. This PR has made me aware of the potential for the slowness you are seeing, issues installing R on other OSes, etc.

I like the suggestion to have a table. I'm going to rework this so that, as much as is reasonable, data about logins and answer submissions first appears as a table. Then the user can click something to see a visualization if they want one (which will take additional time).
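The table-first idea needs very little machinery. Here is a minimal Python sketch that renders login times as a plain-text table, newest first, so an instructor can read off "yesterday at 3:45 pm" directly. The function name login_table and the (timestamp, login type) input shape are hypothetical, and a real page would emit an HTML table rather than text.

```python
from datetime import datetime
from typing import Iterable, Tuple


def login_table(logins: Iterable[Tuple[datetime, str]]) -> str:
    """Render (timestamp, login_type) pairs as a plain-text table, newest first."""
    rows = sorted(logins, key=lambda r: r[0], reverse=True)
    lines = [f"{'Date':<12}{'Time':<10}Login type"]
    for when, kind in rows:
        # 12-hour clock with AM/PM, matching how people usually read login times.
        lines.append(f"{when.strftime('%Y-%m-%d'):<12}{when.strftime('%I:%M %p'):<10}{kind}")
    return "\n".join(lines)


print(login_table([(datetime(2024, 3, 18, 15, 45), "normal"),
                   (datetime(2024, 3, 19, 9, 5), "LTI")]))
```

The table is cheap to produce on every page load; the slower graphical rendering can then sit behind the opt-in click.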

drgrice1 commented 3 months ago

That makes sense. Can't wait to see what you develop then!

drgrice1 commented 3 months ago

I should also point out that on Ubuntu it is actually rather easy to set up R with Rserve and the packages this pull request needs. Everything except fasttime is in an Ubuntu package. If you run sudo apt install r-base-core r-cran-rserve r-cran-ggplot2 r-cran-svglite, then R is set up. If you want fasttime, that isn't hard either. You can do it your way or run sudo Rscript -e "install.packages('fasttime')" (which is basically your way, but avoids needing to enter and exit the R command line interface and can be used in a script). Installing R on Ubuntu used to be more difficult in general.

pstaabp commented 3 months ago

Is there a reason you're using R for this? It seems like using some of the tools we already support is enough.

If it's the graphics, I've been working with jsxgraph and hopefully can get it integrated with @somiaj's PGplot package. Or I could understand using R for some of the data analysis work, but looking at the code, I don't see much of that there.

Alex-Jordan commented 3 months ago

If something can replace R, that's great. It is clear that something could replace R for the current graph. But what I am envisioning will go beyond that timeline graph. Here is a crude layout I have considered. The vertical bars are logins.

Student So-And-So

        Sep 1        Sep 5        Sep 9        Sep 13        ....
--------------------------------------------------------
HW1   |                |     |
#1    |  000           |1    |
#2    |     01         |     |
---------------------------------------------------------
HW2   |                |     |
#1    |                | 1   |
#2    |                |     | 000

Not with |, 0, and 1, but rather with appropriate graphical objects, color- and shape-coded. (With a parallel table to sidestep accessibility issues for instructors.) There may be additional data for each plotted item, such as whether a login was through LTI or with password authentication. Maybe the main goal is to give typical instructors (who are unaware of the log files in the first place) a clear read on what happened when, filtering out lines from the log files that are not relevant. A table will help with that too.
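As a rough illustration of the data structure behind the layout above, here is a plain-text Python sketch that builds the grid from login dates and answer attempts, one character per day. The function name activity_grid and the (set, problem, date, correct) record shape are hypothetical. Unlike the layout, it collapses several attempts on the same day into a single mark (1 if any of them was correct); a graphical version would map the same grid cells to shapes and colors.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Tuple


def activity_grid(start: date, days: int, logins: Iterable[date],
                  attempts: Iterable[Tuple[str, int, date, bool]]) -> str:
    """Text grid: '|' marks a login day, '1'/'0' mark a day's attempts on a problem."""
    login_cols = {(d - start).days for d in logins if 0 <= (d - start).days < days}

    def blank_row():
        return ["|" if c in login_cols else " " for c in range(days)]

    # set_id -> problem number -> list of per-day characters
    by_set = defaultdict(lambda: defaultdict(blank_row))
    for set_id, prob, day, correct in attempts:
        col = (day - start).days
        if 0 <= col < days:
            row = by_set[set_id][prob]
            if row[col] != "1":  # once any attempt that day is correct, keep the 1
                row[col] = "1" if correct else "0"

    out = []
    for set_id in sorted(by_set):
        out.append(f"{set_id:<6}" + "".join(blank_row()))
        for prob in sorted(by_set[set_id]):
            out.append(f"#{prob:<5}" + "".join(by_set[set_id][prob]))
    return "\n".join(out)
```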

From the chart, at a glance, you would get a sense of this student's engagement with the course and their patterns for doing homework. If this is done right, you would also immediately see things like a student logging in from a second device or browser while taking a test.

Then there would be a similar presentation for each assignment, possibly with a row for each student (revealing students who always seem to be working together). The goal would be to show the instructor patterns within each set that reveal where students are struggling. The current bar graph does that too, but this would reveal some time-related patterns that the bar graph doesn't.

Once the visualizations are this complex, R already has tools for making them if you feed it a dataframe, and it would be flexible if we wanted to do things differently from the rough draft above. I looked a little for alternatives and found this, but it's all still in early stages. Maybe there is something else out there... If the chart structure were settled, this could just use primitive tools to make the charts. But I'm leery of investing in that if the chart structure might need to change.

some of the tools we already support

Do we not support R though? In theory PG problems can use it if the server is configured for that.

Alex-Jordan commented 3 months ago

I'm going to close this one. If this ends up using a lightweight tool to make any visualizations, there's nothing here that is helpful. See #2369 for the status of this project.