hubzero / hubzero-cms

Platform for Scientific Collaboration
https://hubzero.org
GNU General Public License v2.0
47 stars 57 forks source link

[NCN-613] Fix user ids associated with course asset views #1679

Closed jsperhac closed 11 months ago

jsperhac commented 11 months ago

Summary

This PR fixes a bug in com_courses that associated the course member id, rather than the user id, with student access to course assets.

Because we have course member data, user data, and the course asset views table, we can also fix the historical user ids in the table.

Motivation

Odd behavior was noted by @jsperhac before this bug was reported. Queries against the course asset views were not returning the expected counts of records.

Joe Cychosz reported the issue in detail, here: Nanohub ticket 442491; see also the linked Jira card, NCN-613.

Code description

The fix involves saving the jos_users.id rather than the jos_course_members.id into the table that logs student accesses to course assets (quizzes, lectures, etc.). That table and column is: jos_courses_asset_views.viewed_by.

The code change to save the correct id is found in components/com_courses/tables/asset.views.php::logView().

In addition to making the code change, we repair the existing course asset view data. The course member table contains both a primary key we were erroneously logging, and the user_id we should be logging. The included migration script addresses this database fix (see involved database test against the dev instance, shown below).

Because I'm paranoid, I'm also retaining the original member ids, and logging them in the code, in a new column called jos_courses_asset_views.viewed_by_member.

Testing

The migration script was run on the jsperhac AWS instance.

Database testing

The database manipulation portion of this fix was tested thoroughly on nanohub dev using a test table created as a copy of jos_courses_asset_views, and running the needed SQL by hand as follows:

  1. Copy jos_courses_asset_views to table test_asset_views. All further testing will use this test table.

create table test_asset_views as select * from jos_courses_asset_views

  1. Create viewed_by_member column in test_asset_views, copy the viewed_by column contents into it:

alter table test_asset_views add column `viewed_by_member` int(11) NOT NULL;

Copy in contents of viewed_by:

UPDATE test_asset_views,
       (SELECT id, viewed_by FROM test_asset_views)
       AS sub
    SET test_asset_views.viewed_by_member = sub.viewed_by
    WHERE 
    test_asset_views.id = sub.id;
  1. Verify contents of viewed_by and viewed_by members are equal for all rows. Spot checks, in addition to:
select sum(s.id - t.viewed_by_member) from test_asset_views t, jos_courses_members s where t.viewed_by_member=s.id;
+--------------------------------+
| sum(s.id - t.viewed_by_member) |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (0.13 sec)
  1. Update viewed_by to the correct user id for all rows. Use correlated subquery:
UPDATE test_asset_views t
        SET viewed_by = (select sub.user_id from jos_courses_members sub
            WHERE 
            t.viewed_by_member = sub.id);

Evaluate result:

select sum(t.viewed_by - s.user_id), sum(t.viewed_by_member - s.id) 
from test_asset_views t, jos_courses_members s where t.viewed_by_member=s.id;
+------------------------------+--------------------------------+
| sum(t.viewed_by - s.user_id) | sum(t.viewed_by_member - s.id) |
+------------------------------+--------------------------------+
|                            0 |                              0 |
+------------------------------+--------------------------------+

Deployment

This fix is critical for generation of datasets for the MEST project.

It should be applied as a hotfix to production (database first, code second).

Questions

So what was the damage, and does this solve all our problems? We have been storing the wrong user id in the course asset views table. When we try to assemble a list of all the users that have accessed course assets, and the count of times they have done so for some course, we have come up with surprisingly paltry lists, even for courses with robust enrollment. We have been assuming the viewed_by column referred back to jos_users.id. Surely our lists of user identities are completely incorrect for the output of these queries. I feel I need to think about whether there are some accesses we are missing (due to missing member records?) or other repercussions.

One conclusion I have from reviewing the PHP code for this logging is that we log accesses to course assets by registered members ONLY; there are no casual/unregistered visits logged there.

Revisions

To be determined.

jsperhac commented 10 months ago

Note: closed this PR because the proper join here is:

jos_course_asset_views.viewed_by = jos_course_members.id

Consider that we require a distinct, unique course member id for each course that a given user has registered for.