tl-its-umich-edu / my-learning-analytics

My Learning Analytics (MyLA)
Apache License 2.0
36 stars 39 forks source link

improve cron eventstore queries #856

Open zqian opened 4 years ago

zqian commented 4 years ago

855 Expected behavior (A clear and concise description of what you expected to happen) :

Describe the bug (Tell us what happens instead of the expected behavior) :

The long query string for BigQuery could be improved

Matt suggested following in issue #848: "I think these queries can be simplified in other ways though by pulling the patterns out of them but that's a fix I haven't gotten to yet.

For example:

SELECT 'canvas' AS resource_type, CAST(SUBSTR(JSON_EXTRACT_SCALAR(event, '$.object.id'), 35) AS STRING) AS resource_id, CAST(SUBSTR(JSON_EXTRACT_SCALAR(event, '$.membership.member.id'), 29) AS INT64) AS user_id, CAST(SUBSTR(JSON_EXTRACT_SCALAR(event, '$.group.id'),31) AS INT64) AS course_id, COALESCE(JSON_EXTRACT_SCALAR(event, '$.object.extensions[\'com.instructure.canvas\'].asset_name'), JSON_EXTRACT_SCALAR(event, '$.object.extensions[\'com.instructure.canvas\'].filename'), JSON_EXTRACT_SCALAR(event, '$.object.name'), 'attachment') as name, datetime(EVENT_TIME) as access_time` Could probably be something like

resource_type: "canvas", resource_id: "SUBSTR(JSON_EXTRACT_SCALAR(event, '$.object.id'), 35)", user_id: "SUBSTR(JSON_EXTRACT_SCALAR(event, '$.membership.member.id'), 29)", course_id: "SUBSTR(JSON_EXTRACT_SCALAR(event, '$.group.id'),31)", This could probably be improved further since many of these are pulling substrings out of the json event so that could be special too.

We could probably improve on all other parts of this query too by finding and replacing the patterns, or using some "methods" to do that. I really don't like how this is either but this is the real query. ;)"

lsloan commented 4 years ago

A problem with this approach is that not all Caliper events contain the same data.

For example, MiVideo events sent by Kaltura do not include users' Canvas user ID number. Users' ID numbers can be found in other events using these query fragments:

But that's not possible with MiVideo events. The closest data those events have to identifying the user is the login name, which is part of the actor ID (currently, the MiVideo events are not slim), retrieved this with segment:

replace(JSON_EXTRACT_SCALAR(event, '$.actor.id'), 'https://aakaf.mivideo.it.umich.edu/caliper/info/user/', '') AS user_login_name

New code is added (in #826) to the cron process to convert user_login_name to user_id using data from MyLA's user table.