tl-its-umich-edu / my-learning-analytics

My Learning Analytics (MyLA)
Apache License 2.0

Change Postgres cron queries to BigQuery queries #1557

Closed zqian closed 1 month ago

zqian commented 6 months ago

Describe your problem or feature you'd like added

MyLA cron.py queries UDP context_store for Canvas data and UDP BigQuery for Caliper events.

Describe the solution you'd like

The UDP context_store tables can now be queried from UDP BigQuery. We can update the MyLA cron.py file to query them there, and remove the SQLAlchemy and psycopg libraries.
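As a rough illustration of what a BigQuery-only read path might look like, here is a minimal sketch. It assumes the `google-cloud-bigquery` client, whose `Client.query()` returns a job with a `.result()` iterator. The helper name `fetch_rows`, the project/dataset/table path, and the column names are hypothetical and are not taken from MyLA's cron.py or the actual UDP schema.

```python
from typing import Any, Dict, List


def fetch_rows(client: Any, sql: str) -> List[Dict[str, Any]]:
    """Run `sql` through a BigQuery-style client and return rows as dicts.

    `client` is expected to behave like google.cloud.bigquery.Client:
    client.query(sql) returns a job, and job.result() yields rows
    that expose .items().
    """
    job = client.query(sql)
    return [dict(row.items()) for row in job.result()]


# Hypothetical table path and columns, for illustration only;
# the real UDP dataset layout may differ.
COURSE_SQL = """
SELECT course_id, title
FROM `my-project.context_store.course_offering`
LIMIT 10
"""

# Usage, assuming google-cloud-bigquery is installed and credentials are set:
#   from google.cloud import bigquery
#   rows = fetch_rows(bigquery.Client(), COURSE_SQL)
```

Taking the client as a parameter keeps the helper testable without network access, and means the read path no longer needs psycopg or a Postgres connection string.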

jonespm commented 5 months ago

If/when we go to BigQuery, we could consider switching to access the CD2 (Canvas) tables directly. This could more directly fix issues like #1559, where the context_store doesn't keep pseudonyms for all users. It also raises the question of whether we should use the context_store at all or just go to the Canvas tables directly. Are there advantages for MyLA in using the context_store?

This would put us back closer to the UDW, and I think it would eliminate some other bugs we had to try to work around with these UDP queries.

I think if we don't do a full switch for some tables, like the users table, we should use the data from the Canvas tables where possible, at least replacing things like entity.person_email.

jonespm commented 3 months ago

@zqian I'm not sure if you started on this, but I started working on it today.

I'm wondering if there's value in leaving the Postgres code in there anymore, since this is all using UDP anyway. I think we could write code so both work, at least for now, and remove that later. But we'd be leaving in dead code, and probably nobody would be using it anyway. I feel like we're in the "Unizin 100% required" phase for this project now.

I think this will still need sqlalchemy for writing to the MySQL database; removing that, if we wanted to, would be a different task.

zqian commented 3 months ago

@jonespm I took my name off the assignee list. Please go ahead and update the title and description of this issue, since the current design is to use Canvas Data 2 hosted in BigQuery.