zqian closed this 1 month ago
If/when we go to BigQuery, we could consider switching to accessing the CD2 (Canvas Data 2) tables directly. This could more directly fix issues like #1559, where the context_store doesn't keep pseudonyms for all users. It also raises the question of whether we should use the context_store or just go to the Canvas tables directly. Are there advantages for MyLA in using the context_store?
This would put us back closer to the UDW and, I think, eliminate some other bugs we had to try to work around with these UDP queries.
I think if we don't do a full switch for some tables, like the users table, we should at least use the data from the Canvas tables where possible, replacing things like entity.person_email.
@zqian I'm not sure if you started on this, but I started working on it today.
I'm wondering if there's value in leaving the Postgres code in there anymore, since this is all using UDP anyway. I think we could write code so both work, at least for now, and remove that later. But we'd be leaving in dead code that probably nobody would use anyway. I feel like we're in the "Unizin 100% required" phase for this project now.
I think this will still need SQLAlchemy for writing to the MySQL database; removing that, if we wanted to, would be a different task.
@jonespm I took my name off the assignee list. Please go ahead and update the title and description of this issue, since the current design is to use Canvas Data 2 hosted in BigQuery.
Describe your problem or feature you'd like added
MyLA's cron.py queries the UDP context_store for Canvas data and UDP BigQuery for Caliper events.
Describe the solution you'd like
The UDP context_store tables can now be queried from UDP BigQuery. We can update the MyLA cron.py file and remove the SQLAlchemy and psycopg libraries.
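
As a rough illustration of what the cron.py change could look like, here is a minimal sketch of running a context_store query through a BigQuery-style client instead of a Postgres connection. The table name `context_store_entity.course_offering`, the column `course_offering_id`, and the helper names `build_course_query` and `fetch_rows` are all assumptions made for this example, not taken from the MyLA code base. The client is passed in, so any object with a `query(...).result()` interface (such as `google.cloud.bigquery.Client`) should work.

```python
from typing import Any, Iterable, List, Optional, Tuple

# Hypothetical table name, used only for illustration.
COURSE_TABLE = "context_store_entity.course_offering"


def build_course_query(course_ids: Iterable[int]) -> Tuple[str, List[int]]:
    """Return a parameterized SQL string plus the de-duplicated, sorted IDs.

    The IDs are referenced through the named parameter @course_ids rather
    than being interpolated into the SQL text.
    """
    ids = sorted({int(c) for c in course_ids})
    if not ids:
        raise ValueError("at least one course id is required")
    sql = (
        f"SELECT * FROM `{COURSE_TABLE}` "
        # course_offering_id is an assumed column name.
        "WHERE course_offering_id IN UNNEST(@course_ids)"
    )
    return sql, ids


def fetch_rows(client: Any, sql: str, job_config: Optional[Any] = None) -> List[Any]:
    """Run the query on a BigQuery-style client and return all rows as a list."""
    return list(client.query(sql, job_config=job_config).result())
```

With the real `google-cloud-bigquery` library, the IDs would presumably be bound with something like `bigquery.QueryJobConfig(query_parameters=[bigquery.ArrayQueryParameter("course_ids", "INT64", ids)])` and passed as `job_config`. Writing the results to MySQL is a separate step and is not covered here.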