openedx / aspects-dbt

The dbt project for Open edX Aspects!
Apache License 2.0
2 stars 5 forks source link

enrollments_by_day does not include the current day #22

Closed bmtcril closed 1 year ago

bmtcril commented 1 year ago

I would expect there to be rows returned for the following query, but instead it is 0.

select count(*)
from reporting.enrollments_by_day
where enrollment_status_date = today()
and course_key = 'course-v1:UMONTREAL+GPT101+2023_04'
and enrollment_status = 'registered'
;

I'm able to get the expected rows by changing int_enrollment_windows window_end_date to:

dateTrunc('day', coalesce(window_end_at, today() + toIntervalDay(1))) AS window_end_date, but I'm not sure what the impacts of that change are. I think this is just a "midnight time boundary" issue?

bmtcril commented 1 year ago

@SoryRawyer heads up on this one, can you take a look?

SoryRawyer commented 1 year ago

I think this might be an artifact of dbt not cleaning up when models are removed. The enrollments_by_day query was moved into a Superset virtual dataset since it uses window functions and therefore would not utilize indexes when queried if created as a view by dbt. I think the issue of excluding the current day in the window query was addressed in this PR: https://github.com/openedx/tutor-contrib-aspects/pull/260

Are you seeing that the current day is still excluded in the instructor dashboard?

bmtcril commented 1 year ago

Hmm no, I was trying to answer a question for @Ian2012 about where to query enrollment information (total learners ever enrolled, including inactive and total active enrollments) but must have had old views. So for this information he should look to the virtual dataset for prior art?

SoryRawyer commented 1 year ago

I think I'd err on the side of using reporting.fact_enrollments when possible, but I think the enrollments_by_day virtual dataset is going to be the best way to answer questions like "how many learners were enrolled on a specific day?". Unfortunately, querying virtual datasets isn't the best experience, since you need to use the dataset macro and you need to pass the integer ID of the dataset, so it'd look something like this:

select count(*)
from {{ dataset(40) }}
where enrollment_status_date = '2023-05-01'

Now that we're using dictionaries to join the entity names, I think the last ClickHouse issue that's keeping us from moving this into dbt is the fact that views containing window queries don't use any indexes. I haven't looked at the upcoming ClickHouse improvements, so I'm not sure if that's on the horizon or if we need to think of more permanent guidance for where to query.

SoryRawyer commented 1 year ago

Would it be worth creating something like a "cookbook" for common query patterns?

Ian2012 commented 1 year ago

We can also create a custom macro that returns the ID corresponding for a given UUID. Not sure if that's suitable, but worth the try

bmtcril commented 1 year ago

I think it's probably best to hold off a little longer until we can make a call on RBAC. If we can get 90% of this in dbt and just need to add in filters it will make things a lot easier.

bmtcril commented 1 year ago

This has been fixed for a while 👍