outcomesinsights / conceptql

A high-level language that allows researchers to unambiguously define their research algorithms.
MIT License

Add parameters to Nth Occurrence to find rolling 4th event within a time period #165

Closed: jenniferduryea closed this issue 6 years ago

jenniferduryea commented 6 years ago

Taking the use case from #160: instead of adding that feature to the AFTER operator, it seems a better fit to add this functionality to the Nth Occurrence operator. Please see #160 for the actual use case and what the correct behavior is. In summary, the use case is to find the 4th event within a time frame and have that time frame recur, or "roll" along, similar to recurrent event studies.

For documentation purposes, I'm going to explain how this operator works (in summary).

The Nth Occurrence operator finds the nth occurrence and returns one qualifying event per parameter. The occurrence count starts with the reference (first) event as "1".

  1. The default settings will produce one event that is the nth occurrence in that stream. The data is in ascending order by start_date. If two events fall on the same day and the nth occurrence is one of them, Jigsaw will output only one of the events.
  2. With the default settings and "group by date" checked, events are counted by day: multiple events on one day count as only one occurrence, and only one event from that day is output.
  3. The default settings with a WITHIN parameter set will look at all upstream events and count the number of occurrences within the time window created by each reference event; every upstream event serves as a reference event in turn. For example, if WITHIN = 60d and OCCURRENCE = 3, the operator takes the first upstream event, creates a time window that ends 60d after that event's end date, and counts the events that fall in that window, with the first event counted as "1". Once the first event is evaluated, the operator moves to the second upstream event and repeats the evaluation. The operator outputs the nth occurrence for every time window created by every upstream event, so the user may get multiple nth events, one for each window that reaches n occurrences.
  4. The default settings with an AT_LEAST parameter set will look at the first upstream event and output one nth event, counting only events dated more than the number of days given in the AT_LEAST parameter after that first event. This is always evaluated from the first event. For example, with AT_LEAST = 30d and OCCURRENCE = 3, the operator outputs the third event, counting the first upstream event as 1 and the first event dated more than 30 days after event "1" as 2.
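The four behaviors above can be sketched in Python for a single patient's event stream. This is a minimal illustration of the rules as described in this ticket, not ConceptQL's implementation (which generates SQL). The function names `nth_default`, `nth_within`, and `nth_at_least`, the `(start_date, end_date)` event tuples, and the inclusive `>=` boundary for AT_LEAST are all assumptions.

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple

# Hypothetical event shape: (start_date, end_date) for one patient's upstream events.
Event = Tuple[date, date]


def _pick(seq: Sequence[Event], n: int) -> Optional[Event]:
    """Return the nth (1-based) item, or None if there are fewer than n."""
    if n < 1:
        raise ValueError("occurrence must be >= 1")
    return seq[n - 1] if len(seq) >= n else None


def nth_default(events: List[Event], n: int, group_by_date: bool = False) -> Optional[Event]:
    """Cases 1 and 2: the nth event in ascending start_date order."""
    ordered = sorted(events, key=lambda e: e[0])
    if group_by_date:
        # Keep only the first event on each start_date so a day counts once.
        seen = set()
        deduped = []
        for e in ordered:
            if e[0] not in seen:
                seen.add(e[0])
                deduped.append(e)
        ordered = deduped
    return _pick(ordered, n)


def nth_within(events: List[Event], n: int, within_days: int) -> List[Event]:
    """Case 3: every upstream event opens its own window; emit the nth event in each."""
    ordered = sorted(events, key=lambda e: e[0])
    results = []
    for i, ref in enumerate(ordered):
        # Window is measured from the reference event's end date.
        window_end = ref[1] + timedelta(days=within_days)
        # The reference event itself is occurrence 1; later events in the window follow.
        in_window = [e for e in ordered[i:] if e[0] <= window_end]
        picked = _pick(in_window, n)
        if picked is not None:
            results.append(picked)
    return results


def nth_at_least(events: List[Event], n: int, at_least_days: int) -> Optional[Event]:
    """Case 4: count from the first event only, skipping events closer than at_least_days."""
    ordered = sorted(events, key=lambda e: e[0])
    if not ordered:
        return None
    first = ordered[0]
    cutoff = first[0] + timedelta(days=at_least_days)
    # Assumed inclusive boundary; the ticket text says "over", so this is a guess.
    candidates = [first] + [e for e in ordered[1:] if e[0] >= cutoff]
    return _pick(candidates, n)


# Hypothetical sample stream: day offsets from an arbitrary base date, two events on day 0.
BASE = date(2020, 1, 1)


def ev(offset: int) -> Event:
    day = BASE + timedelta(days=offset)
    return (day, day)


events = [ev(0), ev(0), ev(10), ev(45), ev(50), ev(200)]
```

With this sample, `nth_default(events, 3)` returns the day-10 event; `group_by_date=True` moves it to day 45 because the two day-0 events count once; `nth_within(events, 3, 60)` returns the events on days 10, 45, and 50 (one per window that reaches a third event); and `nth_at_least(events, 2, 30)` returns the day-45 event.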
jenniferduryea commented 6 years ago

@jeremyevans has made some updates to the operator, adding the fields shown in the screenshot:

[Screenshot: new Nth Occurrence operator fields]

His commits, with notes, are in this pull request: https://github.com/outcomesinsights/conceptql/pull/163

For documentation:

jenniferduryea commented 6 years ago

Ran a test on broom.jsaw.io and found the export failed with the following error:

ERROR -- : Impala::Protocol::Beeswax::BeeswaxException: AnalysisException: Column/field reference is ambiguous: 'person_id'

Sending back to @aguynamedryan for help.

jenniferduryea commented 6 years ago

Per @aguynamedryan's request, here is the JSON for the above test case:

[Screenshot of the test case JSON]

aguynamedryan commented 6 years ago

@jenniferduryea, could you please cut and paste the text itself so we don't have to re-type the algorithm?

jenniferduryea commented 6 years ago

@aguynamedryan here you go:
[["occurrence",4,["icd9","355.8","401.0","401.9","250.00","780.93","459.81"],{"within":"60d"}]]
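The pasted statement is a nested JSON array: the operator name, the occurrence number, the upstream `icd9` statement, and an options hash. As a sketch, assuming the statement is assembled in Python before being pasted into Jigsaw (the `occurrence_statement` helper is hypothetical, not part of ConceptQL), the following reproduces it exactly, including the outer wrapping array as it appears above:

```python
import json


def occurrence_statement(n, codes, **options):
    """Hypothetical helper: wrap an icd9 code list in an occurrence operator."""
    upstream = ["icd9", *codes]
    stmt = ["occurrence", n, upstream]
    if options:
        # Options hash goes last, e.g. {"within": "60d"}.
        stmt.append(options)
    # Outer array reproduced as pasted in the thread.
    return [stmt]


codes = ["355.8", "401.0", "401.9", "250.00", "780.93", "459.81"]
stmt = occurrence_statement(4, codes, within="60d")

# Compact separators match the pasted text byte for byte.
print(json.dumps(stmt, separators=(",", ":")))
```

The compact `separators` argument matters only for an exact textual match; JSON consumers treat the default spaced output identically.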

jenniferduryea commented 6 years ago

My test patient (person_id = 2) is not showing up in the test I posted above when run against synpuf_250. Looking at the output, it seems that not all results are included. Tagging @jeremyevans for help.

jenniferduryea commented 6 years ago

My original test (find the fourth dx within 60d of each other, which should return patient 2 with index date 4/15/09) passed.

However, I'm now testing the at_least parameter (finding the 2nd dx at_least 30d after the first should return patient 2 with the 3/30/09 event) and I'm getting 0 results. I ran the SQL through HUE (directly against synpuf_250) and also got 0 results. This seems like a bug. Sending back to @jeremyevans for help.

For reference, here is the conceptql code I'm using: [["occurrence",2,["icd9","355.8","401.0","401.9","250.00","780.93","459.81"],{"within":"","at_least":"30d"}]]

The test should include patient 2 with an event on 3/30/09, since that event is more than 30d after the first event (10/4/08).
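A quick check of the date arithmetic behind this expectation, using Python's standard `datetime` module. It only confirms the gap between the two dates; whether 3/30/09 is the first event past the 30d mark depends on patient 2's other events, which are not shown in this thread.

```python
from datetime import date

first_event = date(2008, 10, 4)
expected_event = date(2009, 3, 30)

# Subtracting two dates yields a timedelta; .days is the whole-day gap.
gap_days = (expected_event - first_event).days
print(gap_days)       # 177
print(gap_days > 30)  # True
```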

jenniferduryea commented 6 years ago

I have updated the original ticket description to document how we think the nth occurrence operator is working. @jeremyevans please review the original ticket description about AT_LEAST to make sure we are all on the same page. Thank you.

jeremyevans commented 6 years ago

I think the issue is that empty within or at_least options are not handled the way you expect. It looks like you want an empty value to be ignored. I think the pull request just referenced will fix the issue.
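The expected behavior, treating an empty option as if it were not set, can be sketched as a pre-processing step. This is an illustration under assumptions, not the fix in the referenced pull request: `drop_empty_options` and `days_from` are hypothetical helpers, and `days_from` handles only the `<number>d` form used in this thread.

```python
from typing import Dict, Optional


def drop_empty_options(options: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Hypothetical: treat "" or None as 'option not set' and drop it."""
    return {key: value for key, value in options.items() if value not in ("", None)}


def days_from(value: str) -> int:
    """Hypothetical: parse '30d' -> 30; other duration units are not handled."""
    if not value.endswith("d") or not value[:-1].isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(value[:-1])


# Options as pasted in the at_least test earlier in this thread.
options = {"within": "", "at_least": "30d"}
cleaned = drop_empty_options(options)
windows = {key: days_from(value) for key, value in cleaned.items()}
print(cleaned)  # {'at_least': '30d'}
print(windows)  # {'at_least': 30}
```

Without the first step, the empty `within` value would reach the parser; in this sketch, `days_from("")` raises `ValueError`.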

jenniferduryea commented 6 years ago

Tested using all parameters (at_least, within, occurrences, group_by_day) and they all work as expected. Thank you so much @jeremyevans !

jenniferduryea commented 6 years ago

The following tests were created to test the functionality of this operator: nth_occurrence_test, nth_occurrence_test_as_2out, nth_occurrence_test_as_2out_2d_atleast_30d_within, nth_occurrence_test_as_2out_8th_within_365d_group_by_day, nth_occurrence_test_as_2out_190d