Gender Operator should support Unknown

outcomesinsights / conceptql

A high-level language that allows researchers to unambiguously define their research algorithms.

MIT License


Gender Operator should support Unknown #115

Closed aguynamedryan closed 6 years ago

aguynamedryan commented 7 years ago

We currently support male and female genders. We should support unknown as well, though what those concept_ids should be, I don't know.

Questions that need answering before I can implement this feature:

What are the concept_ids associated with "unknown" gender?
Should we include NULL gender_concept_ids?
Should we include the concept_id 0?

markdanese commented 7 years ago

Should these depend on the database? Do we need a sex vocabulary, with mappings to our standard concepts and concept ids, that we look up for each dataset? Or are you asking what our standard concepts are? Like 01 = male, 02 = female, 03 = other/unknown?

aguynamedryan commented 7 years ago

At the moment, the gender operator looks at the gender_concept_id for the concept_ids of male/female.

Even under the GDM we're considering using concept_ids to represent gender, yes?

If so, what concept_ids in OHDSI represent "unknown" genders and do we allow 0 or NULL to represent "unknown" as well?

markdanese commented 7 years ago

Are we assuming we are using OHDSI concept ids? If so, I guess we need to look that up in Atlas. Will check in a second.

NULL and 0 will represent unknown (it is "Other/Unknown" to capture all cases that are not male or female). My thought was that we would report out the cleaned values (male, female, other/unknown) as well as the raw values from the data (male, female, missing, martian, etc.).
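The cleaned-versus-raw reporting idea above could be sketched roughly as follows. This is only an illustration, not ConceptQL or Jigsaw code: the `CLEANED` mapping, the `summarize` helper, and the string raw values are assumptions made for the example.

```python
from collections import Counter

# Assumed raw-to-cleaned mapping; anything not listed here is
# reported as "other/unknown".
CLEANED = {"male": "male", "female": "female"}


def summarize(raw_values):
    """Return (cleaned_counts, raw_counts) for a list of raw sex values.

    None stands in for a missing (NULL) value and is shown as "missing"
    in the raw counts.
    """
    raw_counts = Counter("missing" if v is None else v for v in raw_values)
    cleaned_counts = Counter(CLEANED.get(v, "other/unknown") for v in raw_values)
    return cleaned_counts, raw_counts


cleaned, raw = summarize(["male", "female", None, "martian", "male"])
print(dict(cleaned))
print(dict(raw))
```

Here the missing value and "martian" both land in the cleaned "other/unknown" bucket, while the raw counts keep them separate.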

markdanese commented 7 years ago

OHDSI vocab seems to have 5 values that seem ok.

8532    F   FEMALE
8507    M   MALE
8570    A   AMBIGUOUS
8521    O   OTHER
8551    U   UNKNOWN

aguynamedryan commented 7 years ago

So other than Male/Female, the rest of those concept_ids we'd consider to be "unknown".

So gender will (at the moment) support:

Gender	Concept ID(s)
Male	8507
Female	8532
Other/Unknown	NULL, 0, 8570, 8521, 8551

Actually, now that I type that out, I think we might instead define it as:

Gender	Concept ID(s)
Male	8507
Female	8532
Other/Unknown	IS NULL OR NOT IN (8507, 8532)
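To make the difference between the two tables concrete, here is a minimal Python sketch of both definitions. It is not the operator's actual implementation; the function names are hypothetical and `None` stands in for SQL NULL.

```python
from typing import Optional

MALE = 8507
FEMALE = 8532
# First table: an explicit list of "unknown" values (None stands in for NULL).
EXPLICIT_UNKNOWN = {None, 0, 8570, 8521, 8551}


def classify_explicit(gender_concept_id: Optional[int]) -> Optional[str]:
    """First table: only the listed values count as Other/Unknown."""
    if gender_concept_id == MALE:
        return "Male"
    if gender_concept_id == FEMALE:
        return "Female"
    if gender_concept_id in EXPLICIT_UNKNOWN:
        return "Other/Unknown"
    return None  # an unlisted concept_id falls into no bucket at all


def classify_catch_all(gender_concept_id: Optional[int]) -> str:
    """Second table: IS NULL OR NOT IN (8507, 8532)."""
    if gender_concept_id == MALE:
        return "Male"
    if gender_concept_id == FEMALE:
        return "Female"
    return "Other/Unknown"
```

With the first definition, a person whose gender_concept_id is some unlisted value (say a hypothetical 12345) is selected by none of the three options. With the second, every person falls into exactly one of them.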

This issue is not concerned with what we'll report in the output (that can be a separate issue if you'd like) but instead with what options we present to a user in the diagram editor and how those options behave when cutting the data.

markdanese commented 7 years ago

Yes -- I like the second one. And, just to be really clear, we are not handling gender in Jigsaw right now. I am drawing a distinction between biologic sex and gender identity. In the new UI, we are specifically calling it sex when we refer to male and female.

jenniferduryea commented 7 years ago

@aguynamedryan is there still a question here?

aguynamedryan commented 7 years ago

@jenniferduryea, support for Gender under GDM will require a bit more work. I'll create the appropriate tickets that must be closed before this ticket can be finished.

justinlicitis commented 6 years ago

Tried a study with an index ICD9CM of 250.00 looking for only men, only women, women and unknown, and only unknown. Each export produced an identical cohort containing a mix of men and women (no gender_concept_ids other than 8507/8532 exist in synpuf). Tried removing the index algorithm and the resulting cohort produced more patients, but still a mix of both sexes.

jenniferduryea commented 6 years ago

The above comment by @justinlicitis is now addressed in https://github.com/outcomesinsights/t_shank/issues/57.

jenniferduryea commented 6 years ago

We need a GDM database to test this against.

jenniferduryea commented 6 years ago

Currently the operator in the JAM only gives "male" and "female" options in the dropdown.

We need an option for "unknown". Sending back to programming.

jenniferduryea commented 6 years ago

Confirmed that there is a dropdown for "Unknown" and that the SQL generated is looking for all concept_ids not equal to 8507 and 8532, as proposed. Ran the SQL in HUE against synpuf250 and came up with no one (which is correct). So I think this is good to go. Closing.
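A check along these lines can be reproduced locally with an in-memory SQLite table. This is a sketch under assumptions: the SQL that ConceptQL actually generates is not shown in this thread, so the query below is just the proposed predicate written by hand, and the `person` table layout, the helper name, and the sample rows are made up for illustration.

```python
import sqlite3

# Hand-written version of the proposed predicate, not ConceptQL's generated SQL.
UNKNOWN_SEX_SQL = """
    SELECT person_id
    FROM person
    WHERE gender_concept_id IS NULL
       OR gender_concept_id NOT IN (8507, 8532)
    ORDER BY person_id
"""


def unknown_sex_person_ids(rows):
    """Load (person_id, gender_concept_id) rows into a scratch table and run the filter."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER)"
        )
        conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
        return [row[0] for row in conn.execute(UNKNOWN_SEX_SQL)]
    finally:
        conn.close()


# Only male/female concept_ids, as reported for synpuf: nobody should match.
synpuf_like = [(1, 8507), (2, 8532), (3, 8507)]
# Hypothetical data that also contains NULL, 0, and 8551.
mixed = [(1, 8507), (2, None), (3, 0), (4, 8551), (5, 8532)]

print(unknown_sex_person_ids(synpuf_like))
print(unknown_sex_person_ids(mixed))
```

The explicit `IS NULL` branch matters: in SQL, `NULL NOT IN (8507, 8532)` evaluates to NULL rather than true, so a bare `NOT IN` would silently drop rows with a NULL gender_concept_id.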