Closed znatty22 closed 1 year ago
From last week's discussion, change ready_for_release to staged_for_release
Maybe something like embargo_hold or pending_release: visibility is set to false due to 6 month embargo before data is released on portal
Can we add enums for these 3 reasons:
thoughts?
How about nautilus_issue / missing_from_nautilus / or ? This is relevant for the weekly CBTN refresh. Samples that are no longer available in Nautilus will change from visible to hidden.
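To make the refresh behavior concrete, here is a minimal sketch, not the actual refresh code. The `Sample` record shape and the `hide_missing_samples` function are hypothetical, and `missing_from_nautilus` is only one of the names floated above, since the enum value has not been decided.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Sample:
    # Hypothetical record shape; the real table has many more columns.
    sample_id: str
    visible: bool = True
    visibility_reason: Optional[str] = None


def hide_missing_samples(samples: Iterable[Sample], in_nautilus: Set[str]) -> List[str]:
    """Hide visible samples that are absent from Nautilus and return their IDs."""
    hidden = []
    for sample in samples:
        if sample.visible and sample.sample_id not in in_nautilus:
            sample.visible = False
            # Placeholder value: one of the names proposed in this thread.
            sample.visibility_reason = "missing_from_nautilus"
            hidden.append(sample.sample_id)
    return hidden


samples = [Sample("S1"), Sample("S2"), Sample("S3", visible=False)]
changed = hide_missing_samples(samples, in_nautilus={"S1"})
```

Rows that were already hidden (like `S3` here) keep whatever reason they had, so the refresh would not overwrite an earlier, more specific reason.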
This is what we have needed and thanks for working on this, @znatty22! I have a few thoughts:
meta.security where the following Value Set is bound: https://www.hl7.org/fhir/valueset-security-labels.html. We are migrating our data over to FHIR, and if we can map the above values to the existing ones in the Value Set resource, that seems ideal. If they are not completely mapped, we can extend this Value Set since the binding strength is Extensible (see https://www.hl7.org/fhir/resource.html#Meta).
Maybe something like embargo_hold or pending_release: visibility is set to false due to 6 month embargo before data is released on portal
I like the pending_release value. It is more general: it says that for this "thing" to be set visible, we are waiting for regulatory approval (e.g. dbGaP approval).
Example 2: visibility-reason: in-embargo, visibility-comment: expected release date 12/13/23
Example 3: visibility-reason: file-quality-issue, visibility-comment: RNA below 50M read cutoff
What do you think about adding another field that allows for a free text explanation? I can see that being helpful as I type out the above.
I think this is a great idea @baileyckelly.
Keep in mind the following. If you think you will want to run queries to search for entities that have been hidden for specific reasons like ngs-check-mate, then having a specific enum will make that a lot easier. If we have a more general enum value for these types of issues, like peddy-issue, then you'd have to search for peddy-issue entities and then somehow, maybe manually, browse through the comments to figure out which are related to the ngs-check-mate issue. Although I suppose you can always use a standard comment text for ngs-check-mate too, so maybe it won't be an issue.
Yea I think there's value in having a general enum for the reason and then a more specific free text column for an explanation that maybe we build common constants for, like @znatty22 suggested. Think this would help make querying easier and also be helpful for any reason that is marked as "Other".
In terms of querying, for example, this would be useful in cases where we would want to find all peddy issues -- regardless of the specific reason, or all data that is pending a release and then use the comment section to further filter if needed.
I would imagine something like:
general enum: peddy_issue
specific comment: gender mismatch or relationship mismatch
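As a rough illustration of the two-step querying described here, the sketch below uses an in-memory SQLite table. The table name `biospecimen`, the `kf_id` column, and the sample rows are assumptions made up for the example; only the `visibility_reason` / `visibility_comment` idea and the enum and comment values come from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biospecimen ("
    " kf_id TEXT PRIMARY KEY,"
    " visible INTEGER NOT NULL,"
    " visibility_reason TEXT,"
    " visibility_comment TEXT)"
)
rows = [
    ("BS_1", 0, "peddy_issue", "gender mismatch"),
    ("BS_2", 0, "peddy_issue", "relationship mismatch"),
    ("BS_3", 0, "pre_release", "embargo"),
    ("BS_4", 1, None, None),
]
conn.executemany("INSERT INTO biospecimen VALUES (?, ?, ?, ?)", rows)

# Broad query: every row hidden for any peddy issue, whatever the detail.
all_peddy = [
    row[0]
    for row in conn.execute(
        "SELECT kf_id FROM biospecimen WHERE visibility_reason = ? ORDER BY kf_id",
        ("peddy_issue",),
    )
]

# Narrow query: reuse a shared constant for the comment text so that
# filtering on the free-text column stays reliable.
GENDER_MISMATCH = "gender mismatch"
gender_only = [
    row[0]
    for row in conn.execute(
        "SELECT kf_id FROM biospecimen"
        " WHERE visibility_reason = ? AND visibility_comment = ?"
        " ORDER BY kf_id",
        ("peddy_issue", GENDER_MISMATCH),
    )
]
```

The narrow query only works if comments are written from common constants; a hand-typed variant such as "sex mismatch" would be missed, which is the trade-off raised earlier in the thread.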
Also to the point of releases -- @baileyckelly how do you feel about the following?
general enum: pre_release
specific comment: embargo, pending new versions, etc.
Motivation
The visibility column on each table is a boolean field which indicates whether that row should be included or not in the study release process. The problem is we often don't know why something was set to visible or hidden because we don't track this information anywhere.
Approach
Hey everyone, I've collected all of the responses on the dataservice visibility PR and summarized the resulting changes that people seem to want. @allison, if you or anyone else has any other changes/suggestions, please let me know. I'll wait to hear back before I implement anything.
Add new visibility_reason column
Restrict values to enumeration:
Ready for Release - visibility is set to true because this row should be included in the next release.
Pre Release - visibility is set to false because this row is waiting for approval to be released. The visibility_comment will have the specific reason.
Sample Issue - visibility is set to false due to an issue with the sample. The visibility_comment will have the specific reason.
Consent Hold - visibility is set to false due to some kind of consent issue. The visibility_comment will have the specific reason.
Sequencing Quality Issue - visibility is set to false due to some kind of issue with the quality of a sequencing file. The visibility_comment will have the specific reason.
Other - visibility is set to false due to a reason we do not yet have an enum for. The visibility_comment will have the specific reason.
Unknown - visibility is set to its current value for an unknown reason.
Add new visibility_comment column
Examples:
Peddy issue due to relationship mismatch
Sample issue due to NGSCheckmate mismatch
Data is in embargo period
Sequencing data file has low coverage
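The proposal above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the dataservice implementation: the class names `VisibilityReason` and `VisibilityInfo` are hypothetical, and the consistency check between `visible` and the reason is an extra rule inferred from the enum descriptions, not something the summary requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VisibilityReason(Enum):
    READY_FOR_RELEASE = "Ready for Release"
    PRE_RELEASE = "Pre Release"
    SAMPLE_ISSUE = "Sample Issue"
    CONSENT_HOLD = "Consent Hold"
    SEQUENCING_QUALITY_ISSUE = "Sequencing Quality Issue"
    OTHER = "Other"
    UNKNOWN = "Unknown"


@dataclass
class VisibilityInfo:
    visible: bool
    visibility_reason: VisibilityReason = VisibilityReason.UNKNOWN
    visibility_comment: Optional[str] = None

    def __post_init__(self):
        # Accept raw strings too; anything outside the enum raises ValueError.
        self.visibility_reason = VisibilityReason(self.visibility_reason)
        # Unknown keeps whatever value visibility currently has.
        if self.visibility_reason is VisibilityReason.UNKNOWN:
            return
        # Assumed rule: only Ready for Release pairs with visible=True.
        expected = self.visibility_reason is VisibilityReason.READY_FOR_RELEASE
        if self.visible != expected:
            raise ValueError(
                f"visible={self.visible} conflicts with reason "
                f"'{self.visibility_reason.value}'"
            )


hidden = VisibilityInfo(
    visible=False,
    visibility_reason="Sample Issue",
    visibility_comment="Sample issue due to NGSCheckmate mismatch",
)
```

Existing rows with no recorded reason would fall under Unknown, so they pass regardless of their current visibility value.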