neurobagel / digest

Web dashboard for neuroimaging-derived and phenotypic datasets
https://digest.neurobagel.org
MIT License
2 stars 2 forks source link

[ENH] Enable better filtering for missing values #112

Closed michellewang closed 5 months ago

michellewang commented 7 months ago

Is there an existing issue for this?

New feature

The default Plotly Dash filtering options does not allow the user to select for missing values, which means they cannot check which subject-session record have missing values for a column of interest.

This is mainly relevant for the phenotypic bagel, since the schema does not specify missing values for columns unlike the imaging bagel.

@alyssadai proposed (on Slack) to fill missing values with some hardcoded value ("", "n/a"). This would probably work except there might be confusion if the raw data itself already has a dedicated NA value. It is also unclear to me what the behaviour would be in non-text columns: for example, adding empty strings to a numerical column might make it a text column instead of numerical (?)

Another potential solution would be to not allow any missing values in the input file (i.e., force the user to do their own filling in of missing values). So the users would need to choose how to denote their own text/numerical/datetime missing values

alyssadai commented 6 months ago

Blocked because of Alyssa's time spent on CLI/modeling issues for now

alyssadai commented 5 months ago

Hey @michellewang, thanks for your patience on this feature!

So, after much digging and trial & error I discovered that there is actually a special, non-documented (even chatgpt didn't have the right answer 🙃) syntax for filtering missing values, which is is blank or is nil.

Given this, I think the simplest solution for now would be to add a text element (maybe a collapsible element?) near the datatable with explicit instructions on filtering for missing values + a link to the rest of the filtering syntax. This also avoids having to modify the underlying data in any way.

Notably, it appears that there's currently no syntax for the opposite, filtering for non-missing values. Assuming this would also be helpful, I can try to implement as part of this issue custom filtering logic for the query is not blank to accomplish this - I suspect this might be most intuitive given the above.

Let me know what you think!

For internal documentation's sake, some issues/posts that led to the answer:

alyssadai commented 5 months ago

A quick update on this. I've realized that supporting custom filtering syntax for non-missing values (e.g., is not blank) would most likely require switching entirely from the native filtering to custom backend filtering for all the current operators, since having a mix of the built-in filtering behaviour and a custom column filtering behaviour is not natively supported and would be quite complex to coordinate.

Given the growing limitations of Dash DataTable for our purposes, I opened an issue for migrating to Dash AG Grid which provides much more advanced filtering functionality natively (incl. filtering for blank and non-blank values) as well as other advantages that likely make it a better long term solution for digest.

TLDR

michellewang commented 5 months ago

Thanks for looking into this! Great to hear about is blank -- it's surprising that this is not documented so maybe it wouldn't be good to rely on it in any case.

I briefly checked out Dash AG Grid and it looks like switching to it would be the way to go. Hopefully it's not too much work to do!