Provide mechanisms for custom filtering of data layers

abought commented 4 years ago

Requested by @massung , and targeted for v0.13 (two releases hence) based on current work queue. (EDIT: moved to 0.12)

There are several use cases for which people would like to filter the data elements that are displayed on the plot.

We would like:

New UI to expose our existing filter syntax via toolbar buttons
Implementation to accept that UI across several data layers. ("arcs with score above significance threshold", "significant assoc variants", "specific intervals)
A low-level method BaseDataLayer.setFilter(afunction) , to manually declare an arbitrary filter function that does not fit within the configuration mechanism.

There should be an order of precedence for each of these options. @Parul-Kudtarkar and @psnehal have also suggested such a feature recently and may have interesting suggestions.

kennethbruskiewicz commented 4 years ago

Importantly the filter should take effect on UI refresh, or otherwise setting the filter refreshes the UI using filtered data.

massung commented 4 years ago

I also wanted to note that I think it's 100% reasonable for LZ to assume the filter [function] is dealing with 100% immutable data. For example, I think this would be an unreasonable filter function (using Vue as an example, but would be true of any reactive framework):

let pThreshold = this.$store.state.pValueThreshold;

lz.setFilter(item => item.p_value < pThreashold.get);

I think the above could be supported if a watcher (on $store.state.pValueThreshold) called lz.refresh() or similar, which forced a redraw. But, there would be no way for LZ to just know the value changed on its own. I would expect that when the page state changes, we could (and maybe should) be expected to provide LZ with a new filter function.

abought commented 4 years ago

Work is underway in an open PR that describes the syntax: https://github.com/statgen/locuszoom/pull/198

One quirk is that some behavior may not be well defined for certain data layer types; if there is not a clear use case for eg genes track, then I'm not sure it's worth the investment of time and energy internally to de-tangle the edge cases. It makes most sense, and helps deliver faster, if we can narrow down specific visualizations/use cases for this feature.

Based on the provided example, I assume that @massung is looking to filter, eg arc, annotation, and scatter plot? Two syntaxes (declarative and imperative) are provided in the proposed PR; tomorrow I'll work on UI.

Agreed with Jeff: given how we implement separation of concerns, the plot should not assume that data is changed outside of getData. The UI widget will work via layout mutation (modify data_layer.layout.filters and re-draw).

massung commented 4 years ago

Just because this came up in a meeting just now (at Broad), it would be great if:

we can provide more properties than what's used by a source; so that...
we can filter on the whole object and not just lz-defined properties

As an example, with an associations plot, we have a case where we'd like to filter and show only coding variants. We have that information from the query, but it is lost to LZ completely when we transform it for the layout. If that field was kept, and we could filter on it, that'd be fantastic!

abought commented 4 years ago

There are two basic ways that you can control what data is sent to a panel, but it's absolutely possible:

The intuitive one: Modify the default fields array in the data_layer layout object. The fields array is about the closest thing we have to a "contract" about what data the layer expects to see. The desire to add "extra" fields is a surprisingly common use case actually, to the point that it would be hard to create a more rigid formal contract between layer and source. (adding more fields is the norm, not the exception)
The conversion between API payload keys and the namespaced, filtered data object happens in BaseAdapter.extractFields (old LZ 11 called this class Data.Source). You could certainly implement your own method that passes more fields than just what the layer requests, but extractFields is a weird function and a bit fiddly. In LZ 12, sources will be formal ES6 classes, so super calls will be possible (an easier and more obvious way to extend existing functionality).

Once the field is present, filtering by coding variants would certainly be possible. The new text widget is good for numbers and strings; for a boolean, I think the "display options" dropdown would be a good UI option. (and it's getting some improvements for filtering control in the next release too)

abought commented 4 years ago

@massung , how immutable are your intended filters in practice? (eg, what are some use cases for how filtering behavior will change as a user interacts within one sesson?)

I'm considering a change to the "imperative" interface where filter functions would be selected (by name) from the registry- similar to scaling functions.

Intended benefit: This would create a place for users to add custom programmatic elements that controlled display, without relying on manipulating internals. The imperative interface is powerful but also has a very different API "feel" than other features, making it more confusing to learn.

Expected drawback: This would be more restrictive than pure programmatic control, but with more emphasis on reusable building blocks. Placing custom code in the registry works best for well-defined tasks like scaling functions: "choose the color of a scatter plot dot based on predefined LD cutoff values in the range 0..1". It would work less well for tasks like "user types into search box, and a table and plot must both update at once, based on a value that is not known in advance and is not known within LocusZoom".

Parul-Kudtarkar commented 4 years ago

@abought thanks for implementing the filtering widget! It would be nice if the filtering widget is implemented in layout instead of HTML. Implementation in layout should compress the code just in-case there are multiple datasets?

massung commented 4 years ago

@abought I'd very much prefer just a simple filter function instead of a built-in interface. I can easily envision LZ having some simple filter interface elements (like the dashboard) that are just high-level and use the filter functions under the hood. It would then be trivial for us to not use those interface elements and just supply our own functions at-will.

abought commented 4 years ago

@abought I'd very much prefer just a simple filter function instead of a built-in interface. I can easily envision LZ having some simple filter interface elements (like the dashboard) that are just high-level and use the filter functions under the hood. It would then be trivial for us to not use those interface elements and just supply our own functions at-will.

Jeff,

I understand the desire to support many approaches. Ultimately, it's important for us to phrase this in terms of sample actions/user needs: we will build a tool that enables functionality. So far, I think the discussion in this thread has been weighted towards "this should be a function with x properties" rather than "when a user does _, the resulting UI should __". Let's aim to discuss further on next week's call, perhaps?

A more consistent and predictable API surface would be nice; the downside to modifying through code is having to deal with mutating nested objects. (I've done it; it's not as fun as it sounds) We'll explore several options, and the final choice will be driven by what the user needs to do, rather than committing up front to a specific architecture or arrangement of code.

That said, our staging server does have an example of the new UI in action to foster discussion. (type a logpvalue to only render points above a cutoff; useful for dense datasets with too many variants to render) https://staging.locuszoom.org/gwas/295543/region/?chrom=3&start=45634967&end=46034967

abought commented 4 years ago

@abought thanks for implementing the filtering widget! It would be nice if the filtering widget is implemented in layout instead of HTML. Implementation in layout should compress the code just in-case there are multiple datasets?

Hi, Parul. I'm not quite sure what you mean by "compress the code"- could you clarify?

LocusZoom 0.12 will include a declarative layout-driven filtering widget, for people using our pre-defined filters. You can try it out on the associations panel for this sample dataset: https://staging.locuszoom.org/gwas/295543/region/?chrom=3&start=45634967&end=46034967

The sample layout code to create the widget is here: https://github.com/statgen/localzoom/blob/develop/src/util/lz-helpers.js#L135-L144

And here are the builtin filters so far. I'm considering adding two more, for some features we are currently exploring: https://github.com/statgen/locuszoom/blob/develop/esm/components/data_layer/base.js#L653-L662

in (like gene_biotype_field_value in [predefined_choices]`)
text match (like user_entered_phenotype_string in pheno_field_value)

Parul-Kudtarkar commented 4 years ago

@abought, Thanks I will try the filters! I would imagine every time I add panel (where I have multiple studies) I would have to add filtering widget (unless it's predefined for that data layer in panel layout). Let me implement and get back to you.

I like the latest LZ panel for association studies - the entire genome level view (zoomed out) Manhattan plot; once zoomed in to the region of interest it helps with region specific investigative analysis. It is a useful feature!!

abought commented 4 years ago

@abought, Thanks I will try the filters! I would imagine every time I add panel (where I have multiple studies) I would have to add filtering widget (unless it's predefined for that data layer in panel layout). Let me implement and get back to you.

If you have a standard layout you like, my recommendation would be to add it to the registry (Layouts.add) as a new generic option for your site. Rather than modifying a default layout every time, you'd create one modified version and use it whenever you create each new panel in that style.

We're very much designed around the concept of user plugins, so that's a valid and reasonable thing to do. We even do this internally when creating "standard with slight tweaks" layouts. See, eg: https://github.com/statgen/locuszoom/blob/70d0d00f0aa8a0af4d3f68a1fcb477da1d779f4e/esm/layouts/index.js#L258-L266

Happy to help further with that by email, but a longer discussion of panel layouts might be off topic in the filters thread.

Parul-Kudtarkar commented 4 years ago

Thanks! I completely forgot that I could use generic option -

toolbar: ( function () { const base = deepCopy(standard_panel_toolbar); base.widgets.push({ type: 'filter_field', position: 'right', layer_name: 'coaccessibility', field: '{{namespace[access]}}score', field_display_html: 'Score', operator: '>=', data_type: 'number', }); return base; })(),

The panel is pretty much standard for co-accessibility datasets! Thanks @abought I'll get in touch via. email

abought commented 4 years ago

Notes from today's call on concrete requirements:

Many widgets on page need to stay in sync
Filter options are not set during initial render
Filter options can change after layout based on user interaction

Overall these items should be covered by layer.setFilter (accepts a function handle with same syntax of array.filter), followed by a rerender once all filter changes have been applied (plot.applyState). The one annoyance is that LZ uses namespacing internally, but external widgets might not; the low level function will need to take into account that the field may be called something different across usages.

(eg deNamespace can be used. Namespaced field strings are a definite API wart, but emphatically not in scope of this ticket)

abought commented 4 years ago

Based on today's call this one looks good to go. I'm closing, but let me know if additional items are required. I believe Kenneth will take point on following up with, Eg, forest plot filter use cases.

statgen / locuszoom

Provide mechanisms for custom filtering of data layers #196