Open kolmakova opened 7 months ago
Mosaic does not currently have dedicated support for filtering across multiple tables. While optimizations such as query caching and query consolidation (grouping queries over the same table with the same groupby values) apply generally, our data cube indexes apply only to selections with clients that all target the same base table.
When feasible, I would recommend performing any joins as a pre-processing step: create a new table (or view) that joins the data together, and then use that as your base for visualization and interaction. The table or view could be created as part of your initial data loading; we do this in the unemployment map example.
Otherwise, it would be great to learn more about your use case needs. We'd like to provide richer support for cross-table linking, but we're still in a "requirements gathering" phase of design. (And I believe @domoritz and collaborators may also be looking into similar issues?)
As for this part: "It generates a correct query and works on small tables (~30,000 rows), but on tables with 100,000+ rows, it does not work at all, no reaction on selection and no errors in the console."
, I don't know what is going on. Indeed it seems strange that the scale difference alone would fundamentally change things beyond performance (unless the joined results are so large they overrun browser memory limits). I'd start by looking at DuckDB logs to see if the queries are executing successfully, and of course you can also log your custom client's queryResult
and queryError
methods to see if a response is arriving.
CC @willeppy
@jheer Thank you for the answer!
I'm working on a graph visualization. It has one raw data table and two additional tables: one for unique nodes and one for links between them. The graph relies on these generated nodes and links tables. The nodes and links have aggregated values calculated from the raw data, visualized in histograms for each value. If the aggregated value is by link, the histogram will select both nodes involved in the link within the selected value range. So selecting nodes in the graph by histogram ranges, crossfiltered by nodes and links, is a clear use case in the graph visualization context.
Crossfiltering between only nodes or only link values works super fast, I'll try to manage crossfiltering between nodes and links in the code, not SQL.
P.S. I've debugged my join queries on large tables and found that they execute successfully, but run very slowly - up to 2-10 minutes, even though the result set may only contain a few rows.
Hi! I am building a custom
MosaicClient
for my own visualization components. I have twoSelection
instances using crossfilter - one for table1 and another for table2. EachSelection
can have few histograms with own conditions. I'm wondering if it's possible to create a filter that queries across bothSelections
on the different tables?The case is that table2 has ids and corresponding values, while table1 has links between the ids in table2 and values for each link - so table1.source and table1.target columns are the same type as table2.id.
I need to do something like this query below but
Query
has nojoin
method and I have no idea how to tellQuery
how it should compare table2.id with table1.source and table1.source using existing methods.I've also tried to construct my own query class that extends
Query
with a.join()
method. However, I could not construct this query completely using onlyQuery
methods, there are still parts that require string construction:My extended
Query
class is:And its usage in
MosaicClient
:It generates a correct query and works on small tables (~30,000 rows), but on tables with 100,000+ rows, it does not work at all, no reaction on selection and no errors in the console. This is kind of strange, I suspect there may be an indexing issue caused by my string constructions because they can't be optimized by Mosaic. And I don't use
.or()
and.and()
ofQuery
here because they add quotes like "table2.id" for exact column name but this produces a broken query as there is no exact "table2.id" column in the context of comparing between few tables.How can I solve this problem?