wtchg-kwiatkowski / observatory-web

0 stars 0 forks source link

Pivot table options can cause unresponsive browser #469

Closed leehart closed 5 years ago

leehart commented 5 years ago

Samples table Pivot table Column: Sample ID Row: Sample ID See that the browser does not respond for about 1 minute. When the table eventually appears, see that trying to scroll the table, or choose other options, etc. causes more unresponsiveness. Some times the browser's automatic debugger will be triggered: Paused before potential out-of-memory crash.

leehart commented 5 years ago

This is despite imposing a limit on the query rows:

    let queryAPIargs = {
      database: this.config.dataset,
      table: this.config.tablesById[table].id,
      columns,
      query: this.getDefinedQuery(query, table),
      orderBy: order,
      groupBy,
      start: 0,
      stop: 1000,
      transpose: false
    };

...which itself causes problems related to incomplete data (#319).

I'm thinking of removing this limit (to solve #319, etc.) and then trying to find a different way to prevent this memory issue, such as only allowing options that produce fewer than (7096 samples x 29 countries = 205,784 data points).

In theory, JavaScript arrays and objects should be able to remember 4,294,967,295 entries, so I suspect there are some other hogs in the forest.

leehart commented 5 years ago

I'm thinking we could handle this in the same way that we currently handle the download limit breach, in DataTableWithActions, which uses a utility function DataDownloader, which calls an onLimitBreach function, and displays a message advising the user (to use a filter, etc.). So we can be consistent with that approach/precedent. (DataDownloader currently has a constant set to MAX_DOWNLOAD_DATA_POINTS = 100000, which isn't such a bad place to start, considering I know 205,784 data points causes problems in the pivot.)

leehart commented 5 years ago

Limitations/simplicity of the current API.rowsCount means we can't (yet) get a proper rowsCount for group by queries that include a count(*), because this sort of query:

SELECT count(*) AS "count","sample_id","country_id" FROM "pf_samples" GROUP BY "sample_id","country_id" ORDER BY "sample_id" asc,"country_id" asc

simply becomes this sort of rowsCount query

SELECT count(*) AS "TotalRecordCount" FROM "pf_samples" GROUP BY "sample_id","country_id" 

which causes the API to return the first grouped value, which isn't a count of rows. Or if we tweak API.rowsCount to strip out group by, then we get the just as useless

SELECT count(*) AS "TotalRecordCount" FROM "pf_samples"

For monetdb, it looks like a common way to get a row count of group by queries would be a subquery/sub-select, but I don't think that's straightforward in Panoptes (yet). (I think it was built on a very simplistic/flat data and query model/assumption.)

On a side note, it turns out the the unresponsiveness isn't due to the query itself and the volume of data per se, but only the process of displaying/rendering that data as a mui Table. So another approach is to run the normal group by query, and then count the data points in JavaScript, then decide whether to show the results (if it's safe to do so) or show a message instead.

leehart commented 5 years ago

Verified on staging