informatics-isi-edu / ermrestjs

ERMrest client library in JavaScript
Apache License 2.0

Optimize attributegroup read request by eliminating array_d/array and listing columns #932

Open RFSH opened 2 years ago

RFSH commented 2 years ago

When an all-outbound path's value is needed for a page (either by being visible, or being used in the wait_for list of another column) we're modifying the main entity request into an attributegroup request which includes the data for this path as well. For this attributegroup request, we need to provide a projection list of rows/data that we want to get from ermrest. For all-outbounds we ask for the whole row of data by doing array_d(F:*).
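As a rough illustration of the projection list described above, the sketch below builds the whole-row form for the main table and each all-outbound path. The function name and the aliases `M`, `F1`, `F2` are hypothetical and only meant to show the shape of the string, not how ermrestjs actually assembles it.

```javascript
// Hypothetical helper (not part of ermrestjs): builds the projection part
// of an attributegroup request where every alias asks for its whole row.
function buildWholeRowProjection(mainAlias, outboundAliases) {
  const aliases = [mainAlias].concat(outboundAliases);
  return aliases
    .map(function (alias) {
      // e.g. F1:=array_d(F1:*) requests every column of the table behind F1
      return alias + ":=array_d(" + alias + ":*)";
    })
    .join(",");
}

const wholeRowProjection = buildWholeRowProjection("M", ["F1", "F2"]);
console.log(wholeRowProjection);
// M:=array_d(M:*),F1:=array_d(F1:*),F2:=array_d(F2:*)
```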

As part of #931, we optimized this logic by making sure we're only asking for scalar column values when an all-outbound scalar value is used and doesn't require the whole row of data. Doing so makes the request more performant by making sure we're not asking for data that we don't need.

Now we should see if we can optimize other parts of the url that are using array_d/array and asking for the whole row, which means:

One suggestion is using the "columns" property that is defined in the source-definitions annotation to figure out the list of columns that we should get the data for. If its value is not `true`, then we can assume that the listed columns are all the data that data-modelers need.
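A minimal sketch of that suggestion, assuming the source-definitions object carries a `columns` property that is either `true` or an array of column names. The function name and the `null` return convention are hypothetical.

```javascript
// Hypothetical helper: decides which columns to project for a table based on
// the "columns" value of its source-definitions annotation.
// Returns an array of column names, or null when the whole row is still needed.
function columnsToRequest(sourceDefinitions) {
  const cols = sourceDefinitions ? sourceDefinitions.columns : undefined;
  if (Array.isArray(cols)) {
    // An explicit list: assume these are the only columns the modeler needs.
    return cols.slice();
  }
  // `true` (or a missing annotation) means every column may be used.
  return null;
}

console.log(columnsToRequest({ columns: ["RID", "name"] })); // [ 'RID', 'name' ]
console.log(columnsToRequest({ columns: true }));            // null
```

This only covers the all-outbound case. The caveats about the main entity projection and display annotations still apply.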

This logic should work properly for any all-outbound projection that we're adding, but might not be enough for the main entity projection. Currently, to figure out the list of all-outbounds that we need to fetch, we're not just looking at the fkeys. We're fetching data for all the all-outbounds that are visible or part of the wait_for of a column.

Also columns and fkeys are supposed to be used for backwards compatibility and only related to templating environment. Expanding their usage would cause some unexpected behavior as this will affect the column-display or any other display related annotation as well (given that the whole row of data might not be available anymore).

This requires more thought. I should list all the changes required, as well as whether this is even feasible. I should also list where we expect the whole row and whether we can relax this in some cases or not.

karlcz commented 2 years ago

There are two feasible ways to replace M:=array_d(M:*) here:

  1. Just use M:*

  2. List explicit A1:=M:col1,A2:=M:col2,...

The first would give you all the columns of M, but you'd have to adapt to the slightly odd way ermrest names them. You don't get to define your own per-column output aliases this way. The output alias naming scheme is <table alias>:<col name>, so you get literal outputs like M:RID, M:name, etc., with a colon in the field name. The benefit is that you get all the fields without making the URL longer.

The second form allows/requires you to name each output if you don't want the default bare column names. The drawback is the potentially long projection list in the URL. The benefit is that queries might be more efficient if the model config allows you to request fewer columns to populate the template environment needed for Chaise.
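To make the two forms concrete, here is a small sketch. All function names are hypothetical, the `A1`, `A2`, ... aliases simply follow the pattern in the list above, and `encodeURIComponent` stands in for whatever URL encoding the client already uses. The last helper shows one way a client could adapt to the `<table alias>:<col name>` keys that the first form returns.

```javascript
// Form 1: ask for every column of the table alias; ermrest picks the names.
function projectAllColumns(tableAlias) {
  return tableAlias + ":*";
}

// Form 2: list each column with its own output alias (A1, A2, ...).
function projectExplicit(tableAlias, columnNames) {
  return columnNames
    .map((name, i) => "A" + (i + 1) + ":=" + tableAlias + ":" + encodeURIComponent(name))
    .join(",");
}

// Adapter for form 1 results: turn keys like "M:RID" back into "RID".
function stripAliasPrefix(row, tableAlias) {
  const prefix = tableAlias + ":";
  const out = {};
  Object.keys(row).forEach((key) => {
    const bare = key.startsWith(prefix) ? key.slice(prefix.length) : key;
    out[bare] = row[key];
  });
  return out;
}

console.log(projectAllColumns("M"));                // M:*
console.log(projectExplicit("M", ["RID", "name"])); // A1:=M:RID,A2:=M:name
console.log(stripAliasPrefix({ "M:RID": "1-ABC", "M:name": "x" }, "M"));
// { RID: '1-ABC', name: 'x' }
```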