API support for storing "record preview" and "record selector" settings

seancolsen commented 2 years ago

Description

To support the Record Selector and Record Preview features (for foreign key constraints), we need a way to store some settings that allow the user to customize the behavior of those features.

Motivation

The Foreign Key Constraints Spec talks about "Table Link Preferences" in the video for Scenario 5, but only briefly.

Terminology

I'm switching terminology from "preferences" (used in the spec), to "settings" (used here) because "preferences" connotes user-specific values which I don't think we want in this case. If anyone has concerns with this adjustment, we can discuss and switch back to "preferences" language if needed.

Specs

Users read/write the settings via the API. Please note that the API paths might change based on whether we've implemented multiple API namespaces (see #1029).

`/api/v0/tables/<id>/settings/`

(This is a new endpoint)

GET response:

{
    // this setting tracks what columns from THIS table as shown as previews in OTHER tables
    "preview_columns": {
        "columns": [278, 279, 280], // references columns by their id
        "customized": false
    }
}

PATCH request
- Same schema as GET response
- The columns property must be specified in full.
- All types of columns are permitted in columns.
- Record Selector performance will be best when the columns have indexes, but we are not taking any steps to enforce that at this point.
- The front end can reset user-customizations by sending the following request. Then columns will be re-computed.
```
{
    "preview_columns": {
        "customized": false
    }
}
```
- Error if
  - columns contains any values which are not ids of columns in the table.
  - columns length is not within 1 - 5.
Defaults

To create a default list for preview_columns.columns, we use the following logic to make a best-guess of what the user would want.
- Have 5 columns if we can.
- First, start with all columns with UNIQUE constraints that don't also have FOREIGN KEY constraints.
- If there are fewer than 5 columns with UNIQUE constraints, fill the remaining columns with the first columns encountered that are not BOOLEAN or have FK constraints.
- If there aren't 5 columns still, then start including BOOLEAN columns and FK-constrainted columns
When columns are modified
- We should recompute columns whenever columns are added, edited, or deleted as long as customized is not set to true. This should be part of the implementation.
- When a column is deleted, we remove any references to it within preview_columns.
- If the user has customized the column set and there are no columns left in preview_columns after deleting a column, then we set "customized": false and re-compute the default set of columns.

`/api/v0/tables/<id>/columns/`

This endpoint already has a display_options property for each column. We need to modify the display_options property, adding a show_fk_preview property as follows:
```
{
    // other column API fields go here
    "display_options": {
        // other display options go here
        "show_fk_preview": true
    }
}
```
The show_fk_preview property will default to true if the user has not customized it. Later, once we have users, we may consider making this default into a user-specific setting.

Questions originally posed in this ticket (now answered)

All these questions have now been answered via comments, and those answers are reflected in the ticket summary above.

Does anyone have critique of the functionality specs that I wrote above? The official specs didn't have a whole lot of detail here so I filled in the gaps based on my own opinions.

Does anyone have critique of my proposed API design?

Defaults

What is the default state of use_preview_in_references?

What logic do we use to compute the best-guess for preview_columns?

How do we avoid generating a worthless default for preview_columns?

Here are two scenarios, that when taken together, create some interesting challenges.

Scenario (1): User has intentionally configured preview_columns to use a single column. We'd like to preserve that setting no matter what other columns the user adds or deletes.

Scenario (2)... User adds a blank table. It only contains an id column. We set preview_columns to use the id column automatically because we have no other choice. User adds a name column. What do we do here? Perhaps we assume that, since preview_columns contains only the PK column, the user has not customized their columns so we go ahead and set preview_columns using our best-guess logic, which incorporates name into the column list. Notwithstanding the fact that our assumption may be inaccurate, now we have a problem where preview_columns looks just like it does in scenario 1, leaving us no way to know the optimal point to compute our best guess. This situation renders our best guess logic a bit useless, leaving us only with a default of the first column created in the table. If we consider that users may want to intentionally set preview_columns equal to the PK, then we're left with a record selector that always requires configuration before it can offer any value to users who created their tables within Mathesar. Of course the situation is better for imported tables, but only initially -- when the user adds/deletes columns, the quality of the default has the potential to degrade.

Though complex, I wonder if it's worth adding a separate setting which tracks whether or not the user has customized the preview_columns.

I might even consider having best_guess_preview_columns (which the API always returns, but will fail to modify) and custom_preview_columns (set to [] by default). But I'm not so sure. Others may have better ideas here.

Are there certain types of columns that should be disallowed from preview_columns?

I would suggest that for now we only allow text and numeric columns.

Do we want to force all preview columns to be indexed since they'll be searched within an auto-complete? I think we do. If an API consumer tries to add a non-indexed column to preview_columns, should the API throw an error? Or add an index?

Should use_preview_in_references be stored per-table or per-column?

In this discussion thread, we agreed that table link preferences should be stored "per-table" (not "per-column"). I imagine that during that discussion we were all primarily considering storage of preview_columns and not giving much thought to storage of use_preview_in_references. As spec'ed above, the use_preview_in_references setting would be stored per-table too, in accordance with the consensus that emerged from our discussion. That approach is certainly the simplest to implement, but it's worth considering that users might want to show the record preview in table X and hide it in table Y, when X and Y both table Z, to which the setting is associated. As demonstrated in the prototype, the record preview can significantly reduce the number of records displayed on one screen. The specs also require a toggle for the record preview within the column header, and I can see how I'd want to have the ability to quickly toggle that setting to drill down and then zoom out. However, when placed in the column header, the setting would suggest that it's specific to the column, which I think is problematic.

All things considered, I'd suggest that we proceed with the simplistic approach of storing the use_preview_in_references at the table level. We could consider allowing per-column storage of "show record preview" if the need becomes strong enough.

If we want per-column control over the "Show record preview" setting now, then I'd suggest we associate that setting with single-column FK constraints. We could potentially use the constraints API to read/write the setting.

seancolsen commented 2 years ago

@kgodey @ghislaineguerin @mathemancer @silentninja @dmos62 @pavish I've assigned this ticket to you to review the questions above and comment with any thoughts. Please comment and/or unassign yourself.

mathemancer commented 2 years ago

As far as the use_preview_in_references question goes, I'll abstain; seems like a UX question. Regarding the choice of preview columns:

You can't identify a row unless the presented tuple is unique.
This has a bonus effect: We'd end up with the potential to define a unique index on the preview column tuple, greatly increasing performance.

The downside is that if we go that route, it would take a bit of time to change the setting. On the other hand, the upside is that it restricts the search space for possible preview column sets, which would make suggesting simpler. As for suggestions, you point out a real problem. I'm quite partial to maintaining a separate list of user-chosen and suggested preview columns. The way would be: The user chooses some (maybe zero) preview columns, and we fill in the rest with suggestions. They could turn off the suggested preview altogether (since it may be more clutter than they want), using only their chosen columns. If they do choose columns, we'd fill in the rest, attempting to make unique tuples.

pavish commented 2 years ago

@mathemancer I assume this issue was accidentally closed. I'm reopening this.

ghislaineguerin commented 2 years ago

@seancolsen I think you make a good point in keeping preferences at the table level only. I'm wondering if record preview could be treated only as a display mode for the column, meaning it will work the same if the FK points to a different table.

kgodey commented 2 years ago

Here's the schema I would suggest:

{
    // this setting tracks whether to show link previews while viewing THIS table. this may not be stored here, see the end of this comment.
    "show_link_previews": true,
    // this setting tracks what columns from THIS table as shown as previews in OTHER tables
    "preview_columns": {
        "columns": [278, 279, 280],
        "customized": false
    }
}

This namespaces the settings, allowing us to add different types of settings in the future. It also shortens setting names.

What is the default state of use_preview_in_references?

This is now show_link_previews. I think this should be true to make the product friendlier for non-technical users by default. Later, once we have users, what the default is could be a user-specific setting.

What logic do we use to compute the best-guess for preview_columns?

This is now columns. I think we should use the following logic:

Have 5 columns if we can.
First, start with all columns with UNIQUE constraints that don't also have FOREIGN KEY constraints.
If there are less than 5 columns with UNIQUE constraints, fill the remaining columns with the first columns encountered that are not BOOLEAN or have FK constraints.
If there aren't 5 columns still, then start including BOOLEAN columns and FK-constrainted columns

How do we avoid generating a worthless default for preview_columns

We should recompute columns whenever columns are added, edited, or removed as long as customized is not set to true. This should be part of the implementation.

Are there certain types of columns that should be disallowed from preview_columns?

No, we'd be making too many assumptions to exclude any columns, I think. Dates, emails, URLs, etc. can all be unique.

Also, the columns are not only used to search, they're also used to see a preview of the related record. For example, see:

ID	Actor	Movie ID
24	Brad Pitt	2
34	Henry Cavill	3

ID	Movie Title	Watched
2	Meet Joe Black	TRUE
3	Enola Holmes	FALSE

In this situation, I might want to put "Watched" in the preview columns. Even though it won't help me search, it's nice to see whether I've watched a movie or not from the preview instead of having to navigate to the other table.

Do we want to force all preview columns to be indexed since they'll be searched within an auto-complete? I think we do. If an API consumer tries to add a non-indexed column to preview_columns, should the API throw an error? Or add an index?

No, it requires too much DB knowledge to have indexes set up. We can open a separate issue to track performance issues with auto-complete if they arise. I don't think we want to automatically add indexes as part of the issue, because it'll impact write performance and we don't want to make assumptions that that's okay – the user could be using the databases in non-Mathesar applications where that will be a problem.

Should use_preview_in_references be stored per-table or per-column?

I was originally envisioning it as per-column. I thought it would be an addition to the Columns API (show_link_previews can be part of the column display options) and we would automatically set it to null if there wasn't a single-column FK set on the column.

I think this will offer the most flexibility to the user (they can expand some FKs and hide others). I'm not sure why having the setting associated with the column is problematic.

@seancolsen I'll leave it to you to update the body of this ticket since you didn't specifically assign it to me. Let me know if I can help, though.

seancolsen commented 2 years ago

Auto-creating indexes

I said:

Do we want to force all preview columns to be indexed? I think we do.

@mathemancer said:

We'd end up with the potential to define a unique index on the preview column tuple

But @kgodey said:

I don't think we want to automatically add indexes

@kgodey has convinced me that we should not automatically create indexes. And I think we can focus on this topic later when we design Mathesar's indexing features.

Choosing default preview columns

@mathemancer said:

The user chooses some (maybe zero) preview columns, and we fill in the rest with suggestions

I'm having a bit of an intuitive knee-jerk reaction against this idea, but I'm not yet able to fully rationalize it. If others are excited to move in this direction I'll work harder on articulating my concerns. For now I'll sum it up by saying that I'm not sold on the value that this additional complexity adds.

Where to store the preview toggle

@ghislaineguerin said:

I'm wondering if record preview could be treated only as a display mode for the column

@kgodey said:

I was originally envisioning it as per-column

That seems like enough of a consensus thus far, and I agree that per-column would make the most sense from a UX perspective.

However, @kgodey your proposed schema doesn't seem to store it per-column.

Say I have the following schema:

- movie
  - id
  - title

- actor
  - id
  - name

- role
  - id
  - movie_id (FK to movie.id)
  - actor_id (FK to actor.id)
  - character_name

"Per-column" would mean that in the role table, I can have the movie preview enabled while having the actor preview disabled. However with your schema, the visibility of all the previews displayed within one table would be bound together.

Then @kgodey said:

I thought it would be an addition to the Columns API (show_link_previews can be part of the column display options) and we would automatically set it to null if there wasn't a single-column FK set on the column.

That sounds better. Are you saying show_link_previews would be accessible in the tables settings API and the columns API?

What do you think about putting it in the constraints API instead? Each FK constraint would have an associated show_link_previews boolean value. No need for null anywhere. Multi-column FK constraints would always return show_link_previews: false, and (for the time being) attempting to set them to true would result in an error.

Other misc reactions

@kgodey I agree with everything you said regarding:

Schema for preview_columns.columns and preview_columns.customized
Showing previews by default
Algorithm to choose the default set of preview columns
Logic for deciding when to recompute the default preview columns
All columns being allowed as preview columns

@kgodey I'll update the ticket description once more people have weighed in and we've reached a rough consensus about all the details here.

kgodey commented 2 years ago

@seancolsen Sorry for the confusion, I was responding to each of your questions in turn and not all together. I think the actual schema should be:

`/api/v0/tables/<id>/settings/`

{
    // this setting tracks what columns from THIS table as shown as previews in OTHER tables
    "preview_columns": {
        "columns": [278, 279, 280],
        "customized": false
    }
}

`/api/v0/tables/<id>/columns/`

{
    // other column API fields go here
    "display_options": {
        // other display options go here
        "show_fk_preview": true
    }
}

kgodey commented 2 years ago

I think it would be better to have it in the columns API since it's a display setting of a column and we already have a display_options provision for columns. That way, every formatting choice related to columns is in one place.

pavish commented 2 years ago

I'm onboard with @kgodey's suggestions.

Regarding this comment from @mathemancer: The user chooses some (maybe zero) preview columns, and we fill in the rest with suggestions.

I don't think we should override user selections. The user might actually want to only preview a single column, for eg., only select the email from the referenced table and nothing else. If we auto fill the rest, we only add more clutter. On the other hand, we should restrict the user to atleast select one column (never zero).

Edit: I see that this is mentioned on the issue description and is the reasoning behind the customized setting and I am onboard with it.

kgodey commented 2 years ago

Note: I revised my schema in https://github.com/centerofci/mathesar/issues/944#issuecomment-1011338408 to change the setting name to show_fk_preview to be consistent with the query parameter name in #946.

seancolsen commented 2 years ago

@pavish I unassigned you since it seems like you already weighed in

mathemancer commented 2 years ago

Some clarifications / responses to things that weren't clear in my original comment:

@pavish said

I don't think we should override user selections.

I'm not suggesting we do. The user selections would be kept separate, and not altered in any way. We would fill in the remaining preview columns with suggestions. The purpose for this is to do our best to help the user come up with unique tuples so they can actually identify rows in the preview. For example, they may think of choosing the subject from the classes table, and we'd fill in the teacher column as well to make rows identifiable (assuming there's more than one teacher teaching a given subject at the school). If they don't want to see the suggested columns, they would toggle suggestions off.

@seancolsen said:

I'm having a bit of an intuitive knee-jerk reaction against this idea, but I'm not yet able to fully rationalize it. If others are excited to move in this direction I'll work harder on articulating my concerns. For now I'll sum it up by saying that I'm not sold on the value that this additional complexity adds.

The value add of this is that it avoids trying to teach users that they need unique tuples to identify rows. For users that already know this, it avoids them needing to figure out or remember which columns fulfill that requirement. I strongly suggest we keep the user-chosen and mathesar-suggested columns separate (solving your problem in the latter bit of scenario (2)), since as you noted, we can't tell what we've suggested and what we haven't otherwise. Then, we can simply toggle whether or not to show the suggested columns. The suggested columns only need to be recalculated if the chosen preview columns change (or the uniqueness constraint situation changes).

@kgodey We can't guarantee unique tuples efficiently without a unique constraint, and even looking for candidates would be pretty bad performance-wise . If we have a unique constraint, we automatically have an index; you can't have the one without the other. My point is that:

since uniqueness of the tuple is a requirement for its usefulness in identifying a row, and
we need to constrain to guarantee that the tuples formed by a set of columns are unique,
we will get an index that helps us search (under many conditions) automatically, whether we want it or not.

You're correct that this will slow down writes; at the scale where that will be a major problem, many other parts of Mathesar will become unusable performance-wise, including trying to look up rows without indices on columns. My original comment was pretty unclear about this point. The main downside is that we would need to guide the user to use uniqueness constraints where they can, but I'd suggest that for data integrity reasons anyway.

dmos62 commented 2 years ago

I'm sorry to join the discussion at such a late stage.

I've a concern about these settings being global and possibly user-specific at the same time, which is a conflict. What I mean by global is that if one user sets one of these settings to X, any other user's respective setting will then be X too. And, if one of the users prefers a different setting than another user, that will result in a conflict.

I'm pretty much pointing out that this design is for a single simultaneous user environment. My knee jerk reaction was to have these settings client side. Might be too late to discuss that, but I just wanted to put this concern out there.

kgodey commented 2 years ago

@dmos62 I think it's fine to set them globally for now, when we introduce users, we can allow users to override global settings.

@mathemancer It seems like you're suggesting having a separate list of suggested preview columns that we can suggest to the user. I think this is a good idea, we should expand the API schema to:

`/api/v0/tables/<id>/settings/`

{
    // this setting tracks what columns from THIS table as shown as previews in OTHER tables
    "preview_columns": {
        "columns": [278, 279, 280],
        "suggested_columns": [281],
        "customized": false
    }
}

suggested_columns should be any column with a unique or primary key constraint that's not already in columns. It doesn't have to be limited to 5 columns.

I still don't think we should limit the user to selecting columns to those with a uniqueness constraint.

mathemancer commented 2 years ago

I'm not too sold on limiting the user either, though we might consider a hepful infobox extolling the benefits of uniqueness constraints if they've chosen poorly. I do think we should choose the suggested columns to help fill out a unique set, in combination with the user's selected columns when possible. For example, suppose the tuples formed by values in the columns

(teacher, subject, time_block)

are unique. It wouldn't take too much sophistication to help the user round out a selection involving some but not all of those columns (maybe they've chosen teacher and subject, but not time_block). We'd start by finding any unique constraints, and then suggest columns from unique constraints where the user has already chosen some of the involved columns. This would help keep away from suggesting columns that aren't very useful for human-friendly identification (e.g., the id column, a student_number column, etc.). We could then fall back to suggesting columns from other unique constraints if space allows, or we can't make it work with the user-selected columns. In almost every table, it should be possible to round out a user selection to unique rows (though the constraint itself might not be defined).

Side note: @ghislaineguerin @kgodey I looked back through the spec, and couldn't figure out whether we intend to support previews for many-to-many or one-to-many (i.e., previewing the backwards direction along a foreign key reference). Do we need that? I think (but it's just a first-pass gut check) that the API should work, and the column-suggesting ideas we have still make sense, but I'm not sure how that would work with the current design.

ghislaineguerin commented 2 years ago

@mathemancer as far as I know, in the context of tables we will only have unique FK values per field. If many-to-many then the record preview will show the records belonging to the mapping table, in which case the design doesn't change. Or am I missing something? I believe we'll be able to show summaries only in views.

silentninja commented 2 years ago

If many-to-many then the record preview will show the records belonging to the mapping table, in which case the design doesn't change

@ghislaineguerin The user would actually be trying to search the column the mapping table refers to, so should we have provision to search through the referred column of mapping table or point them to visit the mapped table instead

silentninja commented 2 years ago

We'd start by finding any unique constraints, and then suggest columns from unique constraints where the user has already chosen some of the involved columns

@mathemancer This is a bit confusing. You mean to suggest we find a unique row pair even though the column itself is not unique but when combined could be unique, is that right?

kgodey commented 2 years ago

we might consider a hepful infobox extolling the benefits of uniqueness constraints if they've chosen poorly.

@mathemancer I recommend opening a new design issue to consider this after the alpha release so we don't forget about it. It is not part of the current spec.

I do think we should choose the suggested columns to help fill out a unique set, in combination with the user's selected columns when possible.

I like this addition to the algorithm for choosing suggested columns, I will leave it to @seancolsen to make the necessary updates to the issue.

Side note: @ghislaineguerin @kgodey I looked back through the spec, and couldn't figure out whether we intend to support previews for many-to-many or one-to-many (i.e., previewing the backwards direction along a foreign key reference). Do we need that?

@ghislaineguerin The user would actually be trying to search the column the mapping table refers to, so should we have provision to search through the referred column of mapping table or point them to visit the mapped table instead

We will not support this per the current spec, the user will have to add FKs in the mapping table directly. We can consider adding functionality for searching "through" mapping tables after the alpha release.

mathemancer commented 2 years ago

@mathemancer This is a bit confusing. You mean to suggest we find a unique row pair even though the column itself is not unique but when combined could be unique, is that right?

Yes, exactly.

seancolsen commented 2 years ago

I opened #1039 to splinter off the feature of suggesting additional columns to users during the column customization process.

seancolsen commented 2 years ago

I've updated the description to reflect resolution of all the above discussion and marked this as ready for implementation now. We also had a brief discussion in Matrix about this today, making sure everyone is copacetic.

kgodey commented 2 years ago

@seancolsen I made a small update about the API paths, right under the "Specs" heading.

silentninja commented 2 years ago

@dmos62 Can you remove your assignment if you finished sharing your feedback on this issue?

silentninja commented 2 years ago

I am having issues with the spec'ed out API schema.

Is there a reason why we need a separate route for /table/{id}/settings/ since there is only one setting object for a table? Shouldn't settings be part of the table object?
How would the frontend get the settings id to send a PATCH request to update settings.

But having a separate route will make sense if we have user settings which would be implemented in future, so I am confused on which API schema to choose

silentninja commented 2 years ago

I had a call with @seancolsen, the options we came up with were

Option 1(Spec'ed schema)

Not a valid Restful API as the GET request was using an object instead of an array, option 3 listed below is an alternate API that is restful.

Option 2 (part of tables API)

GET /api/v0/tables/<id>/

{
  // ...
  "preview_columns": {
    "columns": [278, 279, 280],
    "customized": false
  }
}

Option 3 (separate entity)

GET /api/v0/tables/<id>/settings/

[
  {
    "id": 1287,
    "preview_columns": {
      "columns": [278, 279, 280],
      "customized": false
    }
  }
]

GET /api/v0/tables/<id>/settings/<id>

{
  "id": 1287,
  "preview_columns": {
    "columns": [278, 279, 280],
    "customized": false
  }
}

PATCH /api/v0/tables/<id>/settings/<id>

{
  "preview_columns": {
    "columns": [278, 279, 280],
    "customized": false
  }
}

We decided to go with Option 3 as it was easier to implement on the backend as well as giving us the option to extend it in the future to support user settings

silentninja commented 2 years ago

Partially closed by #1391 and the remaining issues are tacked by #1461

mathesar-foundation / mathesar