gristlabs / grist-core

Grist is the evolution of spreadsheets.
https://www.getgrist.com/
Apache License 2.0
7.04k stars 311 forks source link

Create unique index on a column #706

Open remileblond67076 opened 11 months ago

remileblond67076 commented 11 months ago

We often need to define a unique constraint on columns, in order to prohibit the creation of duplicates in a table.

This would correspond to adding a unique index to a column in the SQLLite database.

For example :

CREATE TABLE table_name(
    ...,
    column_name type UNIQUE,
    ...
);

Or

CREATE UNIQUE INDEX ux_unique_name ON table_name(column_name );

Enabling this constraint could be set using a checkbox in the column configuration interface.

remileblond67076 commented 11 months ago

Grist-unique

fflorent commented 11 months ago

@remileblond67076 I have started working on this issue.

Regarding the UI:

2023-10-24_10-35

fflorent commented 11 months ago

cc'ing @kbizien

anaisconce commented 11 months ago

Our UX expert, Aurélie, suggested that this option could be enabled for any type of columns

Would it be for any type of column, or any type of string column, e.g. text, choice, and choice list?

Trying to think through the practical applications of uniqueness on numeric, boolean and attachment columns. Date and datetime are also unclear to me -- maybe there's a case, but usually not. Reference and reference list also become hazier.

As for design, the table tab in the creator panel has an example of checkboxes.

image

Re: uniqueness configuration design, if uniqueness were supported for all column types, it might make more sense to put the configuration in the column behavior section. The top section is about label, id and description. Adding uniqueness there is a bit surprising to me. image

If uniqueness is not supported for all columns, what do you think of placing uniqueness configuration after formatting and styling? This is currently what follows, and the space isn't used very well. Kevin might have better ideas. image

dsagal commented 11 months ago

Databases support unique indices on a combination of columns. For example:

CREATE UNIQUE INDEX idx_student_course
ON student_courses (student_id, course_id);

(i.e. a course enrollments table can only have a single enrollment for a particular student for a particular course).

As with SQL syntax, it makes sense to have a per-column way to say "make this column unique", but perhaps we can design it early on to support combinations of columns, so that the single-column UI is just a shortcut to creating a unique index.

Also, just wanted to add an observation: unique indices have a second benefit (besides enforcing uniqueness) of making certain operations much more efficient. So this feature will come in useful in the future when making Grist work for larger data (#43).

paulfitz commented 11 months ago

Small implementation note. Grist has an obscure "on-demand table" feature that uses indexes, and some of the code involved would need a small tweak to be aware of any other indexes added. Specifically: https://github.com/gristlabs/grist-core/blob/e51c1b6b9274dbc22414824dfd823494f446b7de/app/server/lib/DocStorage.ts#L1499-L1532

So if someone makes a proof of concept and sees Grist continually deleting the index they are carefully adding, that will be why. It is no big deal, should be a small change.

fflorent commented 11 months ago

Would it be for any type of column, or any type of string column, e.g. text, choice, and choice list?

For all of the types of column available, including the boolean (even if the use case is probably not interesting), integers / numbers, text, choice / choice list, ... I just wonder if that would work well with attachment columns (if that's not relevant, may it also cause performance issue? IDK).

Trying to think through the practical applications of uniqueness on numeric, boolean and attachment columns. Date and datetime are also unclear to me -- maybe there's a case, but usually not. Reference and reference list also become hazier.

For references: say you use Grist as a task management for a team. You have 2 tables: People and Tasks. The team members can pick tasks by assigning themselves through the Tasks table. Also you don't want them to pick several tasks at the same time (they must unassign themselves before taking another task).

For dates, I suppose you may want to fill a calendar with one item per date.

For the other types, I admit I don't have much relevant use cases.

As for design, the table tab in the creator panel has an example of checkboxes.

Yes, I was thinking of the column config panel separately, not including the table config panel... Though I am probably overthinking about this, you are probably right and we should go with checkboxes.

I'll try to come back with some mockups soon so we can stimulate the discussion.

As with SQL syntax, it makes sense to have a per-column way to say "make this column unique", but perhaps we can design it early on to support combinations of columns, so that the single-column UI is just a shortcut to creating a unique index.

Right, when you say "perhaps we can design it early on to support combinations of columns", you suggest that technically the backend supports this combination so we pave the way for future UI enhancements?

remileblond67076 commented 11 months ago

From my point of view, it would be great to be able to take advantage of SQLLite's uniqueness constraints. I'm not sure there's much sense in putting a uniqueness constraint on a Boolean or a list of values / references. But who can do more can do less... ;-)

fflorent commented 9 months ago

An additional we have a user whose need is to ensure uniqueness of entries with case insensitivity for emails (Foo.Bar@example.org should not be added when foo.bar@example.org already exists).

emanuelegissi commented 7 months ago

This would be great!

lusebille commented 3 months ago

I've made this proposition keeping the checkbox, but make it disabled when type of column is not yet selected, everything detailed here https://www.figma.com/design/wcpetFt6aOKzTszcvPPWLQ/%5B05%2F24%5D-Grist-Design?node-id=294-15720&t=9hvbpy8Fg7eoAurj-1

Image

dsagal commented 2 months ago

For UI, I'd like to come back to this wish: "design it early on to support combinations of columns, so that the single-column UI is just a shortcut to creating a unique index."

I propose the following:

  1. Implement support for unique indexes similar to SQLite's support.
  2. Introduce "UNIQUE INDEXES" section in the creator panel's "Table" tab, under "Data" subtab.
  3. Use UI similar to Sort UI (in the "Sort & Filter" subtab):

    Screenshot 2024-07-09 at 1 08 15 AM

    a. What's missing is that, unlike for Sort, we need to be able to add/remove indexes, give them names, and add/remove columns within each one. b. The options menu used for advanced sort options can be used for advanced index options, which can be added in follow-ups (e.g. case sensitivity, or treatment of empty values).

  4. For the simple (and common) case of a single-column unique constraint, a checkbox in a column's creator panel as in @lusebille's design, is mostly sufficient, and should be equivalent to creating a unique index with an auto-generated name. a. It's a heavy operation (esp. for large tables), so a simple toggle might not be the best UX. It may be better to click a link that opens a popup, asks for confirmation, and shows a spinner. b. Single-column indexes should be reflected both in "Table" tab - "Data" subtab; and in the column creator panel; regardless of how it was created.
  5. With both UI versions, if an index is created and there are duplicates in the column (or combination of columns), there should be either a question to the user, or the operation should be disallowed.
lusebille commented 1 month ago

I've made changes on figma according the specifications above ( I will present you next UX meeting )

dsagal commented 1 month ago

To clarify my request for generalization, I'd like for the feature to support multi-column indexes. For example:

A given table may have more than one index. Each index may include one or more columns. Each column may be marked "ascending" or "descending" (similar as for sorting).

Each index probably should have a name. Not sure since names could be auto-generated, but a good name could make it easier. Even an auto-generated name needs to be visible, as it is sometimes needed for some database operations.

Each index may specify "UNIQUE" or not. A "unique" index should prevent duplicate entries from being stored. A non-unique index has different purposes (mainly, efficiency of certain queries and operations), but is still valuable.

emanuelegissi commented 1 month ago

IMHO, what @dsagal wrote is of paramount importance.

paulfitz commented 1 month ago

Each column may be marked "ascending" or "descending" (similar as for sorting).

This is useful, but am I seeing a significant increase in scope from the original goal of specifying a uniqueness constraint?

Each index may specify "UNIQUE" or not.

@dsagal does this extend the scope of the eventual implementation to indexing in general?

dsagal commented 1 month ago

Each index may specify "UNIQUE" or not.

@dsagal does this extend the scope of the eventual implementation to indexing in general?

Yes, this is a generalization of the original request, and is intended to support indexing in general. The motivation is that we think we'll want general indexing for cases when it matters for performance, and it would be more efficient for us to design and build that, with unique indexes as a special case of the more general indexing.