[sheets] Handle empty columns in Datagrids

bharatkashyap commented 2 years ago

Duplicates

[X] I have searched the existing issues

Latest version

[X] I have tested the latest version

Current behavior 😯

Expected behavior 🤔

The Datagrid should ignore empty columns in the query state data

Steps to reproduce 🕹

Steps:

Create a Google Sheets API
Connect to a Sheet
Create a Query State containing the Sheets API data
Create a Datagrid and bind it to the above Query State

Context 🔦

No response

Your environment 🌎

https://toolpad-pr-236.onrender.com/app/cl280m3j30003a1kpalmidzdy/1/pages/cl281we3j00013f675rer18yv

bharatkashyap commented 2 years ago

@Janpot Seems like a few columns get dropped between the API and the data binding to the Datagrid, since all columns including and after the empty column get dropped in this app:

bharatkashyap commented 2 years ago

@Janpot

This issue is because the DataGrid component attempts to infer columns based on the first row of the passed data, but this data does not contain the "header" row, since that is shifted and stored in a separate key called fields by the server; however, the queryState ignores this attribute and only stores data.

At a broader level, to my mind, the approach depends on where must the user be asked the question, "Interpret Header Rows as Column Names" - at the time of creating the DataGrid, or at the time of creating the API?

Intuitively, I want to side with the former. So, a possible solution could be to remove the transformation logic from the server-side exec and pass the exact response as data.

Janpot commented 2 years ago

I guess "Interpret Header Rows as Column Names" only applies to a certain amount of datasources. Many datasources will have well defined interfaces for their data and have no use case for guessing the shape of their records. In that sense it could make more sense to keep it an API-level option to parse the header row and transform the data to an array of objects.

bharatkashyap commented 2 years ago

I guess "Interpret Header Rows as Column Names" only applies to a certain amount of datasources. Many datasources will have well defined interfaces for their data and have no use case for guessing the shape of their records. In that sense it could make more sense to keep it an API-level option to parse the header row and transform the data to an array of objects.

In that case, should the infer columns function behaviour (inside the Datagrid) also react to whether the "interpret header rows as column names" option was set or not? If not, it would continue to work as is; if yes, it would defer to a columns property that we could add to the data query response?

Janpot commented 2 years ago

I'm not sure I understand the problem.

Suppose the sheets data looks like:

[
  ['a', 'b', 'c'],
  [1, 2, 3],
  [4, 5, 6],
]

if the user checks "first row are columns", then the API will return

[
  { a: 1, b: 2, c: 3 },
  { a: 4, b: 5, c: 6 },
]

And the DataGrid will infer 3 columns, "a", "b" and "c". and show two rows

If, on the other hand, "first row are columns" remains unchecked, the API will return

[
  ['a', 'b', 'c'],
  [1, 2, 3],
  [4, 5, 6],
]

And then I expect the DataGrid to infer columns "0", "1" and "2" and show 3 rows of data.

It's important to note that the column inference only runs when no columns are configured yet. At any time the user can override and the DataGrid should never update the overrides (unlike retool). It just exists to provide a set of defaults for the user to start from.

bharatkashyap commented 2 years ago

Ah; passing the first row as an array in the latter case is the solution I was missing. Object.entries should be able to infer "0", "1" and "2" in that case as you said.

About empty strings in the returned fields, what should be the behaviour in either case? I think we should be able to safely skip them in either case.

Janpot commented 2 years ago

If the user doesn't check "first row are columns", no transformation is done to the data whatsoever, and I think on the frontend we should interpret that literally as "this data has no named columns".

Empty strings in the returned field names makes the column ignored, in the value it should be kept, e.g.

[
  ['a', '', 'c'],
  ['', 2, ''],
  [4, 5, ''],
]

becomes

[
  { a: '', c: '' },
  { a: 4, c: '' },
]

basically, if the user says the first row contains the headers, and there is one cell without a value in the header, then we assume it's not a column in the data.

bharatkashyap commented 2 years ago

If, on the other hand, "first row are columns" remains unchecked, the API will return
[
  ['a', 'b', 'c'],
  [1, 2, 3],
  [4, 5, 6],
]
And then I expect the DataGrid to infer columns "0", "1" and "2" and show 3 rows of data.

In this case, there would need to be some transformation on the Datagrid to turn the data into an acceptable format for the rows prop, right?

Janpot commented 2 years ago

At the moment, the DataGrid requires an id column. We can think about supporting the getRowId prop in some form.

bharatkashyap commented 2 years ago

Does it make sense to add an id regardless of the option selected by the user, and transform the rows in the following manner:

If the sheet has the following data:

[
  ['a', 'b', 'c'],
  [1, 2, 3],
  [4, 5, 6],
]

If the user checks "first row are columns", then the API will return

[
  { id: 0, a: 1, b: 2, c: 3 },
  { id: 1, a: 4, b: 5, c: 6 },
]

And the DataGrid will infer 3 columns, "a", "b" and "c". and show two rows

If, on the other hand, "first row are columns" remains unchecked, the API will return

[
  [{ id: 0, 0: 'a', 1: 'b', 2:'c'}],
  [{ id: 1, 0: 1, 1: 2, 2:3}],
  [{ id: 2, 0: 4, 1: 5, 2:6}],
]

The DataGrid will infer columns "0", "1" and "2" and show 3 rows of data.

Janpot commented 2 years ago

I don't think we need to make too many assumptions about what kind of data is coming from APIs. It's easy to add some cheap magic to Toolpad, it will be hard to take it back out again or work around its quirks. APIs may be used in other places than DataGrids. For this specific problem, I believe the real solution rather lies in the getRowId property of the datagrid.

In the future we can start thinking about a higher level concept than APIs that represent collections of entities. Strongly typed, that allows listing, selects, updates, deletion, creation, introspection through a common interface. A bit like jetadmin has.

bharatkashyap commented 2 years ago

I don't think we need to make too many more assumptions about what kind of data is coming from APIs. It's easy to add some cheap magic to Toolpad, it will be hard to take it back out again or work around its quirks. APIs may be used in other places than DataGrids. For this specific problem, I believe the real solution rather lies in the getRowId property of the datagrid.

In the future we can start thinking about a higher level concept than APIs that represent collections of entities. Strongly typed, that allows listing, selects, updates, deletion, creation, introspection through a common interface. A bit like jetadmin has.

Is it sufficient then to do something like:

  getRowId={(row) => row.id ?? generateRandomValue}

Janpot commented 2 years ago

I think we should expose a property on the DataGrid component that allows you to set the field that represents the row ID. Under the hood it would use getRowId

bharatkashyap commented 2 years ago

I think we should expose a property on the DataGrid component that allows you to set the field that represents the row ID. Under the hood it would use getRowId

Would this amount to adding an idField property to the component definition:

idField: {
   typeDef: { type: 'string' }
 }

The control could be a dropdown containing the inferred column names? Perhaps in the future this could become a "Set as Id" in the GridColumns dialog.

getRowId could then become:

const getRowId = React.useCallback( (row: any) => {
if(idField && row.idField) return row.idField 
return crypto.getRandomValues(new Uint16Array(1))[0])
}, [idField])
...

getRowId={(row) => getRowId(row)}

The DataGrid would need to re-mount whenever idField changes, perhaps via key={idField}?

Janpot commented 2 years ago

I don't think we should assign random values. But I've been thinking about it a bit more, we can probably improve inference a bit within the DataGrid. A bit like https://github.com/mui/mui-toolpad/issues/310#issuecomment-1111852322 but inside the DataGrid instead of the API. That way we're not destructive towards people's data, it's purely centred around visualization.

One way that could look like is:

As you said, add a property rowIdField: string to the DataGrid

If the property is set, we assume the user knows what they're doing and short circuit the logic here. None of the following will be executed.
check the first row, if it contains a property id, we assume the whole dataset has ids and short circuit the logic here. None of the following will be executed.
At this point, we assume the user hasn't taken care of adding ids (yet) to their data. Internal to the component we map the rows similar to https://github.com/mui/mui-toolpad/issues/310#issuecomment-1111852322, we add an id property containing the index of the row.

As for what UI should come with that rowIdField, there are 3 options. In increasing difficulty:

just a TextField
a Select containing all possible fields
a checkbox within the columns editor that represents "this field is the ID"

I think 1. and 2. are relatively straightForward. 3. will require some rethinking of how component properties work, it probably shouldn't be part of the same PR.

mui / toolpad