Closed pngwn closed 2 years ago
populate dataframe from file @merveenoyan created issue https://github.com/gradio-app/gradio/issues/945 discussing uploading CSV/TSV files into the dataframe.
I think this makes perfect sense. We are already doing this for the timeseries; adapting it for the dataframe should be straightforward.
@merveenoyan could you clarify the first part of that issue? Are you saying it would be good for the dataframe to accept different values in the python library (i.e. the default_value kwarg)? Or that it would be good to be able to modify what is displayed after a user uploads the file via the UI (i.e. only showing the first/last 5 rows, etc.)?
so normally data scientists read CSV/TSV/XLSX files and turn them into dataframe using pandas and see the header of the dataframe afterwards, it's quite typical.
import pandas as pd
df = pd.read_csv(file_path) # this directly reads file into a dataframe
df.head(number_of_rows) # shows the first number_of_rows rows
df.tail(number_of_rows) # shows the last number_of_rows rows
people do this in colab and use it to demonstrate, meanwhile gradio doesn't have it. It has a dataframe which you can't read from a file. It would be much better for tabular data workflows if we had a component that reads from file into a dataframe. (like have a drag and drop interface that turns into dataframe directly after the file is uploaded) and then if there's a model running in the background it could do inference and put the results on outputs side. When you read a file into a dataframe no modification is needed, I feel like no one does that.
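As a rough illustration of the file-to-table step described above, here is a minimal sketch that reads a CSV/TSV file into headers plus rows. The helper name `load_table` and its `preview_rows` parameter are hypothetical and not part of gradio or pandas; it uses only the standard library `csv` module, whereas with pandas the same preview would come from `pd.read_csv(file_path).head(n)`.

```python
import csv
from pathlib import Path

# Hypothetical helper for illustration only; not a gradio or pandas API.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def load_table(file_path, preview_rows=None):
    """Read a CSV/TSV file into (headers, rows).

    The first line is treated as the header row. If preview_rows is given,
    only that many data rows are kept (similar in spirit to df.head()).
    """
    path = Path(file_path)
    suffix = path.suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")

    with path.open(newline="", encoding="utf-8") as handle:
        records = list(csv.reader(handle, delimiter=DELIMITERS[suffix]))

    if not records:
        return [], []

    headers, rows = records[0], records[1:]
    if preview_rows is not None:
        rows = rows[:preview_rows]
    return headers, rows
```

The returned headers and rows are plain lists of strings, so they could be handed to whatever the component accepts as its header and initial-value arguments.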
Beautiful! I think you got everything @pngwn
Thanks for wrapping up @pngwn! here are my thoughts on some UX points:
I think allowing users to set the column width is not a good idea. The whole purpose of gradio is to generate high quality web apps to share and showcase models. This should work across device sizes and screen widths. Tables automatically adapt column widths to accommodate their content; providing an API that will almost definitely break the UI is not 'pit of success' stuff.
I agree 100% 👍
Modifying the input of a cell requires double-clicking on it. I would love to be able to just click and add my input. There were also a couple of dev experience improvements I would love to see
I agree with @osanseviero on this but to be clear a single click will not put your cursor in the cell, it will focus the cell exactly like today then only if there is keyboard interaction while a cell is focused it will edit it. We should just do it like Airtable does it: single click -> cell focus -> keyboard action -> input or double click -> toggle the cursor in the cell. It's a bit hard to explain so I think it's worth trying Airtable if you want to understand this behaviour.
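The Airtable-style interaction described above can be read as a small state machine. The sketch below is only one interpretation: the event names (`click`, `keypress`, `double_click`, `blur`) and the `CellState` type are hypothetical, and a double click is treated here as simply entering edit mode.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    focused: bool = False
    editing: bool = False


def next_state(state, event):
    """Return the cell state after a UI event (event names are assumptions)."""
    if event == "click":
        # A single click only focuses the cell; it does not start editing.
        return CellState(focused=True, editing=state.editing)
    if event == "keypress":
        # Typing starts editing, but only if the cell is already focused.
        return CellState(focused=state.focused, editing=state.focused or state.editing)
    if event == "double_click":
        # Assumption: a double click puts the cursor in the cell directly.
        return CellState(focused=True, editing=True)
    if event == "blur":
        return CellState(focused=False, editing=False)
    raise ValueError(f"Unknown event: {event!r}")
```

With this model, `click` followed by `keypress` ends in edit mode, while a `keypress` on an unfocused cell changes nothing, which keeps keyboard and mouse users on the same path.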
But I think we can go further if we expanded the datatype kwarg.
enums/unions would render a dropdown or autocompleting dropdown thing. We could support prefixes + suffixes for currencies + measurements: `datatype=[()]`. It might be possible to support custom validators in the future as long as they are regex based.
I agree, maybe it could be shipped in a 2nd iteration. Airtable has done a great job with that too, it can be a good inspiration to start with:
Would love to get people's thoughts on what good creation and deletion might look like. Are there other datatables you have seen in the wild that do this well while remaining very compact?
I think I can figure out something for this.
Last thing to note is that Dataframes are used in AutoTrain, Dataset viewer, and Gradio.
There are a number of issues with the Dataframe component as it exists today, and we need to do some work to fix the outstanding issues but also improve the usability for humans.
We can use this issue to keep track of the issues that have been reported and come up with a design that addresses the usability issues. I'll start with a simple proposal and we can discuss from there.
python API Changes
Today the dataframe API looks like this:
modifying column width
Proposal: _`col_width` should be removed._
I think allowing users to set the column width is not a good idea. The whole purpose of gradio is to generate high quality web apps to share and showcase models. This should work across device sizes and screen widths. Tables automatically adapt column widths to accommodate their content; providing an API that will almost definitely break the UI is not 'pit of success' stuff.
fixed column and row count
Proposal: _`col_count` and `row_count` should take either a `number` or a tuple of `(number, "fixed"|"dynamic")`._
In #868 @osanseviero wrote:
We do not currently have a mechanism to prevent end-users from creating new columns and rows. I think we have two options here:
- `col_fixed` and `row_fixed` boolean kwargs. This adds additional options to dataframe when it already has a lot of kwargs, it could start to get overwhelming if we keep adding to the API, but it is simple and would work.
- `row_count` and `col_count` to take either a number (e.g. `3`) or a tuple of `(number, "fixed"|"dynamic")`. This does expand the API for `*_count` but I quite like it as it binds two highly related options together. This would look like `col_count=(3, "fixed")`. `col_count=3` would essentially be shorthand for `col_count=(3, "dynamic")`.

Note: We could rename these kwargs to `col` and `row`.
I propose the second option (tuple) but I do not feel strongly about it.
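To make the tuple option concrete, here is a minimal sketch of how a count argument could be normalised so that a bare number becomes shorthand for the `"dynamic"` form. The function name `normalize_count` is hypothetical and this is not how gradio implements it; it only illustrates the proposed semantics.

```python
VALID_MODES = ("fixed", "dynamic")


def normalize_count(value):
    """Expand an int or an (int, mode) tuple into an explicit (int, mode) tuple.

    A bare int is treated as shorthand for (int, "dynamic").
    """
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("count must be an int or an (int, mode) tuple")
    if isinstance(value, int):
        count, mode = value, "dynamic"
    elif isinstance(value, tuple) and len(value) == 2:
        count, mode = value
    else:
        raise TypeError("count must be an int or an (int, mode) tuple")

    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        raise ValueError(f"count must be a non-negative int, got {count!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return count, mode
```

For example, `normalize_count(3)` and `normalize_count((3, "dynamic"))` give the same result, so the rest of the component only ever has to deal with the tuple form.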
conflicts and confusements with kwargs relating to col and row quantity
Proposal: _`headers`, `col_count`, `row_count`, `default_value` should be validated to ensure there are no conflicts._
More specifically: Any combination of kwargs that can set the column count must always equal the same number of columns. Any kwargs that can set the row count must not result in provided data being hidden.
`headers` and `col_count` can conflict; `default_value` and `headers` can conflict (kinda); `default_value`, `col_count`, and `row_count` can conflict.
This is easiest to explain with examples.
This is confusing but not necessarily an issue:
However this is just wrong and will lead to unexpected behaviour:
What should happen here:
And here:
We need to figure out simple rules to validate dataframe inputs that affect the columns + widths, or decide how to normalise.
Some possible rules aimed at removing ambiguity:
- If `headers` and `col_count` are provided, the length of `headers` must be equal to `col_count`.
- If `headers` and `default_value` are provided, the length of each piece of column data in `default_value` should match the length of `headers`.
- If `default_value` and `col_count` are provided, the length of each piece of column data in `default_value` should be equal to `col_count`.
- If `default_value` and `row_count` are provided, the length of the row data in `default_value` should be equal to or less than the `row_count`. (This isn't essential but would lead to weird behaviour).

The obvious counter to this is that we could add additional values to `default_value` or `headers` to 'fill in the gaps' but I think the API will be far easier to reason about for users if we have clear rules. It will allow us to easily detect errors and provide helpful messages to users. Trying to guess what users want without being explicit is how perl happened.
This validation would happen at python time, and we could provide error messages like:
proposed python API
UX improvements
make cells easier to interact with
in #868 @osanseviero said:
I'm not certain about this.
The current behaviour mimics how most spreadsheets work, but users of spreadsheets frequently move around the spreadsheet before editing. Our dataframe is not a powertool but a quick user entry tool, so perhaps ease of data entry is more important than ease of cell navigation.
If we change `click` behaviour, we also need to change keyboard behaviour for parity of usability. Essentially this feature request is to remove the different 'states' from the dataframe, so that it is essentially 'edit only', rather than having view/edit modes, as without click triggering that state it would be impossible to get to. Static or output dataframes would still have this behaviour.
@omarespejel Could you add some more details about how you would like to interact with the dataframe? Not just click, but how would you like to change to a different cell, and how would that work for keyboard users who do not or cannot use a mouse?
better inputs when the datatype is given
in #868 @osanseviero said:
Currently everything is treated as a string by the frontend, even when we know the datatype. I think we can improve this significantly.
- `number` fields could use a number input which will only allow number entry
- `boolean` fields could use a toggle or checkbox
- `date` fields could use a date input
- `string` fields will use the current textbox functionality

But I think we can go further if we expanded the datatype kwarg.
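The datatype-to-input pairing listed above amounts to a simple lookup. The sketch below is written in Python purely for illustration (the real rendering would live in the frontend); the names `INPUT_FOR_DATATYPE` and `input_type_for` are hypothetical, and the values are standard HTML `<input>` types.

```python
# Hypothetical lookup from dataframe datatype to an HTML <input> type.
INPUT_FOR_DATATYPE = {
    "number": "number",     # only allows numeric entry
    "boolean": "checkbox",  # could equally be rendered as a toggle
    "date": "date",
    "string": "text",       # the current textbox behaviour
}


def input_type_for(datatype):
    """Return the input type for a datatype, falling back to a plain textbox."""
    return INPUT_FOR_DATATYPE.get(datatype, "text")
```

Falling back to `"text"` for unknown datatypes preserves today's behaviour, where everything is treated as a string.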
Be good to get your thoughts @gary149
populate dataframe from file
@merveenoyan created issue #945 discussing uploading CSV/TSV files into the dataframe.
I think this makes perfect sense. We are already doing this for the timeseries; adapting it for the dataframe should be straightforward.
@merveenoyan could you clarify the first part of that issue? Are you saying it would be good for the dataframe to accept different values in the python library (i.e. the `default_value` kwarg)? Or that it would be good to be able to modify what is displayed after a user uploads the file via the UI (i.e. only showing the first/last 5 rows, etc.)?
row + col creation and deletion
#631
Row and column creation and deletion need some work. Deletion isn't currently possible.
Would love to get people's thoughts on what good creation and deletion might look like. Are there other datatables you have seen in the wild that do this well while remaining very compact?
Another for @gary149
Redesign
Just putting this here for posterity. Things are being redesigned.
bugs
We have bugs:
Issues relating to features for tracking purposes:
Let me know if I have missed anything; it would be good to get people's thoughts on this.
cc @abidlabs @aliabid94 @dawoodkhan82 @aliabd @FarukOzderim