simonw / datasette-edit-schema

Datasette plugin for modifying table schemas
Apache License 2.0

Feature idea: make suggestions for schema improvements, such as deleting empty columns #44

Open simonw opened 1 year ago

simonw commented 1 year ago

I've been working with a CSV which turns out to have a bunch of columns with no data. I manually scrolled down the list and clicked "delete" next to each column with no examples shown, but a feature that suggests those deletions and then clicks all the buttons for me could be neat.

Related: could also suggest deleting all columns which only have a single distinct value in them.
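
To make the two heuristics above concrete, here is a minimal sketch using only Python's standard `sqlite3` module. The function name `suggest_column_deletions` and the reason strings are hypothetical, not part of datasette-edit-schema, and treating the empty string as "no data" is an assumption about how CSV imports usually look.

```python
import sqlite3


def quote(identifier):
    # Quote a SQLite identifier so odd table or column names are safe to interpolate.
    return '"' + identifier.replace('"', '""') + '"'


def suggest_column_deletions(conn, table):
    """Return (column, reason) pairs for columns that look like deletion candidates."""
    quoted_table = quote(table)
    total = conn.execute(f"SELECT count(*) FROM {quoted_table}").fetchone()[0]
    if total == 0:
        return []  # nothing to learn from an empty table
    suggestions = []
    for row in conn.execute(f"PRAGMA table_info({quoted_table})").fetchall():
        column = row[1]  # second field of table_info is the column name
        q = quote(column)
        # Rows holding something other than NULL or the empty string.
        non_empty = conn.execute(
            f"SELECT count(*) FROM {quoted_table} WHERE {q} IS NOT NULL AND {q} != ''"
        ).fetchone()[0]
        # SELECT DISTINCT counts NULL as one value, so a column mixing
        # NULL and 'x' reports 2 here rather than 1.
        distinct = conn.execute(
            f"SELECT count(*) FROM (SELECT DISTINCT {q} FROM {quoted_table})"
        ).fetchone()[0]
        if non_empty == 0:
            suggestions.append((column, "empty"))
        elif distinct == 1 and total > 1:
            suggestions.append((column, "single value"))
    return suggestions
```

This runs two queries per column, which is fine for a small CSV import but would want batching or sampling on a large table.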

simonw commented 1 year ago

Rough UI mockup:

[Image: CleanShot 2023-11-05 at 13 02 32@2x]

simonw commented 1 year ago

More ideas for suggestions:

If suggestions were driven by a plugin hook, there could be fancy ones like spotting location (lat, lon) columns and suggesting splitting those into latitude and longitude columns. That gets a LOT harder though, as now we are doing data conversions in addition to just editing the schema with .transform().

Maybe this feature should live outside the datasette-edit-schema plugin? That way it could include features that modify data directly. It could also suggest things like "why not set up FTS against this column with lots of text in it?".
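
A rough sketch of how the FTS suggestion might be detected, again with plain `sqlite3`. The function `suggest_fts_columns` and the `min_avg_length` threshold of 100 characters are made-up illustrations, and average text length is only one possible proxy for "lots of text".

```python
import sqlite3


def quote(identifier):
    # Quote a SQLite identifier so odd table or column names are safe to interpolate.
    return '"' + identifier.replace('"', '""') + '"'


def suggest_fts_columns(conn, table, min_avg_length=100):
    """Return names of columns whose text values are long on average."""
    quoted_table = quote(table)
    suggestions = []
    for row in conn.execute(f"PRAGMA table_info({quoted_table})").fetchall():
        column = row[1]
        q = quote(column)
        # Only consider values SQLite actually stores as text; avg() returns
        # NULL (None in Python) when no rows match.
        avg_length = conn.execute(
            f"SELECT avg(length({q})) FROM {quoted_table} WHERE typeof({q}) = 'text'"
        ).fetchone()[0]
        if avg_length is not None and avg_length >= min_avg_length:
            suggestions.append(column)
    return suggestions
```

This only flags candidates. Actually enabling full-text search would be a separate step that changes the database, which is part of why this might belong outside a schema-only plugin.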

simonw commented 1 year ago

There's also something interesting about generating these suggestions as an offline process (or a separate asyncio task in the same process not connected to the current request) - that way Datasette could do expensive things like scan millions of rows for potential columns that could be converted to a date.
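
As a sketch of the "separate asyncio task" idea, the blocking scan below runs in a worker thread via `asyncio.to_thread` (Python 3.9+), so the event loop stays free while it runs. Everything here is an assumption for illustration: the function names are hypothetical, the date check only recognises `YYYY-MM-DD` strings, and this does not use any Datasette API.

```python
import asyncio
import sqlite3
from datetime import datetime


def looks_like_date(value):
    # Assumption: only ISO-style YYYY-MM-DD strings count as dates here.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def scan_for_date_columns(db_path, table):
    """Blocking full scan: columns where every non-null value parses as a date."""
    conn = sqlite3.connect(db_path)
    try:
        quoted_table = '"' + table.replace('"', '""') + '"'
        columns = [r[1] for r in conn.execute(f"PRAGMA table_info({quoted_table})")]
        candidates = []
        for column in columns:
            q = '"' + column.replace('"', '""') + '"'
            cursor = conn.execute(
                f"SELECT {q} FROM {quoted_table} WHERE {q} IS NOT NULL"
            )
            seen_any = False
            all_dates = True
            for (value,) in cursor:  # streams rows rather than loading them all
                seen_any = True
                if not looks_like_date(value):
                    all_dates = False
                    break
            if seen_any and all_dates:
                candidates.append(column)
        return candidates
    finally:
        conn.close()


async def suggest_in_background(db_path, table):
    # Start the scan as its own task; other coroutines can run while it works.
    task = asyncio.create_task(asyncio.to_thread(scan_for_date_columns, db_path, table))
    return await task
```

In a real server the task would not be awaited inside the request. Its result would be stored somewhere and picked up by a later page load.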