Cleaning steps improvements - edit, delete, reorder, toggle and more

tombarnish commented 1 year ago

Once Data Wrangler has generated the code for a number of data processing steps and saved it in a Jupyter file, is it possible to return to those steps in Data Wrangler to edit, add, or remove them?

danv-msft commented 1 year ago

Thank you for using Data Wrangler @tombarnish. We have the reentrance scenario in our backlog for investigation. However, we don't have it scheduled for the immediate upcoming sprints. We will provide an update once we begin working on it.

tombarnish commented 1 year ago

@danv-msft Thanks for the response. On a related point, would it be possible to edit earlier cleaning steps? Currently you can only edit the most recent operation. Thanks

hodgigre commented 10 months ago

Seconding this request. It would be great to be able to edit, or even just delete, any previous step, not just the current or last one. Understood that, as in Power Query, editing or deleting a given step could break the subsequent steps and require the user to correct them, but this is acceptable and understandable. The inability to go back and correct or even just modify a previous step is irksome. Failing that, is there some workaround or workflow that makes editing a previous step possible? Thanks much. Great tool!!!
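To illustrate the "deleting a step could break the subsequent steps" point, here is a minimal sketch of replaying a step list and reporting the first step that fails. It is not how Data Wrangler works internally; `replay`, the step names, and the list-of-dicts rows (standing in for a DataFrame) are all hypothetical.

```python
from typing import Callable, Optional

Rows = list[dict]
Step = tuple[str, Callable[[Rows], Rows]]


def replay(rows: Rows, steps: list[Step]) -> tuple[Rows, Optional[str]]:
    """Apply steps in order; stop at the first failure and return its name."""
    for name, apply in steps:
        try:
            rows = apply(rows)
        except Exception:
            # Return the data as it was before the failing step.
            return rows, name
    return rows, None


# Hypothetical steps: the second one depends on the column created by the first.
steps: list[Step] = [
    ("rename Age to age", lambda rows: [{"age": r["Age"]} for r in rows]),
    ("keep adults", lambda rows: [r for r in rows if r["age"] >= 18]),
]
raw = [{"Age": 34}, {"Age": 12}]

full_result, full_broken = replay(raw, steps)

# Deleting the first step leaves "keep adults" looking for a missing column.
after_delete = steps[1:]
partial_result, partial_broken = replay(raw, after_delete)
```

With this shape, the tool can point the user at the exact step that needs correcting after an edit or delete, which is the Power Query-style behavior described above.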

y2kbugger commented 6 months ago

@danv-msft Man, it would be great if you could create a really nice human-readable persistence format, either domain-specific or runnable Python. This could produce an importable sequence of cleaning steps.

Domain specific would be robust, but more magic/less ultimately great.

I think a first class python format would be amazing:

- would be importable like normal code
- could even type annotate and typecheck
- you could use pieces and parts of the pipeline
- everything would have clear version control
- could be modified outside of wrangler and continue to work
- could be deployed without wrangler to production

e.g. wrangle_ops/titanic_summary.py

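import pandas

def clean_all(df: pandas.DataFrame) -> pandas.DataFrame:
    # Apply each cleaning step in order and return the result
    df = clean_1(df)
    df = clean_2(df)
    return df

def clean_1(df: pandas.DataFrame) -> pandas.DataFrame:
    return df  # placeholder for the first cleaning step

def clean_2(df: pandas.DataFrame) -> pandas.DataFrame:
    return df  # placeholder for the second cleaning step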

The obvious con is how to deal with reparsing possibly modified code. I would suggest having a robust Python -> cleaning operation parser, rather than hinting. That way modified code can't get out of sync with metadata in comments. Anything else weird can then just be imported as a "custom python step".
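As a rough sketch of the parser idea, the standard-library `ast` module can read a steps file back without relying on comment metadata. Everything here is an assumption for illustration: the `KNOWN_OPERATIONS` table, the `classify_step` and `parse_steps` helpers, and the rule that only a single `return df.<method>(...)` body is recognized. It is not Data Wrangler's actual format or API.

```python
import ast

# Hypothetical contents of a saved steps file; it is only parsed, never executed.
SOURCE = '''
def clean_all(df):
    df = clean_1(df)
    df = clean_2(df)
    df = clean_3(df)
    return df

def clean_1(df):
    return df.dropna()

def clean_2(df):
    return df.rename(columns={"a": "b"})

def clean_3(df):
    df["total"] = df["x"] + df["y"]
    return df
'''

# Hypothetical mapping from a DataFrame method name to a named cleaning operation.
KNOWN_OPERATIONS = {"dropna": "Drop missing values", "rename": "Rename columns"}


def classify_step(func: ast.FunctionDef) -> str:
    """Return a known operation name, or fall back to a custom Python step."""
    body = func.body
    # Recognize only the simplest shape: a single `return df.<method>(...)`.
    if len(body) == 1 and isinstance(body[0], ast.Return):
        call = body[0].value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id == "df"
        ):
            return KNOWN_OPERATIONS.get(call.func.attr, "custom python step")
    return "custom python step"


def parse_steps(source: str, entry_point: str = "clean_all") -> list[tuple[str, str]]:
    """List (function name, operation) pairs in the order the entry point calls them."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    steps = []
    # Walk the entry point statement by statement so call order is preserved.
    for statement in functions[entry_point].body:
        for node in ast.walk(statement):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in functions
            ):
                steps.append((node.func.id, classify_step(functions[node.func.id])))
    return steps


steps = parse_steps(SOURCE)
```

Because the step list is derived from the code itself, a hand-edited function either still matches a known shape or degrades to a custom step, so there is no separate metadata to drift out of sync.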

danv-msft commented 6 months ago

@hodgigre, precisely to your points, editing steps could disrupt the execution flow, so this requires careful consideration. This feature is in our plans. We will have a more robust plan once we have finished the current experience and the integration with the VS Code Jupyter extension (items that are published in the release notes).

@y2kbugger, this is an excellent suggestion. We will consider it for the design and discuss it through an RFC when we address the re-entrancy feature.

Thank you for your great suggestions and please do let us know if there are other ideas.

ggojedap commented 5 months ago

Using and enjoying Data Wrangler so far. I strongly agree with the importance of these features; however, even the simple ability to save the current cleaning steps to revisit later would be great.

I've used other similar tools like Tableau Data Prep or Open Refine, but I really enjoy having all the tools that Data Wrangler offers directly in VSCode. The only thing I miss is the ability to save the process, as well as edit the steps.

pwang347 commented 5 months ago

Adding quote from @attilam in https://github.com/microsoft/vscode-data-wrangler/issues/180

The non-destructive nature of the Cleaning Steps stack is fantastic! Walking back and forth in the history of operations is very powerful. It would be cool if it were possible to skip some of them, e.g. if each step had a "visibility" flag that toggled whether the step is applied or not, similar to how modifiers can be toggled on/off in Blender.

For example this would make it possible to play with different settings of an operation without having to add/remove it, or change the script code, especially if we need other steps to be performed afterwards.

Hm, reordering, and inserting steps would be handy for this too... 🤔
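A minimal sketch of what a per-step toggle could look like, assuming steps are kept as an ordered list and re-run from the original data each time. The `Step` class, `run_steps`, and the example steps are hypothetical, and plain lists of dicts stand in for a DataFrame so the sketch has no dependencies.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list[dict]


@dataclass
class Step:
    name: str
    apply: Callable[[Rows], Rows]
    enabled: bool = True  # the "visibility" flag: skipped when False


def run_steps(rows: Rows, steps: list[Step]) -> Rows:
    """Re-run every enabled step, in list order, starting from the raw data."""
    for step in steps:
        if step.enabled:
            rows = step.apply(rows)
    return rows


def strip_names(rows: Rows) -> Rows:
    return [{**r, "name": r["name"].strip()} for r in rows]


def keep_adults(rows: Rows) -> Rows:
    return [r for r in rows if r["age"] >= 18]


raw = [{"name": " Ann ", "age": 34}, {"name": "Cy", "age": 12}]
steps = [Step("strip names", strip_names), Step("keep adults", keep_adults)]

both_applied = run_steps(raw, steps)

# Toggle the filter off without removing it from the stack.
steps[1].enabled = False
filter_skipped = run_steps(raw, steps)

# Reordering is just a list operation; the raw data is never modified.
steps[1].enabled = True
steps[0], steps[1] = steps[1], steps[0]
reordered = run_steps(raw, steps)
```

Since the raw rows are never mutated, toggling, reordering, or inserting a step only changes the list, and the result is recomputed from scratch.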

fortmana commented 4 months ago

I agree with this notion of needing to be able to edit previous steps. This feels like a missing feature especially given how this looks and feels like a lite version of the Power BI Power Query Editor. I love the ability to generate steps with visual prompts quickly but having to go back and rework a set of steps based on what I learn along the way makes things much more cumbersome.

microsoft / vscode-data-wrangler

Cleaning steps improvements - edit, delete, reorder, toggle and more #62