Open tombarnish opened 1 year ago
Thank you for using Data Wrangler @tombarnish. We have the reentrance scenario in our backlog for investigation. However, we don't have it scheduled for the immediate upcoming sprints. We will provide an update once we begin working on it.
@danv-msft Thanks for the response. On a related point, would it be possible to edit earlier cleaning steps? Currently you can only edit the most recent operation. Thanks
Seconding this request. It would be great to be able to edit, or even just delete, any previous step, not just the current or last one. Understood that, as in Power Query, editing or deleting a given step could break the subsequent steps and require the user to correct them, but this is acceptable/understandable. However, the inability to go back and correct or even just modify a previous step is irksome. Failing that, is there some workaround/workflow to make editing a previous step possible? Thanks much. Great tool!!!
@danv-msft Man, it would be great if you could create a really nice human-readable persistence format, either domain specific or runnable Python. This could create an importable sequence of cleaning steps.
Domain specific would be robust, but more magic/less ultimately great.
I think a first class python format would be amazing:
e.g.
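wrangle_ops/titanic_summary.py

    import pandas

    def clean_all(df: pandas.DataFrame) -> pandas.DataFrame:
        # Run each cleaning step in order, passing the result along.
        df = clean_1(df)
        df = clean_2(df)
        return df

    def clean_1(df: pandas.DataFrame) -> pandas.DataFrame:
        return df  # placeholder for the first cleaning step

    def clean_2(df: pandas.DataFrame) -> pandas.DataFrame:
        return df  # placeholder for the second cleaning step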
The obvious con is how to deal with reparsing possibly modified code. I would suggest having a robust Python -> cleaning operation parser, rather than hinting. That way modified code can't get out of sync with metadata in comments. Then anything else weird can just import as a "custom python step".
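To make the parser idea above concrete, here is a rough sketch using Python's standard `ast` module. It is not how Data Wrangler works: the `KNOWN_OPS` table, the operation labels, and the helper names `classify_step` and `parse_steps` are all made up for illustration. It recognizes only the simplest shape, a function whose body is a single `return df.<method>(...)`, and imports everything else as a "Custom Python step".

```python
import ast

# Hypothetical mapping from pandas method names to cleaning-operation labels.
KNOWN_OPS = {
    "dropna": "Drop missing values",
    "drop_duplicates": "Drop duplicate rows",
    "rename": "Rename columns",
}


def classify_step(func: ast.FunctionDef) -> str:
    """Return a known operation label, or fall back to a custom step."""
    body = func.body
    # Only recognize a body that is exactly `return <name>.<method>(...)`.
    if len(body) == 1 and isinstance(body[0], ast.Return):
        call = body[0].value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and isinstance(call.func.value, ast.Name)
            and call.func.attr in KNOWN_OPS
        ):
            return KNOWN_OPS[call.func.attr]
    return "Custom Python step"


def parse_steps(source: str) -> list[tuple[str, str]]:
    """Map each top-level function in the source to an operation label."""
    tree = ast.parse(source)
    return [
        (node.name, classify_step(node))
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    ]


SOURCE = '''
def clean_1(df):
    return df.dropna()

def clean_2(df):
    df["age"] = df["age"].fillna(0)
    return df
'''

steps = parse_steps(SOURCE)
for name, label in steps:
    print(f"{name}: {label}")
```

Because the classification is derived from the code itself, a hand-edited function either still matches a known shape or degrades to a custom step, so there is no separate metadata to drift out of sync.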
@hodgigre, you are right that editing earlier steps could disrupt the execution flow, so it requires careful consideration. This feature is planned. We will have a more robust plan once we are finished with the current experience and integration with the VS Code Jupyter extension (items that are published in release notes).
@y2kbugger, this is an excellent suggestion. We will consider it for the design and discuss it through an RFC when we address the re-entrancy feature.
Thank you for your great suggestions and please do let us know if there are other ideas.
Using and enjoying Data Wrangler so far. I strongly agree with the importance of this feature; however, even the simple ability to save the current cleaning steps to revisit later would be great.
I've used other similar tools like Tableau Data Prep or Open Refine, but I really enjoy having all the tools that Data Wrangler offers directly in VSCode. The only thing I miss is the ability to save the process, as well as edit the steps.
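As a rough illustration of what "saving the process" could look like, here is a sketch that round-trips a list of steps through a JSON file. The schema (`version`, `steps`, `operation`, `params`), the file name, and the helpers `save_steps` and `load_steps` are all assumptions for the example, not an existing Data Wrangler format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical step records: an operation name plus its parameters.
steps = [
    {"operation": "drop_missing", "params": {"columns": ["age"]}},
    {"operation": "rename", "params": {"mapping": {"nm": "name"}}},
]


def save_steps(steps: list[dict], path: Path) -> None:
    """Write the steps to disk with a version tag for future format changes."""
    payload = {"version": 1, "steps": steps}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_steps(path: Path) -> list[dict]:
    """Read the steps back so a session could be resumed later."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    return payload["steps"]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "titanic_summary.steps.json"
    save_steps(steps, path)
    restored = load_steps(path)

print(restored == steps)
```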
Adding quote from @attilam in https://github.com/microsoft/vscode-data-wrangler/issues/180
The non-destructive nature of the Cleaning Steps stack is fantastic! Walking back and forth in the history of operations is very powerful. It would be cool if it was possible to skip some of them. E.g. on each step there was a "visibility" flag that toggled if the step is applied, or not. Similarly to how modifiers can be toggled on/off in Blender.
For example this would make it possible to play with different settings of an operation without having to add/remove it, or change the script code, especially if we need other steps to be performed afterwards.
Hm, reordering, and inserting steps would be handy for this too... 🤔
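A minimal sketch of the toggle idea, assuming each step is a pure function that takes the table and returns a new one. To keep it runnable without pandas, a plain list of dicts stands in for the DataFrame; `Step`, `apply_steps`, and the two example operations are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list[dict]  # stand-in for a DataFrame in this sketch


@dataclass
class Step:
    name: str
    func: Callable[[Rows], Rows]
    enabled: bool = True  # the "visibility" flag


def apply_steps(rows: Rows, steps: list[Step]) -> Rows:
    """Replay every enabled step, in order, on a copy of the raw data."""
    result = [dict(r) for r in rows]  # copy so the raw data stays untouched
    for step in steps:
        if step.enabled:
            result = step.func(result)
    return result


def drop_missing_age(rows: Rows) -> Rows:
    return [r for r in rows if r["age"] is not None]


def upper_names(rows: Rows) -> Rows:
    return [{**r, "name": r["name"].upper()} for r in rows]


raw = [
    {"name": "a", "age": 22},
    {"name": "b", "age": None},
    {"name": "c", "age": 35},
]
steps = [
    Step("drop_missing_age", drop_missing_age),
    Step("upper_names", upper_names),
]

all_on = apply_steps(raw, steps)   # both steps applied
steps[0].enabled = False           # skip the first step without deleting it
skipped = apply_steps(raw, steps)  # later steps still run
```

Since the result is always recomputed from the raw data, reordering or inserting a step is just a list operation on `steps` followed by another replay. A step that depends on the output of a disabled or moved step could still fail, which is the breakage mentioned earlier in the thread.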
I agree with this notion of needing to be able to edit previous steps. This feels like a missing feature, especially given how this looks and feels like a lite version of the Power BI Power Query Editor. I love the ability to generate steps quickly with visual prompts, but having to go back and rework a set of steps based on what I learn along the way makes things much more cumbersome.
Once Data Wrangler has generated the code for a number of data processing steps and saved it in a Jupyter file, is it possible to return to those steps in Data Wrangler to edit, add or remove them?