microsoft / vscode-data-wrangler

Feature Request - delete rows based on condition #89

Closed dreadedhamish closed 1 year ago

dreadedhamish commented 1 year ago

Loving Data Wrangler so far.

I'm currently cleaning up a source that uses int64 for one column, but it includes an "Other" row that has "ZZZZZZZ" as the value. I'm envisioning cleaning up data by opening a source, changing the column types, and seeing what errors are produced.

At this stage it would be handy to have a preset operation for dropping rows based on value, regex, length, etc. Bonus points for exposing the conflicting row (rather than just the value) in the Data Wrangler tab, with some shortcuts (add a step: drop this row, drop rows like this, etc.).
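For reference, the request above can be sketched by hand in pandas. This is only an illustrative sketch, not code generated by Data Wrangler: the column names `code` and `label` and the sample values are assumptions made up for the example. It finds the rows that would fail an int64 conversion, shows each whole row rather than just the value, and then drops rows by value, regex, or length.

```python
import pandas as pd

# Hypothetical sample data: "code" should be int64, but the "Other" row
# uses "ZZZZZZZ" as its value (column names are assumptions).
df = pd.DataFrame(
    {
        "code": ["1001", "1002", "ZZZZZZZ", "1003"],
        "label": ["North", "South", "Other", "East"],
    }
)

# Coerce to numeric; values that cannot be parsed become NaN.
parsed = pd.to_numeric(df["code"], errors="coerce")

# Show the whole conflicting rows, not just the offending values.
conflicts = df[parsed.isna()]
print(conflicts)

# Three ways to drop the unwanted rows: by exact value, by regex, by length.
by_value = df[df["code"] != "ZZZZZZZ"]
by_regex = df[df["code"].str.fullmatch(r"\d+")]
by_length = df[df["code"].str.len() == 4]

# Once the conflicting row is gone, the column converts cleanly to int64.
cleaned = by_regex.astype({"code": "int64"})
```

Using `errors="coerce"` rather than a direct `astype` call is a design choice for inspection: it turns failures into NaN so the offending rows can be listed before anything is dropped.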

pwang347 commented 1 year ago

Hi @dreadedhamish, thanks for the feedback!

Just wondering if you've tried the Filter operation to remove unwanted rows:

[image]

Would this help your use case? Or perhaps were you hoping to have the conflicting outliers be detected automatically at startup?

dreadedhamish commented 1 year ago

Filter! Yeah, that helps a lot - I don't know why I didn't notice it. Could a "Drop filtered rows" checkbox be added?

When there is a conflict it would be good to get some more details rather than just the conflicting cell value.

pwang347 commented 1 year ago

> Filter! Yeah, that helps a lot - I don't know why I didn't notice it. Could a "Drop filtered rows" checkbox be added?
>
> When there is a conflict it would be good to get some more details rather than just the conflicting cell value.

Awesome! Yes, we should be able to add a "Drop filtered rows" option that uses the inverse condition. We'll take a look at adding that in an upcoming release.

Regarding conflicts, could you elaborate a bit on what sort of details you'd be interested to see?
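The "inverse condition" idea mentioned above can be illustrated in pandas with a negated boolean mask. This is a minimal sketch under assumed data; the column names and values are hypothetical, and it is not the code Data Wrangler itself emits.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({"code": ["1001", "ZZZZZZZ", "1002"], "value": [10, 0, 20]})

# The filter condition selects the unwanted rows.
condition = df["code"] == "ZZZZZZZ"

# Ordinary filter: keep only the rows that match the condition.
kept_by_filter = df[condition]

# "Drop filtered rows": negate the mask to keep everything else.
dropped_filtered = df[~condition]
```

The same condition drives both results, so a single "drop" toggle only needs to flip the mask with `~`.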

pwang347 commented 1 year ago

Small update - we've merged a change to optionally drop the filtered rows and it will be available in next week's pre-release!

pwang347 commented 1 year ago

Closing the issue for now since the main ask was addressed, feel free to open a new issue regarding the conflicting cell value if needed. Thanks!