Closed abeburnett closed 2 years ago
The behavior you describe is consistent with pandas, and is the intended behavior.
I don't know of any good way of doing this right now. Vaex currently only does column-wise operations. You will probably have to construct a full expression that says whether a given row should be dropped, and then filter on it.
Looking at the code, it actually looks pretty easy to add this, so you can do either df.dropna(how="any") or df.dropna(how="all"), which is what pandas does. It would also give us the same option for dropnan, dropinf, etc.
We would just need to add a different case at
https://github.com/vaexio/vaex/blob/master/packages/vaex-core/vaex/dataframe.py#L5070-L5075 so that instead of doing
expression = expression | f(self[column])
we do
expression = expression & f(self[column])
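To make the difference between the two operators concrete, here is a pure-Python sketch of the same accumulation over plain lists. The helper names rows_to_drop and is_na are hypothetical and are not part of vaex; the sketch assumes the accumulated mask marks rows to drop, which is an assumption about the linked code rather than something shown in this thread:

```python
import math


def is_na(value):
    """Treat None and float NaN as 'not available' (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_drop(columns, how="any"):
    """Return one boolean per row: True means the row would be dropped."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    masks = [[is_na(v) for v in col] for col in columns.values()]
    drop = masks[0]
    for mask in masks[1:]:
        if how == "any":
            # Mirrors `expression | f(self[column])`
            drop = [a or b for a, b in zip(drop, mask)]
        else:
            # Mirrors `expression & f(self[column])`
            drop = [a and b for a, b in zip(drop, mask)]
    return drop


data = {
    "x": [1.0, float("nan"), float("nan"), 4.0],
    "y": [10.0, float("nan"), 30.0, 40.0],
}
drop_any = rows_to_drop(data, how="any")
drop_all = rows_to_drop(data, how="all")
```

With how="any" both rows 1 and 2 are marked for dropping; with how="all" only row 1 is, because row 2 still has a value in y.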
I could write a PR if this is desired @JovanVeljanoski ?
@NickCrews Sure, you can give it a shot! If you attempt this, you might also want to do it for dropmissing and dropnan, for consistency between those. Thanks!
When run against a whole dataframe (e.g., df.dropna()), what is the expected behavior? I would expect it to drop only those rows which are entirely populated by na/nan, but it seems like it may be dropping every row which has an na/nan in any column. I'd prefer to only drop rows which are completely na/nan in every column. How can I do this with vaex?
As a side note, the reason I need this is that after importing 222 parquet files I ended up with a bunch of rows filled with na/nan in all columns, and also the same number of columns duplicated but with generic names and no data (filled with na/nan). E.g., columns like COL_1 and COL_2 filled with blanks.
Anyway, thanks in advance!