Open NickCrews opened 1 year ago
A Column object plays more the role of an array: it basically has the API of a numpy or Arrow array that vaex requires (`.dtype`, `__getitem__`, `__len__`). Arrays also only get the Expression API after we add them to a dataframe, so I think a Column object not having the Expression API is consistent with numpy and Arrow arrays not having it.
I do agree it would be nice if `.isna()` could be overridden by the column objects. We could support this at the `function.py` level; for instance, `.isna()` already needs to be aware of numpy and Arrow. Or we could have a special method in the column object that can override `.isna()`, similar to NEP 13.
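To make the second option concrete, here is a minimal pure-Python sketch of a NEP 13-style hook. Everything in it is hypothetical: `ConstantColumn`, the `__column_function__` method name, and the standalone `isna` function are stand-ins for illustration, not vaex's actual classes or API.

```python
class ConstantColumn:
    """Hypothetical stand-in for a virtual constant column (not vaex's class)."""

    def __init__(self, value, length):
        self.value = value
        self.length = length

    def __len__(self):
        return self.length

    def __column_function__(self, name, args):
        # Hypothetical hook, loosely modelled on NEP 13's __array_ufunc__:
        # the column may answer a function call itself, or decline.
        if name == "isna":
            # Missing means None or NaN (NaN is the only value where v != v).
            missing = self.value is None or self.value != self.value
            return type(self)(missing, self.length)
        return NotImplemented


def isna(values):
    """Function-level isna that gives the column a chance to override."""
    hook = getattr(values, "__column_function__", None)
    if hook is not None:
        result = hook("isna", (values,))
        if result is not NotImplemented:
            return result
    # Generic fallback: evaluate element by element.
    return [v is None or v != v for v in values]
```

With this shape, `isna(ConstantColumn(1, 1_000_000_000))` returns another constant column without touching a billion elements, while plain sequences still go through the generic path.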
Description
It would be nice if I could treat the results of `vrange()` and `vconstant()` as though they were Expressions, e.g. be able to call `.astype()` or `.isna()` on them.

I think this would make sense if we interpret `ColumnVirtualConstant` and `ColumnVirtualRange` as "implementations of" `Expression`s, as in they have an "is a" relationship. Am I missing something here? Do they actually serve distinct roles?

Additional context
It's not a huge deal to get around this by assigning the columns to a DF, after which they are converted to Expressions, but it would be nice to be able to do this directly.
To implement
I'm not exactly sure what to do. The simple way would be to leverage the workaround above, i.e. go through the same conversion to an Expression. However, this isn't optimal, since it can be deduced before materialization that `vconstant(1, length=1_000_000_000).isna()` should result in `vconstant(False, length=1_000_000_000)`. Perhaps the best way is to use the simple way by default, but leave the door open to write custom overrides for certain cases if someone desires.
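A rough sketch of that "simple default plus optional override" idea, in plain Python. The class names `VirtualColumn`, `LazyRange`, and `LazyConstant` and the `materialize()` method are made up for illustration and are not vaex's implementation.

```python
class VirtualColumn:
    """Hypothetical base class: generic methods work by materializing."""

    def materialize(self):
        raise NotImplementedError

    def isna(self):
        # Simple default: build the values, then test each one.
        # Missing means None or NaN (NaN is the only value where v != v).
        return [v is None or v != v for v in self.materialize()]


class LazyRange(VirtualColumn):
    """Inherits the default isna(); nothing special is known about it here."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __len__(self):
        return max(0, self.stop - self.start)

    def materialize(self):
        return list(range(self.start, self.stop))


class LazyConstant(VirtualColumn):
    """Overrides isna(): the answer is known without materializing."""

    def __init__(self, value, length):
        self.value = value
        self.length = length

    def __len__(self):
        return self.length

    def materialize(self):
        return [self.value] * self.length

    def isna(self):
        missing = self.value is None or self.value != self.value
        return type(self)(missing, self.length)
```

Here `LazyRange(0, 3).isna()` takes the slow generic path, while `LazyConstant(1, 1_000_000_000).isna()` stays lazy and returns a constant column of `False`.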