Open NickCrews opened 2 years ago
Actually, I do not think this is a bug.
Looking at the Arrow documentation, there is no such thing as `uint`.
So in your test, if you use `.astype(uint64)`, for instance, things will work.
I guess we could make `uint` an alias for `uint64` to account for this.
What do you think, @maartenbreddels @NickCrews?
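To make the alias idea concrete, here is a minimal sketch in plain Python. The names `DTYPE_ALIASES` and `resolve_dtype_name` are hypothetical and are not part of vaex; the only mapping taken from this thread is `uint` to `uint64`.

```python
# Hypothetical alias table (not vaex's API): maps a width-less dtype name
# to an explicit one that exists in both NumPy and Arrow.
DTYPE_ALIASES = {"uint": "uint64"}


def resolve_dtype_name(name: str) -> str:
    """Return the explicit dtype name for `name`, applying an alias if one exists."""
    return DTYPE_ALIASES.get(name, name)


# Names that already carry a width pass through unchanged.
explicit = resolve_dtype_name("uint")
unchanged = resolve_dtype_name("float32")
```

Something like this would presumably run before the dtype string is handed to either backend, so both would see the same explicit name.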
hmm, that makes sense why it doesn't work.
If we were starting from scratch, I might actually lean the opposite way: make `uint` fail for BOTH numpy and arrow, and force users to be explicit by asking for `uint64`. But that would break people, so we probably can't change to that behavior now.
If vaex is trying to be a higher level abstraction that hides the differences between numpy and arrow (I think this would be a great goal, but IDK how attainable it actually is) then I would like the alias proposal. However, if there are other cases where I DO need to know which is the backend for my data (eg https://github.com/vaexio/vaex/pull/2192), then I would prefer if vaex explicitly left things as is and didn't try to do something clever. So IDK, I think it depends on the larger goals.
I'm fine closing this as "not a bug" and just being more explicit in the docstring for `astype()`.
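For comparison, the stricter "fail for both backends" option could look roughly like the sketch below. Again, `AMBIGUOUS_DTYPE_NAMES` and `require_explicit_dtype` are made-up names for illustration, not anything vaex provides.

```python
# Hypothetical set of dtype names that carry no explicit bit width.
AMBIGUOUS_DTYPE_NAMES = {"uint"}


def require_explicit_dtype(name: str) -> str:
    """Reject width-less dtype names regardless of backend; return others as-is."""
    if name in AMBIGUOUS_DTYPE_NAMES:
        raise ValueError(
            f"dtype {name!r} has no explicit width; ask for e.g. '{name}64' instead"
        )
    return name
```

The trade-off is the one described above: behavior is identical across backends, but existing code that passes `uint` would start raising.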
I think we generally agree.
I think the main idea (as much as we can make it) is that an average user should not care or even know whether the data lives in arrow or numpy underneath it all, as long as it is handled via vaex. When you get it out of vaex (with `.values` or `.to_numpy()`, for example), that's a different story.
And we do want most obvious things to work out of the box with safe general assumptions. I still think that many users are not so knowledgeable about (py)arrow yet.. so it is nice to have some higher abstraction.
I am curious to hear @maartenbreddels's opinion on this, so let's keep this open for now. Thanks for reporting!
See added xfailing test: https://github.com/vaexio/vaex/pull/2190/commits/538a5a68db8242995ae27ef96f2cb7ae6e585e2e