yohplala closed this issue 3 years ago.
Hi,
So there are a few points worth addressing here, in no particular order:
Accessing `.values` takes the raw data out of vaex (that is what `.values` does). Once you do that... well, it is out of our hands, and what you do with it is on you. In this case you are comparing numpy and pyarrow types, and I guess they do not play well with each other. But that is up to numpy and arrow to handle, not vaex. If you instead do `df.ts > np.datetime64('2021-01-01')`, this will work no matter whether the underlying data is in numpy or arrow format. This is within vaex, and vaex will handle it.

You exported to an `.arrow` file, and that assumes an arrow format as well (see the next point on this). So when you read that arrow file, all of the data (the raw data) will be in arrow format. You can check this via this simple example:

```python
import vaex

ts = [1580515230897, 1627875788076, 1627875788076]
x = [1, 2, 3]
df = vaex.from_arrays(ts=ts, x=x)
df['ts'] = df['ts'].astype('datetime64[ms]')
df.export('tmp.arrow')

df = vaex.open('tmp.arrow')
df.x.values  # This gets the "raw" data outside of vaex (arrow data here)
```
Within vaex, an `int` is an `int`, a `float` is a `float`, etc., meaning you do not need to know or care whether the raw data is in numpy or arrow format. Once the data leaves vaex... then it is on you!

It is worth reading up on `pyarrow`, at least so that you are not surprised if you see anything "strange", i.e. not numpy. Arrow is not just a file format, it is much more than that.

`df.x.values` gets you the raw data. If you want to enforce that the data extracted from vaex is in, say, numpy, you should do `df.x.to_numpy()`. There is also `.to_arrow()` if you want to get the data in arrow format.

I hope this clears up any confusion!
To add to what Jovan said, `df.x.tolist()` can also help if you want to do some work in 'Python land'.
Let us know if this answered your questions, if so, feel free to close this.
Hi @maartenbreddels, hi @JovanVeljanoski, yes, this answers my questions. Thanks a lot! Bests,
Description
Timestamps seem to be quite troublesome. I checked the other issues related to timestamps. None of them seems to report the behavior I am witnessing here.
The trouble I would like to report is that the `datetime64` type is not preserved when reading `int` from arrow files and then transforming them to timestamps. In this case, the type is then `pyarrow.lib.TimestampScalar`. But it is `datetime64` when these `int` come from memory.
My next trouble is that `pyarrow.lib.TimestampScalar` does not appear easy to manage. I am forced to convert them to `string` to be able to convert them back to `datetime64`. Rather awkward... Here is the trouble illustrated.
Now exporting the same `int` to arrow, reading them back (memory mapping) and converting again.
Please, […] `int` either coming from an arrow file or coming from a list) `pyarrow.lib.TimestampScalar` so as to be able to compare them to `numpy.datetime64`?
Thanks for your help. Bests,
Software information