I have more than 500k individual CSV files, averaging roughly 1,000 records each, and I want to convert them into one HDF5 file. Plain pandas `read_csv` plus `to_hdf` with `append=True` would take a week or so. vaex seems to support this, but then I have 2 problems:
1) I need to add a constant column (the name of the source file) to each csv -> df => I can't use `read_csv(convert=True)`
2) I have timestamps from all over the planet and I need them to keep their timezone, as I will need it again later => I can't use `df.export`
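One way to sketch the per-file step for problem (1) and (2) together, using pandas for the per-file work (the helper name `prepare_frame` and the assumption that each CSV has a `time` column of ISO-8601 strings ending in a UTC offset like `+05:30` are mine, not vaex API): add the file name as a constant column, and split tz-aware timestamps into a naive UTC column plus the original offset as text, so the frame becomes HDF5-friendly without losing the timezone.

```python
import io
import pandas as pd

def prepare_frame(csv_text, source_name):
    """Per-file step: add a constant 'source' column and HDF5-safe timestamps.

    Assumes a 'time' column of ISO-8601 strings with a trailing UTC
    offset such as '+05:30' (my assumption; adjust to your data).
    """
    df = pd.read_csv(io.StringIO(csv_text))
    df["source"] = source_name              # constant column: the file name
    df["utc_offset"] = df["time"].str[-6:]  # keep the original offset as text
    ts = pd.to_datetime(df["time"], utc=True)
    df["time"] = ts.dt.tz_localize(None)    # naive UTC, exportable to HDF5
    return df
```

Each prepared frame could then be written out with `vaex.from_pandas(df).export_hdf5(...)`, and the resulting part files opened lazily in one go with a `vaex.open` wildcard; the stored offset column lets the local time be reconstructed later.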
Originally posted by **Hasham04** April 10, 2022
I am reading a parquet file and one of the date-time columns is of type `timestamp[ms, tz=UTC]`. I have tried converting it with

```python
df['time'].astype('datetime64[ms]')
df['time'].astype('timestamp')
```

but I always get the error

```
raise NotImplementedError(f'Cannot convert {arrow_type}')
NotImplementedError: Cannot convert timestamp[ms, tz=UTC]
```

Any suggestions on how to convert this to a supported type?
I can print the data frame, but the second I try to interact with this column, e.g. by calling `df.dtypes`, I get this error.
Ideally I want to convert this column to a string so that I can concatenate it with another string column.
Thanks for your help.
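Until vaex can cast the Arrow type, one pandas-level workaround (a sketch, not vaex's API, under the assumption that the column arrives as a tz-aware pandas datetime; the column names below are illustrative) is to drop the timezone and format the values as text, after which string concatenation works:

```python
import pandas as pd

# toy frame mimicking a timestamp[ms, tz=UTC] parquet column
df = pd.DataFrame({
    "time": pd.to_datetime(["2022-04-10 12:00:00"], utc=True),
    "city": ["Berlin"],
})

# drop the timezone (values are already in UTC) and render as strings
df["time_str"] = df["time"].dt.tz_localize(None).dt.strftime("%Y-%m-%d %H:%M:%S")
df["key"] = df["time_str"] + "_" + df["city"]
```

If the local wall-clock time is wanted instead of UTC, `dt.tz_convert(...)` to the target zone before dropping the tz would be the variant to try.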
Discussed in https://github.com/vaexio/vaex/discussions/2008