vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[FEATURE-REQUEST] Make HDF5 output configurable #2061

Closed Darktex closed 2 years ago

Darktex commented 2 years ago

Description

Vaex has an excellent HDF5 exporter, but the structure of the file it generates is fixed: every column is always written under the /table group, at /table/columns.

It would be great if we could make this output configurable, and if we could also allow for custom attributes in the HDF5 file, so that Vaex can be used upstream of packages that require a fixed HDF5 file structure.

Is your feature request related to a problem? Please describe.

My current use case is using Vaex together with PyTorch-BigGraph. BigGraph requires edges to be stored in an HDF5 file with a fixed structure (see their docs), which prevents me from using Vaex directly.
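Until the exporter is configurable, one workaround is to write the required layout yourself with h5py. The sketch below is not part of Vaex. `write_fixed_layout` is a hypothetical helper that writes each array to an explicit in-file path and attaches root-level attributes. For a BigGraph-style edge file, the dataset names `lhs`, `rhs`, `rel` and a `format_version` attribute are assumptions here and should be checked against the BigGraph docs.

```python
import h5py
import numpy as np


def write_fixed_layout(path, datasets, attrs=None):
    """Hypothetical helper: write arrays to explicit HDF5 paths plus root attributes.

    `datasets` maps an in-file path (e.g. "lhs" or "/graph/weight") to a 1-D array;
    `attrs` maps attribute names to values stored on the root group.
    """
    with h5py.File(path, "w") as f:
        for name, values in datasets.items():
            # h5py creates any missing intermediate groups in the path
            f.create_dataset(name, data=np.asarray(values))
        for key, value in (attrs or {}).items():
            # custom file-level attributes, e.g. a format version required downstream
            f.attrs[key] = value
```

With a Vaex DataFrame, the arrays can come from `df.evaluate("some_column")`, for example `write_fixed_layout("edges.h5", {"lhs": df.evaluate("src"), "rhs": df.evaluate("dst"), "rel": df.evaluate("rel")}, attrs={"format_version": 1})`, where the column names are hypothetical. Note that this loads each column into memory, unlike Vaex's own out-of-core exporter.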

JovanVeljanoski commented 2 years ago

Hi,

df.export_hdf5 does accept a group argument, which you can use to specify the path where the data is stored within the HDF5 file.
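For example, a call such as `df.export_hdf5("edges.hdf5", group="/edges")` should place the data under `/edges` instead of the default `/table`. The exact sub-layout below that group (such as a `columns` subgroup) is worth confirming on your Vaex version. The small h5py sketch below, with a hypothetical helper name `list_datasets`, lists every dataset in a file with its full path, shape and dtype, so an exported file can be compared against the layout a downstream tool expects.

```python
import h5py


def list_datasets(path):
    """Return (full in-file path, shape, dtype) for every dataset in an HDF5 file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root;
        # returning None keeps the traversal going
        if isinstance(obj, h5py.Dataset):
            found.append(("/" + name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Running `for entry in list_datasets("edges.hdf5"): print(entry)` on an exported file shows at a glance whether the columns ended up where the consuming package looks for them.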

Check out the docs for more instructions.

Does this help, or did you mean something completely different?

JovanVeljanoski commented 2 years ago

Closing as stale. Please re-open if needed.