I think this is due to a limitation of HDF5 itself: the object header (in this case the object is a compound dataset representing the data.frame) has a maximum size of 64 KB, and once the data.frame has more than 1092 columns that limit is exceeded. Here's another example of someone hitting the same problem: https://forum.hdfgroup.org/t/storing-a-large-number-of-floating-point-numbers-that-evolves-over-time/4026
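For reference, here's a minimal sketch (using a temporary file and hypothetical group names) of the kind of write that hits the header limit; the exact column count at which it fails will depend on the column names and types:

```r
library(rhdf5)

tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
h5createGroup(tmp, "grp_1")

## a data.frame with ~1100 columns; written as a compound dataset
## (the default), the column metadata pushes the object header
## past HDF5's 64 KB limit and the write fails
wide <- as.data.frame(matrix(runif(10 * 1100), ncol = 1100))
h5write(wide, tmp, "grp_1/data")
```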
I don't think there's a way to fix this directly; instead you might need to write the data in a slightly different format.
The simplest alternative is to use:
```r
rhdf5::h5write(head(data), tmp, "grp_1/data", DataFrameAsCompound = FALSE)
```
This will create a separate dataset for each column in the data.frame, named using the column names.
If you need to read the data back into R, h5read(..., "grp_1/data") will read all the columns into a list, which you can then turn back into a data.frame.
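Here's a sketch of that round trip, assuming a temporary file and the same group layout as above (note that h5read() returns the group members in name order, so you may need to reorder the columns afterwards):

```r
library(rhdf5)

tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
h5createGroup(tmp, "grp_1")

wide <- as.data.frame(matrix(runif(10 * 2000), ncol = 2000))

## one dataset per column instead of a single compound dataset
h5write(wide, tmp, "grp_1/data", DataFrameAsCompound = FALSE)

## reading the group returns a named list of columns
res <- h5read(tmp, "grp_1/data")
df <- as.data.frame(res)
```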
Alternatively, if your data.frame contains only one type (in your example it's initially a matrix of numeric values) you could stick with the matrix format, and rhdf5 will write a 2D dataset rather than a compound one. The dimension limits for a single datatype are huge, and 150 by 2000 is easily within them, but this only works if your data.frame can be sensibly coerced to a matrix. A sketch follows below.
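Something along these lines, again with hypothetical file and group names, assuming the data are all numeric:

```r
library(rhdf5)

tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
h5createGroup(tmp, "grp_1")

## a single-type 2D dataset avoids the compound object header entirely
mat <- matrix(runif(150 * 2000), nrow = 150, ncol = 2000)
h5write(mat, tmp, "grp_1/data")

m <- h5read(tmp, "grp_1/data")
```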
It's hard to know which will be better without knowing a bit more about the actual data you're trying to write, but I'm happy to iterate on advice.
I'm experiencing the following error when trying to write a data.frame with approximate dimensions of 150 by 2000:
I created this small example to highlight the problem, which I believe is due to the large number of columns:
For me, this fails to write any data.frame with more than 192 columns.
Any assistance is appreciated!