tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

append data on var #60

Closed cstaub closed 6 years ago

cstaub commented 7 years ago

Hi there! Is it possible to append data to an existing MATLAB variable? I'm creating the file using Mat_Open, creating a variable using Mat_VarCreate and finally writing data using Mat_VarWrite. Unfortunately, I cannot find a way to resize the variable in order to add new data. Any suggestions?

tbeu commented 7 years ago

No, that is not possible. You need to delete that variable from the MAT-file (using Mat_VarDelete, which basically is a complete rewrite of the MAT-file) and then add the updated variable again (at the end of the MAT-file). This is the workaround, but it performs poorly, especially for large variables or files.
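To put a number on "performs poorly", here is a toy C model, not matio code. It assumes variables are stored back to back, that the delete step copies every other variable once, and that the updated variable is then written at the end. The function name and the size model are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only (not matio): sizes[i] is the on-disk size of variable i.
 * Returns the bytes written by "delete via full rewrite, then re-add at the
 * end" when variable `victim` is replaced by a version of `new_size` bytes. */
size_t workaround_bytes_written(const size_t *sizes, size_t n,
                                size_t victim, size_t new_size)
{
    assert(victim < n);
    size_t copied = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i != victim)
            copied += sizes[i];   /* rewrite of every remaining variable */
    }
    return copied + new_size;     /* plus the updated variable at the end */
}
```

With variables of 100, 200 and 300 bytes, replacing the middle one by a 250-byte version writes 650 bytes, even though only 50 bytes of new data were added.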

cstaub commented 7 years ago

What is the current limitation here? At least the HDF5-based MATLAB format (v7.3) should, in theory, allow adding data to an existing array. Best, chs.

tbeu commented 7 years ago

The limitation is that resizing a variable means resizing the file. If this variable is the first variable of the MAT-file it needs to be completely rewritten.
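The position dependence can be sketched with the same kind of toy model, assuming variables are laid out contiguously in file order (a simplification, not the actual MAT-file format): growing a variable in place forces every byte stored after it to move.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: variables are stored back to back in file order.
 * Growing variable `idx` in place shifts every byte stored after it. */
size_t bytes_shifted_by_growth(const size_t *sizes, size_t n, size_t idx)
{
    assert(idx < n);
    size_t shifted = 0;
    for (size_t i = idx + 1; i < n; ++i)
        shifted += sizes[i];
    return shifted;
}
```

Growing the first of three variables (100, 200, 300 bytes) moves 500 bytes, while growing the last one moves nothing, which is why appending is only cheap for the last variable in the file.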

I will check if HDF5 supports updating a variable.

What is your intended workflow? Something like: Open file once, create variable, write variable to file, delete variable, create variable again, update variable in file, close file?

emmenlau commented 7 years ago

It is possible to resize a variable in an HDF5 file, but only in a dimension that has been set as resizable. I do not recall the exact name; it was something like "UNLIMITED". Depending on which software created the file (or the dataset, to be more precise) in the first place, resizing might or might not be possible.
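In the HDF5 C API the sentinel is `H5S_UNLIMITED`, passed in the `maxdims` argument of `H5Screate_simple`; such a dataset must use chunked storage (`H5Pset_chunk`) and is grown with `H5Dset_extent`. The sketch below does not link against HDF5; it is a hypothetical stand-in (`toy_dataspace`, `TOY_UNLIMITED`, `toy_set_extent` are invented names) that only models the rule: a new extent is accepted if every dimension stays within the maximum fixed at creation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RANK  8
#define TOY_UNLIMITED ((size_t)-1)   /* stand-in for H5S_UNLIMITED */

typedef struct {
    size_t rank;
    size_t dims[TOY_MAX_RANK];     /* current extent */
    size_t maxdims[TOY_MAX_RANK];  /* fixed when the dataset is created */
} toy_dataspace;

/* Models the check done when extending a dataset: each new dimension must
 * not exceed its maximum, unless that maximum is unlimited. */
bool toy_set_extent(toy_dataspace *ds, const size_t *new_dims)
{
    assert(ds->rank <= TOY_MAX_RANK);
    for (size_t i = 0; i < ds->rank; ++i) {
        if (ds->maxdims[i] != TOY_UNLIMITED && new_dims[i] > ds->maxdims[i])
            return false;          /* this dimension is not resizable that far */
    }
    for (size_t i = 0; i < ds->rank; ++i)
        ds->dims[i] = new_dims[i];
    return true;
}
```

A 10-by-2 dataset created with maxdims `{UNLIMITED, 2}` can grow to 20-by-2, but not to 20-by-3. Whether a given file allows growth is therefore decided by whoever created the dataset, as noted above.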

cstaub commented 7 years ago

The intention is to write large data sets to MATLAB, i.e., open the file once, create the variable, add data in a loop (e.g., 100 MB per iteration), and close the file. I agree that for the MATLAB v4 format it is only possible to append data to the last variable written to the file. HDF5 would probably support resizing the matrix, at least in one dimension.
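For this loop, the difference between the delete-and-rewrite workaround and a true append grows with the number of iterations. A small cost model, assuming each iteration adds `chunk` units (e.g., MB) to a single variable and ignoring any other variables in the file (the function names are illustrative):

```c
#include <stdint.h>

/* Rewriting the whole variable after iteration k writes k * chunk units,
 * so n iterations write chunk * (1 + 2 + ... + n) = chunk * n * (n + 1) / 2.
 * n * (n + 1) is always even, so the division is exact. */
uint64_t units_written_rewrite(uint64_t chunk, uint64_t n)
{
    return chunk * n * (n + 1) / 2;
}

/* A true append writes each chunk exactly once. */
uint64_t units_written_append(uint64_t chunk, uint64_t n)
{
    return chunk * n;
}
```

With 100 MB per iteration and 10 iterations, rewriting writes 5500 MB in total versus 1000 MB for appending; the gap grows quadratically with the number of iterations.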

tbeu commented 7 years ago

OK, I will check if a v7.3 MAT-file with an UNLIMITED dimension is readable in MATLAB (using the load command). For the test I will use a real (non-complex) double matrix. Is that okay for you, or do you require a different data type/class?

emmenlau commented 7 years ago

If the HDF5 file is created by matio, then it's very likely that MATLAB will accept it, even with different dimensionality options. HDF5 makes this very "transparent" and usually the application does not care, unless it has an explicit check (which would be very awkward). HDF5 is pretty cool about this aspect :-)

cstaub commented 7 years ago

sounds perfect to me, thx

cstaub commented 7 years ago

any news?

tbeu commented 7 years ago

Not yet, sorry. But it is not forgotten.

tbeu commented 7 years ago

I created a prototype function Mat_VarUpdateRealNumeric73 that can be called instead of Mat_VarWrite. It updates a real numeric array of an HDF5 MAT-file variable along the first dimension.

@cstaub Please check if that suits you. It worked, at least in my debug test. If so, I will think about how to improve it further, e.g., add tests, add a dimension argument, add the complex case, and generalize it to mat.c.

cstaub commented 7 years ago

It seems that the "if ( H5Lexists(id, matvar->name, H5P_DEFAULT) > 0 )" branch never changes err from its initial value of -1.

Further, I'm actually looking to append data, not to change existing data. E.g. an existing variable with 2 columns and 10 rows shall be extended to 2 columns and 20 rows without rewriting the first 10 rows.
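Why adding rows is the harder case: MATLAB arrays are column-major, so in a flat, contiguous layout element (row, col) of an nrows-by-ncols matrix sits at offset `row + col * nrows` (0-based). Growing from 10 to 20 rows moves every element of the second column, whereas adding columns leaves all existing offsets unchanged. An extendible, chunked HDF5 dataset sidesteps this because growing its extent does not relocate the chunks already written. The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* 0-based linear offset of (row, col) in a column-major matrix. */
size_t colmajor_offset(size_t row, size_t col, size_t nrows)
{
    return row + col * nrows;
}

/* True if growing the row count from old_rows to new_rows would move
 * element (row, col) in a flat column-major buffer. */
bool moved_by_row_growth(size_t row, size_t col,
                         size_t old_rows, size_t new_rows)
{
    return colmajor_offset(row, col, old_rows) !=
           colmajor_offset(row, col, new_rows);
}
```

For the 10-by-2 to 20-by-2 example, the first column stays in place but every element of the second column shifts by 10 positions.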

tbeu commented 7 years ago

Updating (instead of replacing) is what the prototype did when I tested it myself.

tbeu commented 7 years ago

Just try calling the prototype function twice in a row.

tbeu commented 7 years ago

OK, I updated Mat_VarUpdateRealNumeric73 to set err in both branches.

cstaub commented 7 years ago

well, works perfectly fine for me. Looking forward to the next release

tbeu commented 7 years ago

Thanks for the confirmation. Any good name proposals: Mat_VarExtend (what HDF5 calls it), Mat_VarAppend, or Mat_VarUpdate?

cstaub commented 7 years ago

I would prefer either append or extend

tbeu commented 7 years ago

I think I'll go for Mat_VarWriteAppend as a new API function since it either writes (the first time, if not yet written) or appends. Alternatively, I could create a new API function Mat_VarWrite2 (I must not change Mat_VarWrite for the sake of backward compatibility) that offers a fourth parameter for the append flag.

cstaub commented 7 years ago

ok, I would prefer the Mat_VarWriteAppend version.

tbeu commented 7 years ago

Yes, I will add Mat_VarWriteAppend as a public API function. Remember, Mat_VarWriteAppend must also be used the first time you write that variable (even though there is nothing to append yet).
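The "write the first time, append afterwards" contract can be illustrated with a toy in-memory store. This is not the matio API and does not show the real signature of Mat_VarWriteAppend; `toy_file`, `toy_var` and `toy_write_append` are hypothetical names, and the model only tracks shapes, appending along the first dimension as the prototype did.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 16

typedef struct {
    char   name[32];
    size_t rows;
    size_t cols;
} toy_var;

typedef struct {
    toy_var vars[TOY_MAX_VARS];
    size_t  count;
} toy_file;

/* First call for a name creates the variable; later calls append rows.
 * Returns false if the store is full, the name is too long, or the column
 * count differs from the existing variable (only dimension 1 may grow). */
bool toy_write_append(toy_file *f, const char *name, size_t rows, size_t cols)
{
    for (size_t i = 0; i < f->count; ++i) {
        if (strcmp(f->vars[i].name, name) == 0) {
            if (f->vars[i].cols != cols)
                return false;
            f->vars[i].rows += rows;   /* append along the first dimension */
            return true;
        }
    }
    if (f->count == TOY_MAX_VARS || strlen(name) >= sizeof f->vars[0].name)
        return false;
    toy_var *v = &f->vars[f->count++];  /* not present yet: plain write */
    strcpy(v->name, name);
    v->rows = rows;
    v->cols = cols;
    return true;
}
```

Calling it twice with a 10-by-2 block for "x" leaves one 20-by-2 variable, matching the 2-column, 10-to-20-row example from earlier in the thread; a 5-by-3 block is then rejected.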

cstaub commented 7 years ago

Is this topic still active?

tbeu commented 7 years ago

I spent some hours on it last weekend. I wanted to do it cleanly (without much code duplication) but gave up the other day. So yes, it still needs to be done.

tbeu commented 6 years ago

@cstaub

tbeu commented 6 years ago

@cstaub Did you find any time to give v1.5.11 a try?

cstaub commented 6 years ago

Yepp, no issues so far. Thx