qxcv / joint-regressor

Regressing joints for fun and profit
Apache License 2.0

H3.6M dataset cache is huge #12

Closed: qxcv closed this 8 years ago

qxcv commented 8 years ago

At the moment it's 7.2GB on disk (!!), and that's with Matlab's compression. A cursory calculation suggests that joint data alone will take up 1.8GB on disk (when all sequences are summed together). Clearly, I need a better way of loading data than what I've been using. Some sort of lazy loading might help.
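To make the lazy-loading idea concrete, here is a minimal sketch in Python rather than Matlab. Everything in it is hypothetical: `LazySequenceCache`, `fake_load`, and the `max_cached` parameter are illustrative names, not part of this repository. It assumes each sequence can be read from disk independently, and keeps only the most recently used sequences in memory.

```python
from collections import OrderedDict


class LazySequenceCache:
    """Load sequences on demand and keep only the most recently used ones."""

    def __init__(self, load_fn, max_cached=2):
        self._load_fn = load_fn      # hypothetical: reads one sequence from disk
        self._max_cached = max_cached
        self._cache = OrderedDict()  # sequence id -> loaded data, in LRU order
        self.loads = 0               # how many times load_fn was actually called

    def __getitem__(self, seq_id):
        if seq_id in self._cache:
            self._cache.move_to_end(seq_id)  # mark as most recently used
            return self._cache[seq_id]
        data = self._load_fn(seq_id)
        self.loads += 1
        self._cache[seq_id] = data
        if len(self._cache) > self._max_cached:
            self._cache.popitem(last=False)  # evict the least recently used
        return data

    def cached_ids(self):
        return list(self._cache)


def fake_load(seq_id):
    # Stand-in for reading one sequence's joint data from disk.
    return [float(seq_id)] * 4


cache = LazySequenceCache(fake_load, max_cached=2)
for seq_id in (1, 2, 1, 3):
    cache[seq_id]
```

With this access pattern only three reads hit the loader, and sequence 2 is evicted once sequence 3 arrives, so memory use is bounded by `max_cached` sequences rather than the whole dataset.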

qxcv commented 8 years ago

In the test sequences:

In the train data:

It seems that in total, my data is "only" 2.8GB, but Matlab's shitty "compression" is more than doubling that to the 7.2GB I'm seeing on disk.

Some ideas for saving space:

It looks like structs incur a huge HDF5 encoding overhead with Matlab's v7.3 save format, so I may have to modify the save process slightly so that each big field is saved in its own variable (of the appropriate numeric type).
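The layout change can be sketched as follows. This is a Python analogue using only the standard library, not the Matlab save code. The names `split_fields` and `nbytes`, the field names, and the toy records are all made up for illustration. It pulls each field out of a list of per-frame records into one flat typed array, and shows that single precision halves the storage of double precision for the same values.

```python
from array import array


def split_fields(records, typecodes):
    """Turn per-frame records into one flat typed array per field.

    `typecodes` maps field name -> array typecode
    (e.g. 'f' for float32, 'H' for unsigned 16-bit integers).
    """
    columns = {name: array(code) for name, code in typecodes.items()}
    for rec in records:
        for name in columns:
            columns[name].extend(rec[name])
    return columns


def nbytes(arr):
    # Raw payload size of a typed array, ignoring container overhead.
    return arr.itemsize * len(arr)


# Toy records standing in for a struct array (hypothetical field names).
records = [{"joints": [0.0, 1.5, 2.0], "frame": [i]} for i in range(4)]
cols = split_fields(records, {"joints": "f", "frame": "H"})

single = nbytes(cols["joints"])              # stored as float32
double = nbytes(array("d", cols["joints"]))  # same values as float64
```

Whether the same saving carries over to the v7.3 format depends on how Matlab encodes each variable, so treat this only as an illustration of the per-field, typed layout.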