inpho / vsm

Vector Space Model Framework developed for InPhO
http://inpho.github.io/vsm

Add HDF5 corpus objects #120

Open JaimieMurdock opened 9 years ago

JaimieMurdock commented 9 years ago

Add HDF5Corpus objects that can be extended so long as there is disk space, essentially eliminating some of the MemoryError issues we currently have.

One key to this implementation will be a way to iterate through the HDF5Corpus.corpus attribute, which may involve using the visit() method of h5py's API. Part of this will also have to use the creation order flags to set up read order, à la this Stack Overflow post and the flag operations added in this commit to h5py. See also the core API docs for Set Link Creation Order.
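To make the idea concrete, here is a minimal sketch of an on-disk, extendable corpus with creation-order iteration. It is not the proposed HDF5Corpus implementation: the file layout (a flat `corpus` dataset plus a `docs` group of `(start, stop)` spans) and the helper names `append_words`, `build_demo` and `iter_documents` are hypothetical. It assumes an h5py version that accepts the `track_order` keyword (2.9 or later), and it uses plain group iteration rather than visit(), since whether visit() honors creation order is not assumed here.

```python
import os
import tempfile

import h5py
import numpy as np


def append_words(dset, words):
    """Grow a 1-D resizable dataset on disk and write `words` at its end."""
    words = np.asarray(words, dtype=dset.dtype)
    start = dset.shape[0]
    dset.resize((start + words.shape[0],))
    dset[start:] = words
    return dset.shape[0]


def build_demo(path):
    """Write a tiny corpus file (hypothetical layout, for illustration only)."""
    # track_order=True asks HDF5 to record link creation order for the group.
    with h5py.File(path, "w", track_order=True) as f:
        # maxshape=(None,) makes the dataset extendable; this requires chunking.
        corpus = f.create_dataset(
            "corpus", shape=(0,), maxshape=(None,), dtype="i8", chunks=(1024,)
        )
        docs = f.create_group("docs", track_order=True)
        demo = [("zeta", [3, 1, 4]), ("alpha", [1, 5]), ("mid", [9, 2, 6, 5])]
        for name, words in demo:
            start = corpus.shape[0]
            stop = append_words(corpus, words)
            # Each document is stored as a (start, stop) span into the corpus.
            docs.create_dataset(name, data=np.array([start, stop], dtype="i8"))


def iter_documents(path):
    """Yield (name, words) pairs, reading one document at a time from disk."""
    with h5py.File(path, "r") as f:
        corpus = f["corpus"]
        # Groups created with track_order=True iterate in insertion order.
        for name, span in f["docs"].items():
            start, stop = (int(i) for i in span[...])
            yield name, corpus[start:stop].tolist()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo_corpus.h5")
    build_demo(demo_path)
    for doc_name, doc_words in iter_documents(demo_path):
        print(doc_name, doc_words)
```

Only the slice being read or written is held in memory, so the corpus can grow past available RAM as long as there is disk space.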