sandialabs / slycat

Web-based data science analysis and visualization platform.
http://slycat.readthedocs.org

HDF5 evaluation: strengths and weaknesses #957

Closed by Mletter1 4 years ago

Mletter1 commented 4 years ago

We need to write a table of pros and cons for HDF5.

Mletter1 commented 4 years ago

Investigate the dendrogram.

pjcross commented 4 years ago

I'm not looking for just a table of pros and cons. I want an analysis in the form of a document, with sources for the information and conclusions you provide. This analysis needs to inform decisions about how to proceed with our data architecture, not only for our data-centric plans, but also for whether we want to transition data that is currently stored as project data to HDF5.

We need to know exactly what sorts of data we are currently storing as project data. Is that data in the form of a tree (i.e. the dendrogram generated by time series)? If so, are there any advantages to storing this data as project data instead of in HDF5 (less space, faster storage, faster access for viewing)? Are non-linear (non-array) structures handled well in HDF5? Do the newer Python libraries listed in Warren's email provide a good interface for working with or accessing non-linear structures (or slices of linear structures) in HDF5?

What functionality does Tim's HDF5 wrapper provide for us? Is this functionality also provided by the Python HDF5 tools? Who is writing those tools and libraries? Are they supported by a large organization that is likely to continue funding development, or were the libraries written by someone as part of their graduate work (i.e. they aren't currently being maintained)? Do these libraries provide functionality beyond Tim's wrapper? If so, what would this enable us to do that we can't do now?
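
To make the tree-versus-array question concrete, here is a minimal sketch of how a binary dendrogram could be flattened into fixed-width rows, similar in spirit to the linkage matrix used by SciPy's hierarchical clustering. Rows of this shape are the kind of rectangular data an HDF5 dataset holds. The nested-tuple tree layout and the `flatten` and `rebuild` helpers are assumptions made for illustration; this is not how Slycat currently stores its dendrogram, and it uses only the standard library rather than any HDF5 API.

```python
# Hypothetical illustration: encode a binary dendrogram as array-like rows.
# A leaf is an int in 0..n_leaves-1; an internal node is (left, right, distance).


def flatten(tree, n_leaves):
    """Return rows of (left_id, right_id, distance, leaf_count).

    The i-th row describes the merged node whose id is n_leaves + i.
    """
    rows = []

    def visit(node):
        if isinstance(node, int):
            return node, 1  # a leaf keeps its own id and counts as one leaf
        left, right, distance = node
        left_id, left_count = visit(left)
        right_id, right_count = visit(right)
        count = left_count + right_count
        rows.append((left_id, right_id, distance, count))
        return n_leaves + len(rows) - 1, count

    visit(tree)
    return rows


def rebuild(rows, n_leaves):
    """Invert flatten(): rebuild the nested tree from its rows."""
    nodes = list(range(n_leaves))
    for left_id, right_id, distance, _count in rows:
        nodes.append((nodes[left_id], nodes[right_id], distance))
    return nodes[-1]


# Example dendrogram over five leaves (made-up distances).
tree = (((0, 1, 0.5), 2, 1.2), (3, 4, 0.8), 2.0)
rows = flatten(tree, n_leaves=5)
restored = rebuild(rows, n_leaves=5)
```

If an encoding like this is acceptable, the remaining questions are about cost (space, write time, read time for viewing) rather than about whether a tree can be represented in an array store at all.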