ropensci / unconf14

Repo to brainstorm ideas (unconference style) for the rOpenSci hackathon.

HDF5 library on Ubuntu (inspired by Million Song Dataset) #24

Open alyssafrazee opened 10 years ago

alyssafrazee commented 10 years ago

The Million Song Dataset project had trouble writing R wrapper functions for accessing the data, which is stored in HDF5 format. Their note is here. Thought I'd throw it out there as a potential fun project, if anybody's interested!

karthik commented 10 years ago

@alyssafrazee Is the HDF5 format used elsewhere (like other types of scientific data)? I'm curious.

cboettig commented 10 years ago

@karthik I believe the latest NetCDF is built on HDF5 format. I think HDF5 was developed by one of the DOE labs to begin with...
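A quick way to see the NetCDF/HDF5 relationship is to look at a file's first bytes: HDF5 files start with a fixed 8-byte signature, and NetCDF-4 files carry the same signature, while classic netCDF files start with `CDF` plus a version byte. The sketch below uses only the Python standard library; `sniff_format` and `write_sample` are hypothetical names made up for illustration, and it assumes the signature sits at offset 0 (the HDF5 format also allows it at later offsets when a user block is present, which this does not check).

```python
import os
import tempfile

# Magic numbers: the 8-byte HDF5 signature and the classic netCDF prefix.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
NETCDF_CLASSIC_PREFIX = b"CDF"


def sniff_format(path):
    """Guess a file's container format from its first 8 bytes (offset 0 only)."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head == HDF5_SIGNATURE:
        # Plain HDF5 and NetCDF-4 files both land here.
        return "hdf5"
    # Version byte 1 is classic netCDF, 2 is the 64-bit offset variant.
    if head[:3] == NETCDF_CLASSIC_PREFIX and head[3:4] in (b"\x01", b"\x02"):
        return "netcdf-classic"
    return "unknown"


def write_sample(payload):
    """Hypothetical helper: write bytes to a temp file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return path
```

Telling a NetCDF-4 file apart from a plain HDF5 file needs more than the signature, so in practice that step would go through an actual HDF5 or netCDF library.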

emhart commented 10 years ago

We use it extensively at NEON. It's really common for hyperspectral data and biogeochemical data. There are already a number of packages that open it (http://stackoverflow.com/questions/15974643/how-to-deal-with-hdf5-files-in-r). That said, it's used to compress quite large data sets that aren't usually served up via API.

Edmund M. Hart, PhD Staff Scientist - Ecoinformatics National Ecological Observatory Network @distribecology http://emhart.info http://emhart.github.com/

sckott commented 10 years ago

Given what I've heard at the Ecological Forecasting workshop I'm at now, it seems like NetCDF/HDF5 may be a good standard file type for us to output data to, since it's widely used and so can feed into many downstream applications. Maybe only where appropriate, like for rnoaa, spocc, etc.