spacetelescope / style-guides

An opinionated guide on how we work.
Creative Commons Attribution 4.0 International

Linked data directories #3

drlaw1558 closed this issue 6 years ago

drlaw1558 commented 6 years ago

Another issue that would be useful to have a style guide for: linked data directories.

This is particularly relevant to notebooks, which may need to locate a data file before an end user can run them successfully. I see two kinds of files: (1) small data files that make sense to keep within the repository itself and can easily be located with an environment variable, and (2) large data files that shouldn't be in the repository but should be staged elsewhere.
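
For the first kind (small files kept in the repository), one option that avoids an environment variable entirely is to find the repository root by walking up from the notebook's working directory until a `.git` entry appears. This is a minimal sketch, assuming the small files live in a hypothetical `data/` subdirectory at the top of the repository; `find_repo_root` and `small_data_file` are illustrative names, not an existing API.

```python
from pathlib import Path


def find_repo_root(start=None, marker=".git"):
    """Walk upward from `start` (default: current working directory) and
    return the first directory that contains `marker`."""
    current = (Path(start) if start is not None else Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        # `.git` is a directory in a normal clone but a file in worktrees
        # and submodules, so check existence rather than is_dir().
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No '{marker}' found in {current} or its parents")


def small_data_file(name, subdir="data", start=None):
    """Return the path to a small data file stored inside the repository.
    Assumes (hypothetically) that such files live under <repo>/<subdir>/."""
    path = find_repo_root(start) / subdir / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected in-repo data file at {path}")
    return path
```

Because the lookup is relative to the checkout, the notebook works wherever the user clones the repository, as long as they launch it from somewhere inside that clone.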

Being relatively new to git and Python (coming from svn and IDL), I've been inventing my own approaches, using environment variables that point to (1) the local repository checkout directory and (2) a corresponding data directory on central store. This works well enough, but if there is a more elegant solution, it would be helpful to describe it here.
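
The two-variable approach described above can be made friendlier by failing early with a message that tells the user exactly which variable to set. This is a sketch only; the variable names `MYPROJECT_REPO_DIR` and `MYPROJECT_DATA_DIR` are hypothetical placeholders for whatever a given project chooses.

```python
import os
from pathlib import Path

# Hypothetical variable names: one for the local repository checkout,
# one for the large-file area (e.g. on central store).
REPO_DIR_VAR = "MYPROJECT_REPO_DIR"
DATA_DIR_VAR = "MYPROJECT_DATA_DIR"


def dir_from_env(var_name):
    """Return the directory named by `var_name`, or raise an error that
    tells the user which variable is missing or wrong."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; point it at the "
            "directory holding this notebook's data before starting Jupyter."
        )
    path = Path(value).expanduser()
    if not path.is_dir():
        raise NotADirectoryError(f"{var_name}={value} is not a directory")
    return path


def data_path(name, large=False):
    """Resolve `name` against the large-data directory if `large` is True,
    otherwise against the repository checkout."""
    base = dir_from_env(DATA_DIR_VAR if large else REPO_DIR_VAR)
    return base / name
```

A notebook cell would then call, for example, `data_path("my_image.fits", large=True)` (a made-up filename) instead of hard-coding a path.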

sosey commented 6 years ago

Elegant solutions that work for everyone may be tough to find here. FWIW, I also use environment variables to point to the data a notebook needs, especially since I'm often working with examples that use FITS files, which you don't want stored in GitHub.

When available, one could also provide a link back to the archive holdings, or download commands, depending on where the data live.
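
Download commands can also live inside the notebook itself: fetch the file only if it is not already cached locally, and optionally check it against a known SHA-256 checksum. A minimal standard-library sketch follows; `fetch` and `sha256sum` are illustrative helpers, and the URL and checksum would come from wherever the archive actually serves the file (none are assumed here).

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks so
    large data files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, dest_dir, filename, expected_sha256=None):
    """Return a local copy of `url`, downloading into `dest_dir` only if the
    file is not already there, then optionally verify its checksum."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / filename
    if not target.exists():
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file that looks complete.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    if expected_sha256 and sha256sum(target) != expected_sha256:
        raise ValueError(f"Checksum mismatch for {target}")
    return target
```

Re-running the notebook reuses the cached copy, so only the first run pays the download cost.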

arfon commented 6 years ago

Yeah, this is a hard usability problem. One solution I've seen used in the past is Git-LFS, which LSST are using for some of their repos.
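
One usability trap with Git-LFS is that a clone made without LFS installed contains small text pointer files in place of the real data, and a notebook then fails with a confusing read error. Pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, so a notebook can detect them and tell the user to run `git lfs pull`. This is a sketch; the 1 KiB size cutoff is an assumption based on pointers being tiny stubs, and the helper names are illustrative.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
MAX_POINTER_SIZE = 1024  # assumption: real pointer files are far smaller


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-fetched Git LFS pointer
    file rather than the actual data."""
    path = Path(path)
    if path.stat().st_size > MAX_POINTER_SIZE:
        return False
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def require_real_file(path):
    """Return `path` as a Path, or raise a helpful error if it is still
    an LFS pointer."""
    if is_lfs_pointer(path):
        raise RuntimeError(
            f"{path} is a Git LFS pointer; run 'git lfs pull' in the "
            "repository to download the actual data."
        )
    return Path(path)
```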

It would be good to develop some recommended solutions to this.

hcferguson commented 6 years ago

Dataversioncontrol (https://dataversioncontrol.com) looks interesting in this regard.

arfon commented 6 years ago


Agreed. Looks very similar to Git-LFS in approach.

ivastar commented 6 years ago

@eteq wrote this up; it seems generally applicable in this context: https://innerspace.stsci.edu/pages/viewpage.action?pageId=129671315