telegraphic / hickle

a HDF5-based python pickle replacement
http://telegraphic.github.io/hickle/

Submit to the Journal of Open-source software #73

Closed telegraphic closed 5 years ago

telegraphic commented 6 years ago

Hi @Arctice @mmckerns @craffel @ellliottt @ebenolson @byronyi and @eendebakpt

I'd like to submit hickle to the Journal of Open-source Software, http://joss.theoj.org/about. As you've all submitted code to hickle, I would like to extend an invitation to you all to be listed as authors.

If you're keen, please send me an email at dancpr [at] berkeley [dot] edu with:

Thanks for your contributions!

telegraphic commented 6 years ago

Quick update: just waiting on responses from a few authors, and adding @femtotrader.

telegraphic commented 5 years ago

Apologies for how long this took; here's a draft below. Let me know in the next 7 days or so about any modifications you'd like to make. Note that I haven't yet added references; if there's anything in particular you'd like added (particularly in the research projects list), please let me know. Also, if you have an ORCID, let me know and I'll add it in.

Cheers! Danny

---
title: 'Hickle: A HDF5-based python pickle replacement'
tags:
  - Python
  - astronomy
authors:
  - name: Danny C. Price
    orcid: 0000-0003-2783-1608
    affiliation: "1, 2" # (Multiple affiliations must be quoted)
  - name: Sébastien Celles
    orcid: 0000-0001-9987-4338
    affiliation: 3
  - name: Pieter T. Eendebak
    orcid: 0000-0001-7018-1124
    affiliation: "4, 5"
  - name: Michael M. McKerns
    orcid: 0000-0001-8342-3778
    affiliation: 6
  - name: Eben M. Olson
    affiliation: 7
  - name: Colin Raffel
    affiliation: 8
  - name: Bairen Yi
    affiliation: 9

affiliations:
  - name: Department of Astronomy, University of California Berkeley, Berkeley CA 94720
    index: 1
  - name: Centre for Astrophysics & Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
    index: 2
  - name: Thermal Science and Energy Department, Institut Universitaire de Technologie de Poitiers - Université de Poitiers, France
    index: 3
  - name: QuTech, Delft University of Technology, P.O. Box 5046, 2600 GA Delft, The Netherlands
    index: 4
  - name: Netherlands Organisation for Applied Scientific Research (TNO), P.O. Box 155, 2600 AD Delft, The Netherlands
    index: 5
  - name: Institute for Advanced Computational Science, Stony Brook University, Stony Brook, NY 11794-5250
    index: 6
  - name: Department of Laboratory Medicine, Yale University, New Haven CT 06510 USA
    index: 7
  - name: Google Brain, Mountain View, CA, 94043
    index: 8
  - name: The Hong Kong University of Science and Technology 
    index: 9

date: 10 November 2018
bibliography: paper.bib
---

hickle is a Python 2/3 package for quickly dumping and loading Python data structures to Hierarchical Data Format 5 (HDF5) files [@hdf5]. When dumping to HDF5, hickle automatically converts Python data structures (e.g. lists, dictionaries, numpy arrays [@numpy]) into HDF5 groups and datasets. When loading from file, hickle automatically converts data back into its original data type. A key motivation for hickle is to provide high-performance loading and storage of scientific data in the widely-supported HDF5 format.

hickle is designed as a drop-in replacement for the Python pickle package, which converts Python object hierarchies to and from Python-specific byte streams (processes known as 'pickling' and 'unpickling' respectively). Several different pickle protocols exist, and pickle files are not designed to be compatible between Python versions, nor interpretable in other languages. In contrast, hickle stores data in, and loads data from, HDF5 files, for which application programming interfaces (APIs) exist in most major languages, including C, Java, R, and MATLAB.
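
As a small illustration of the point about pickle protocols, the sketch below uses only the standard-library `pickle` module. The `data` dictionary is a made-up example. It shows that one object yields different byte streams depending on the protocol chosen, even though each stream still round-trips within the same interpreter.

```python
import pickle

# Hypothetical example data, not taken from hickle itself.
data = {"name": "example", "values": [1, 2, 3], "nested": {"pi": 3.14159}}

# Serialize the same object with every protocol this interpreter supports.
payloads = {
    protocol: pickle.dumps(data, protocol=protocol)
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

# The byte streams differ between protocols ...
distinct_streams = len(set(payloads.values()))

# ... but each one unpickles to an equal object in this interpreter.
restored = [pickle.loads(blob) for blob in payloads.values()]

for protocol, blob in payloads.items():
    print(f"protocol {protocol}: {len(blob)} bytes")
```

An older interpreter that does not know a newer protocol cannot read such a stream, which is one reason a language-neutral container such as HDF5 can be attractive for long-lived data.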

Python data structures are mapped into the HDF5 abstract data model in a logical fashion, using the h5py package [@collette:2014]. Metadata required to reconstruct the hierarchy of objects, and to allow conversion into Python objects, is stored in HDF5 attributes. Most commonly used Python iterables (dict, tuple, list, set), and data types (int, float, str) are supported, as are numpy N-dimensional arrays. Commonly-used astropy data structures and scipy sparse matrices are also supported.
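
The idea of storing type metadata alongside the data can be sketched in plain Python. This is a conceptual illustration only, not hickle's actual on-disk layout: an ordinary dictionary keyed by path stands in for the HDF5 file, the `attrs` entries stand in for HDF5 attributes, and the `flatten` and `rebuild` helpers are hypothetical names. It assumes dictionary keys are strings.

```python
# Container types that can be rebuilt from an ordered sequence of children.
CONTAINERS = {"list": list, "tuple": tuple, "set": set}


def flatten(obj, path="/data", out=None):
    """Map a nested object to {path: {"value": ..., "attrs": {...}}}."""
    if out is None:
        out = {}
    type_name = type(obj).__name__
    if isinstance(obj, dict):
        # A dict becomes a "group"; its keys are recorded as metadata.
        out[path] = {"value": None, "attrs": {"type": type_name, "keys": list(obj)}}
        for key, item in obj.items():
            flatten(item, f"{path}/{key}", out)
    elif isinstance(obj, (list, tuple, set)):
        # Sequences become groups with numbered children.
        out[path] = {"value": None, "attrs": {"type": type_name, "length": len(obj)}}
        for index, item in enumerate(obj):
            flatten(item, f"{path}/{index}", out)
    else:
        # Scalars become leaf "datasets" tagged with their original type.
        out[path] = {"value": obj, "attrs": {"type": type_name}}
    return out


def rebuild(tree, path="/data"):
    """Reconstruct the original object using the stored type metadata."""
    node = tree[path]
    type_name = node["attrs"]["type"]
    if type_name == "dict":
        return {key: rebuild(tree, f"{path}/{key}") for key in node["attrs"]["keys"]}
    if type_name in CONTAINERS:
        items = [rebuild(tree, f"{path}/{i}") for i in range(node["attrs"]["length"])]
        return CONTAINERS[type_name](items)
    return node["value"]


original = {"a": [1, 2.5, "x"], "b": ("p", {"q": {1, 2}})}
tree = flatten(original)
round_tripped = rebuild(tree)
```

Without the recorded `type` attribute, the list, tuple, and set in this example would be indistinguishable after loading, which is why the metadata is needed to restore the original data types.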

hickle has been used in many scientific research projects, including:

hickle is released under the MIT license.

byronyi commented 5 years ago

LGTM

eendebakpt commented 5 years ago

@telegraphic My orcid number is https://orcid.org/0000-0001-7018-1124

Looks fine to me.

craffel commented 5 years ago

LGTM, thanks!

telegraphic commented 5 years ago

Bibtex entries:

@article{astropy:2018,
    Adsurl = {https://ui.adsabs.harvard.edu/#abs/2018AJ....156..123T},
    Author = {{Price-Whelan}, A.~M. and {Sip{\'{o}}cz}, B.~M. and {G{\"u}nther}, H.~M. and {Lim}, P.~L. and others},
    Doi = {10.3847/1538-3881/aabc4f},
    Eid = {123},
    Journal = {The Astronomical Journal},
    Pages = {123},
    Title = {{The Astropy Project: Building an Open-science Project and Status of the v2.0 Core Package}},
    Volume = {156},
    Year = 2018}

@book{collette:2014,
    Author = {Andrew Collette},
    Keywords = {python, hdf5},
    Publisher = {O'Reilly},
    Title = {Python and HDF5},
    Year = {2013}}

@article{Durant:2017,
    Author = {Durant, Thomas J.S. and Olson, Eben M. and Schulz, Wade L. and Torres, Richard},
    Doi = {10.1373/clinchem.2017.276345},
    Eprint = {http://clinchem.aaccjnls.org/content/63/12/1847.full.pdf},
    Issn = {0009-9147},
    Journal = {Clinical Chemistry},
    Number = {12},
    Pages = {1847--1855},
    Publisher = {Clinical Chemistry},
    Title = {Very Deep Convolutional Neural Networks for Morphologic Classification of Erythrocytes},
    Url = {http://clinchem.aaccjnls.org/content/63/12/1847},
    Volume = {63},
    Year = {2017},
}

@webpage{hdf5,
    Lastchecked = {November 2018},
    Url = {https://support.hdfgroup.org/HDF5/doc/index.html}}

@article{numpy,
    Author = {T. E. Oliphant},
    Doi = {10.1109/MCSE.2007.58},
    Issn = {1521-9615},
    Journal = {Computing in Science \& Engineering},
    Month = {May},
    Number = {3},
    Pages = {10-20},
    Title = {Python for Scientific Computing},
    Volume = {9},
    Year = {2007}}

@article{Price:2018,
    Adsnote = {Provided by the SAO/NASA Astrophysics Data System},
    Adsurl = {https://ui.adsabs.harvard.edu/#abs/2018MNRAS.478.4193P},
    Author = {{Price}, D.~C. and {Greenhill}, L.~J. and {Fialkov}, A. and {Bernardi}, G. and others},
    Doi = {10.1093/mnras/sty1244},
    Journal = {Monthly Notices of the Royal Astronomical Society},
    Pages = {4193-4213},
    Title = {{Design and characterization of the Large-aperture Experiment to Detect the Dark Age (LEDA) radiometer systems}},
    Volume = {478},
    Year = 2018,
    Bdsk-Url-1 = {https://doi.org/10.1093/mnras/sty1244}}

@phdthesis{Raffel:2016,
    Author = {Colin Raffel},
    School = {Columbia University},
    Title = {Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching},
    Year = {2016}}

@inproceedings{Zhang:2016,
    Acmid = {2934880},
    Address = {New York, NY, USA},
    Author = {Zhang, Hong and Chen, Li and Yi, Bairen and Chen, Kai and Chowdhury, Mosharaf and Geng, Yanhui},
    Booktitle = {Proceedings of the 2016 ACM SIGCOMM Conference},
    Doi = {10.1145/2934872.2934880},
    Isbn = {978-1-4503-4193-6},
    Keywords = {Coflow, data-intensive applications, datacenter networks},
    Location = {Florianopolis, Brazil},
    Numpages = {14},
    Pages = {160--173},
    Publisher = {ACM},
    Series = {SIGCOMM '16},
    Title = {CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark},
    Url = {http://doi.acm.org/10.1145/2934872.2934880},
    Year = {2016}}

telegraphic commented 5 years ago

Thanks all -- I trawled for some refs for the use cases; please let me know if these are inaccurate!

telegraphic commented 5 years ago

I'm pleased to announce this has been accepted http://joss.theoj.org/papers/0c6638f84a1a574913ed7c6dd1051847

telegraphic commented 5 years ago

Thanks all!

eendebakpt commented 5 years ago

Nice. Thanks for the work Danny!

scls19fr commented 5 years ago

Nice work! Thanks @telegraphic!

mmckerns commented 5 years ago

Awesome. Congrats Danny.