molbiodiv / biojs-io-biom

Parses biom files
MIT License

Efficient representation of sparse data #17

iimog closed this issue 7 years ago

iimog commented 8 years ago

As the referees of our F1000Research article note:

The in memory representation of the data following parse by BioJS are either in a dense matrix, or in a dict of keys style sparse representation. As the authors note, specialized methods will need to be created to handle large data efficiently, however the authors may wish to consider placing emphasis instead on specialized data structures such as compressed sparse row or column.

McDonald D and Bolyen E. Referee Report For: biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format [version 1; referees: 1 approved with reservations]. F1000Research 2016, 5:2348 (doi: 10.5256/f1000research.10362.r16546)

This is a very good point. Right now we only use the original sparse or dense representation as defined for the BIOM version 1.0 JSON format. Depending on the input data, however, a lot of memory could be saved by storing the biom object internally in a specialized data structure on parse. It could then be transformed back to the JSON representation when write is called.
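
A minimal sketch of the idea, assuming the BIOM 1.0 sparse layout of `[row, column, value]` triples and a compressed sparse row (CSR) structure as the internal representation. The function names `triplesToCsr`, `csrToTriples` and `csrGet` are hypothetical and not part of biojs-io-biom:

```javascript
// Hypothetical helpers for illustration only; not part of biojs-io-biom.

// Convert BIOM 1.0 sparse data ([[row, col, value], ...]) to CSR arrays.
function triplesToCsr(triples, nRows) {
  // Sort a copy by row, then column, so each row's entries are contiguous.
  const sorted = triples.slice().sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const values = new Float64Array(sorted.length);
  const colIdx = new Int32Array(sorted.length);
  const rowPtr = new Int32Array(nRows + 1);
  sorted.forEach(([r, c, v], i) => {
    values[i] = v;
    colIdx[i] = c;
    rowPtr[r + 1] += 1; // count the entries in each row
  });
  // A prefix sum turns the per-row counts into row start offsets.
  for (let r = 0; r < nRows; r++) {
    rowPtr[r + 1] += rowPtr[r];
  }
  return { values, colIdx, rowPtr };
}

// Convert CSR arrays back to BIOM 1.0 sparse triples, e.g. before write.
function csrToTriples({ values, colIdx, rowPtr }) {
  const triples = [];
  for (let r = 0; r < rowPtr.length - 1; r++) {
    for (let i = rowPtr[r]; i < rowPtr[r + 1]; i++) {
      triples.push([r, colIdx[i], values[i]]);
    }
  }
  return triples;
}

// Look up a single cell; entries that are not stored are zero.
function csrGet(csr, row, col) {
  for (let i = csr.rowPtr[row]; i < csr.rowPtr[row + 1]; i++) {
    if (csr.colIdx[i] === col) return csr.values[i];
  }
  return 0;
}
```

With typed arrays, each non-zero entry costs one value and one column index, plus one offset per row, instead of a three-element JavaScript array per entry. Whether CSR or the column-oriented variant fits better would depend on whether access is mostly by row or by column.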

iimog commented 7 years ago

See dedicated issue #35. This is planned for the future.