sid-krish / rhometa

metagenome population recombination rate estimation pipeline
MIT License

Example implementation of HDF5 #29

Closed · cerebis closed 2 years ago

cerebis commented 2 years ago

Resolves #29

Note: I do not expect this to be taken up as a pull request; I am just notifying you through GitHub.

This example is only a beginning. It does not hook up the downstream consumers of the lookup tables; it only lets you generate a store from multiple generation "runs".

It has not yet been tested through a workflow run, due to problems linking OpenBLAS on macOS; however, it should be very close to running.

Conceptually, the tool will create a "table set" in the store for a given downsampling run. This is done after the dataframes are written to disk, as this approach is a much easier way of supporting Nextflow-driven concurrency.

Within the HDF5 store, each "run" is recorded as a row in a catalog dataframe; this table lists the parameters used in generating the table set. Each depth-based dataframe is then stored individually, under a key defined by the run's index in the catalog and the depth.
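
For illustration, here is a minimal sketch of how such a layout could be written with pandas' `HDFStore`. The file name, parameter fields, and per-depth CSV inputs are hypothetical stand-ins, not the code from this PR:

```python
import pandas as pd

# Hypothetical inputs: per-depth tables already written to disk by
# upstream Nextflow tasks, plus the parameters used for this run.
depth_tables = {99: "lookup_depth_99.csv", 150: "lookup_depth_150.csv"}  # assumed names
run_params = {"theta": 0.01, "rho_max": 100, "seed": 42}                 # assumed fields

with pd.HDFStore("lookup_tables.h5") as store:
    # Append this run's parameters as a new row in the catalog; the
    # row's index becomes the run's identifier in the key scheme.
    new_row = pd.DataFrame([run_params])
    if "/catalog" in store:
        catalog = pd.concat([store["/catalog"], new_row], ignore_index=True)
    else:
        catalog = new_row
    run_idx = len(catalog) - 1
    store.put("/catalog", catalog)

    # Store each depth-based dataframe individually, keyed by the
    # run index and the depth, e.g. "/run_1/depth_99".
    for depth, path in depth_tables.items():
        store.put(f"/run_{run_idx}/depth_{depth}", pd.read_csv(path))
```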

E.g. for the run with index 1, each depth table is keyed as follows:

```
key = "/run_1/depth_99"
```
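
Reading a table back then only requires the catalog to resolve the run of interest. Again, a sketch against the same hypothetical file and key scheme:

```python
import pandas as pd

# Resolve the parameters for run 1 from the catalog, then load
# one of that run's depth tables directly by its key.
catalog = pd.read_hdf("lookup_tables.h5", "/catalog")
params = catalog.loc[1]
table = pd.read_hdf("lookup_tables.h5", "/run_1/depth_99")
```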