dstream responsibility:
  dstream.dataframe.data_unpack_packed
  dstream.dataframe.lookup_explode_unpacked
  dstream.dataframe.lookup_explode_packed

dstream raw buffer format (i.e., genome dumps):
  data_hex (hex)
  optional: data_id (i.e., taxon_id), downstream_version (string; warn if missing)
  schema: dstream_T_bitoffset, dstream_T_bitsize, dstream_storage_bitoffset, dstream_storage_bitsize (all measured in bits; use assert to enforce that these are whole bytes, i.e., multiples of 8), dstream_S, dstream_algo (string)
  rows: one per taxon
  should have a dstream tool for this (dstream.extract) dstream_long_format
  open question: how to forward data?
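A minimal sketch of how one raw buffer row could be sliced using the bit-offset schema above, including the byte-alignment asserts and the missing-version warning the notes call for. The helper name `unpack_raw_row`, the example values, and the reading of T as a big-endian unsigned integer are assumptions for illustration; this is not the behavior of `dstream.dataframe.data_unpack_packed`.

```python
import warnings

SCHEMA_KEYS = (
    "dstream_T_bitoffset",
    "dstream_T_bitsize",
    "dstream_storage_bitoffset",
    "dstream_storage_bitsize",
)


def unpack_raw_row(row: dict) -> dict:
    """Hypothetical helper: slice one raw-buffer row into T and storage."""
    # The notes ask for asserts that offsets and sizes are whole bytes.
    for key in SCHEMA_KEYS:
        assert row[key] % 8 == 0, f"{key}={row[key]} is not a whole number of bytes"
    # The notes ask for a warning when the version column is absent.
    if "downstream_version" not in row:
        warnings.warn("downstream_version missing from raw buffer row")

    def hex_slice(bitoffset: int, bitsize: int) -> str:
        # Each hex character encodes 4 bits.
        begin, end = bitoffset // 4, (bitoffset + bitsize) // 4
        chunk = row["data_hex"][begin:end]
        assert len(chunk) == bitsize // 4, "data_hex is shorter than the schema implies"
        return chunk

    T_hex = hex_slice(row["dstream_T_bitoffset"], row["dstream_T_bitsize"])
    parsed = {
        "dstream_S": row["dstream_S"],
        # Assumption: T is stored as a big-endian unsigned integer.
        "dstream_T": int(T_hex, 16),
        "dstream_storage_hex": hex_slice(
            row["dstream_storage_bitoffset"], row["dstream_storage_bitsize"]
        ),
        "dstream_algo": row["dstream_algo"],
    }
    # Forward the optional columns unchanged when they are present.
    for key in ("data_id", "downstream_version"):
        if key in row:
            parsed[key] = row[key]
    return parsed


# Placeholder row: 4 bytes of T followed by 4 bytes of storage.
example = {
    "data_hex": "0000002adeadbeef",
    "dstream_T_bitoffset": 0,
    "dstream_T_bitsize": 32,
    "dstream_storage_bitoffset": 32,
    "dstream_storage_bitsize": 32,
    "dstream_S": 32,
    "dstream_algo": "dstream.steady_algo",
    "downstream_version": "0.0.0",  # placeholder value
}
parsed = unpack_raw_row(example)
```

Here `parsed["dstream_T"]` is 42 and `parsed["dstream_storage_hex"]` is `"deadbeef"`. Working in hex characters keeps the slicing simple because byte-aligned offsets always land on even character positions.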
dstream parsed buffer format:
  dstream_S, dstream_T, dstream_storage_hex (hex string), dstream_algo (string)
  data_id (i.e., taxon_id), downstream_version (string; warn if missing)

dstream long format:
  hstrat_version, dstream_k, dstream_Tbar, dstream_T, dstream_value_bitsize, dstream_value_hex
  data_id (i.e., taxon_id), downstream_version (string; warn if missing)

pipeline input:
hsurf long format:
  hstrat_version, taxon_id, num_strata_deposited, rank, differentia, differentia_bit_width
  data_id (i.e., taxon_id), hstrat_version (string; warn if missing)
  origin_time
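A rough sketch of how hsurf long-format rows might be regrouped per taxon before further processing. It assumes one row per retained stratum, represents rows as plain dicts keyed by the column names above, and uses a hypothetical helper name `group_hsurf_rows`; the two asserts encode the expectations that a differentia fits in its declared bit width and that a retained rank is below num_strata_deposited.

```python
from collections import defaultdict


def group_hsurf_rows(rows: list) -> dict:
    """Hypothetical helper: map taxon_id -> rank-sorted (rank, differentia) pairs."""
    grouped = defaultdict(list)
    for row in rows:
        # A differentia must fit within its declared bit width.
        assert 0 <= row["differentia"] < 2 ** row["differentia_bit_width"]
        # A retained stratum's rank must precede the deposition count.
        assert 0 <= row["rank"] < row["num_strata_deposited"]
        grouped[row["taxon_id"]].append((row["rank"], row["differentia"]))
    # Sort each taxon's strata by rank so consumers can scan them in order.
    return {taxon: sorted(strata) for taxon, strata in grouped.items()}


# Placeholder rows: two strata for taxon 0, one stratum for taxon 1.
rows = [
    {"taxon_id": 0, "num_strata_deposited": 4, "rank": 3,
     "differentia": 1, "differentia_bit_width": 1},
    {"taxon_id": 0, "num_strata_deposited": 4, "rank": 0,
     "differentia": 0, "differentia_bit_width": 1},
    {"taxon_id": 1, "num_strata_deposited": 4, "rank": 0,
     "differentia": 0, "differentia_bit_width": 1},
]
grouped = group_hsurf_rows(rows)
```

With these placeholder rows, `grouped` maps taxon 0 to `[(0, 0), (3, 1)]` and taxon 1 to `[(0, 0)]`.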
in-memory representation to pass to C++ bindings:
  (rank, differentia) and a vector (taxon_id)

pipeline output:
  taxon_id, origin_time, ancestor_id, ancestor_ids, differentia_bit_width
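A sketch of sanity checks a consumer could run on the pipeline output columns above. Two things here are assumptions rather than anything stated in the notes: that a root row lists its own taxon_id as its ancestor_id, and that an ancestor's origin_time never exceeds its descendant's. The helper name `check_pipeline_output` is hypothetical.

```python
def check_pipeline_output(rows: list) -> list:
    """Hypothetical helper: validate ancestor links and return root taxon_ids."""
    by_id = {row["taxon_id"]: row for row in rows}
    assert len(by_id) == len(rows), "duplicate taxon_id in pipeline output"
    for row in rows:
        ancestor = by_id.get(row["ancestor_id"])
        assert ancestor is not None, f"unknown ancestor_id {row['ancestor_id']}"
        # Assumption: ancestors originate no later than their descendants.
        assert ancestor["origin_time"] <= row["origin_time"]
    # Assumption: a root is marked by ancestor_id == taxon_id.
    return [row["taxon_id"] for row in rows if row["ancestor_id"] == row["taxon_id"]]


# Placeholder output: a three-taxon chain rooted at taxon 0.
output_rows = [
    {"taxon_id": 0, "origin_time": 0, "ancestor_id": 0},
    {"taxon_id": 1, "origin_time": 3, "ancestor_id": 0},
    {"taxon_id": 2, "origin_time": 5, "ancestor_id": 1},
]
roots = check_pipeline_output(output_rows)
```

For the placeholder rows, `roots` is `[0]`.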
PIPELINE: