Open shenwei356 opened 10 months ago
> The current tab-delimited search result format is redundant and inefficient for parsing in `kmcp profile`. So we can use a compact binary format to save the temporary result.
>
> `kmcp search`: a flag `-b/--binary-output` would be added to choose the output format optionally.
Would it not be better to infer from the output extension which is usually specified? Make it a .kmcp file or something similar.
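To make the suggestion concrete, here is a minimal sketch of extension-based format selection. The `.kmcp` extension, the `Format` type, `inferFormat`, and the list of compression suffixes are all hypothetical names picked for illustration; nothing here is part of KMCP's actual code.

```go
package main

import (
	"path/filepath"
	"strings"
)

// Format identifies how search results are serialized.
type Format int

const (
	FormatText   Format = iota // tab-delimited plain text
	FormatBinary               // compact binary
)

// binaryExt is a hypothetical extension; the thread only suggests
// ".kmcp file or something similar".
const binaryExt = ".kmcp"

// compressionExts are suffixes stripped before checking the format,
// assuming the output file may additionally be compressed.
var compressionExts = []string{".gz", ".xz", ".zst", ".bz2"}

// inferFormat guesses the output format from the file name.
// Stdout ("-") and unknown extensions fall back to plain text.
func inferFormat(path string) Format {
	name := strings.ToLower(filepath.Base(path))
	for _, ext := range compressionExts {
		name = strings.TrimSuffix(name, ext)
	}
	if filepath.Ext(name) == binaryExt {
		return FormatBinary
	}
	return FormatText
}
```

With this approach an explicit flag would only be needed when writing to stdout, where there is no file name to inspect.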
|#query|qLen|qKmers|FPR |hits|target |chunkIdx|chunks|tLen |kSize|mKmers|qCov |tCov |jacc |queryIdx|
|:-----|:---|:-----|:---------|:---|:--------------|:-------|:-----|:-------|:----|:-----|:-----|:-----|:-----|:-------|
|read_1|150 |130 |7.4626e-15|1 |GCF_000007805.1|2 |10 |6397126 |21 |130 |1.0000|0.0002|0.0002|0 |
|read_2|150 |130 |7.4626e-15|1 |GCF_000007805.1|8 |10 |6397126 |21 |130 |1.0000|0.0002|0.0002|1 |
|read_3|150 |130 |7.4626e-15|1 |GCF_000003835.1|8 |10 |12115052|21 |130 |1.0000|0.0001|0.0001|2 |
|read_4|150 |130 |7.4626e-15|1 |GCF_000003835.1|3 |10 |12115052|21 |130 |1.0000|0.0001|0.0001|3 |
Empirically, few of these fields would require an int64 (at least none came close to the int32 limit in a practical file), so narrower integer types could also save space.
Edit: I meant that int32 would probably be enough rather than int64.
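As a rough illustration of the space saving, the numeric columns of one row could be packed into a fixed-width record. This is only a sketch: the `Match` struct, its field widths, and the little-endian layout are assumptions, not KMCP's actual format. The string columns (`#query`, `target`) and the coverage columns are left out to keep it short.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// Match holds the numeric columns of one search-result row.
// The field widths are assumptions chosen for illustration.
type Match struct {
	QLen     uint32  // qLen
	QKmers   uint32  // qKmers
	FPR      float64 // FPR
	Hits     uint32  // hits
	ChunkIdx uint16  // chunkIdx
	Chunks   uint16  // chunks
	TLen     uint32  // tLen
	KSize    uint8   // kSize
	MKmers   uint32  // mKmers
	QueryIdx uint64  // queryIdx, kept wide in case of very large inputs
}

// encode serializes m as a fixed-width little-endian record.
// encoding/binary writes struct fields back to back, without padding.
func encode(m Match) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, m); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode parses one record produced by encode.
func decode(b []byte) (Match, error) {
	var m Match
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &m)
	return m, err
}
```

Under these assumed widths each record takes 41 bytes, and parsing it needs no string-to-number conversion, which is where most of the cost of the tab-delimited format would be expected to go.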
> Would it not be better to infer from the output extension which is usually specified? Make it a .kmcp file or something similar.
Yes, we can make the binary format the default output, and make the plain text format optional.
> Empirically, few of these fields would require an int32 (at least none were close in a practical file) so that could also be potential space saving
Right. I'll carefully consider it later. Thank you.
The current tab-delimited search result format is redundant and inefficient for parsing in `kmcp profile`. So we can use a compact binary format to save the temporary result.

- `kmcp search`: a flag `-b/--binary-output` would be added to choose the output format optionally.
- `kmcp view` should be added to convert the binary to plain text format.
- `kmcp merge` needs to be compatible with both plain and binary formats.
- `kmcp profile` needs to be compatible with both plain and binary formats.