Open droazen opened 8 years ago
For @cmnbroad
Also, when this is finished we should undo https://github.com/samtools/htsjdk/pull/591.
@droazen to be clear - we only need to be able to read BCF2.2 records created by htslib. I don't think we need to be able to write BCF2.2 for our use case. Is that right?
@akiezun I believe so, yes (though we should confirm with the TileDB guys). In any event, the BCF2Codec is only capable of reading, so writing is not covered by this ticket.
For what it is worth, our use case is to read BCF2.2 records created by htslib with htsjdk through Hadoop-BAM. Thanks for looking into this!
Is there any sense of when this work might be completed? We have a similar requirement.
We really hope to be able to assign an engineer to work on this this quarter, but can't make any firm promises at this time. The work has been started (see https://github.com/samtools/htsjdk/pull/694 and https://github.com/cmnbroad/htsjdk/tree/cn_bcf2), but it has run into snags: we need to maintain backwards compatibility with older versions of the VCF/BCF specs, and the htsjdk parsing code is unfortunately not well decomposed by version. A significant refactoring is needed to properly isolate the parsers for different versions from each other (and to do the equivalent on the writing end).
@droazen thanks for the quick response! Is that branch functional for BCF2.2 support if we don't need compatibility with earlier formats?
@chriswhelix That branch is a work in progress that definitely shouldn't be used for anything except testing purposes -- @cmnbroad can provide more details on its current status.
It's been a while since I've looked at it, but my recollection is that support for reading was mostly there, with the exception of one remaining BCF2.2 feature (end-of-vector marker?). There is no write support at all. Anyway, it's not finished; it's pretty far behind master, and it's certainly not tested.
Thanks @cmnbroad. Really appreciate the responsiveness on this.
After an only mildly hellish tour through JNAerator, Bridj, and undocumented C code, I managed to get bindings to htslib working as a short term solution. Would definitely prefer to use htsjdk once it's updated.
Was additional development done to support BCF2.2?
@agostof BCF2.2 is still not supported.
BCF2Codec has not been well-maintained over the years, and does not fully support the latest BCF 2.2 spec (see the BCF section in http://samtools.github.io/hts-specs/VCFv4.3.pdf). We now have at least one htsjdk client (Intel) that wants to use the htsjdk BCF codec for performance reasons to ingest htslib output (which does support BCF 2.2), and even if we didn't, it's worth bringing the codec up-to-date rather than continuing to distribute htsjdk with out-of-date BCF support.
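For anyone unsure which BCF version a given file uses, here is a minimal sketch that reads the version from the file's magic bytes. It assumes the stream has already been decompressed (BCF files are normally BGZF-compressed, so the magic is not at the start of the raw file) and relies on the spec's layout of `BCF` followed by a major and a minor version byte. The class and method names (`BcfVersionSniffer`, `readBcfVersion`, `isBcf22`) are hypothetical and are not part of htsjdk.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not an htsjdk class: inspects the BCF2 magic bytes.
public class BcfVersionSniffer {

    /**
     * Reads the 5-byte magic from an already-decompressed BCF stream and
     * returns {major, minor}. Throws if the stream does not start with "BCF".
     */
    public static int[] readBcfVersion(InputStream decompressed) throws IOException {
        DataInputStream in = new DataInputStream(decompressed);
        byte[] magic = new byte[5];
        in.readFully(magic); // throws EOFException if fewer than 5 bytes are available
        if (magic[0] != 'B' || magic[1] != 'C' || magic[2] != 'F') {
            throw new IOException("Not a BCF2 stream: missing BCF magic");
        }
        return new int[] { magic[3], magic[4] };
    }

    /** True when the version pair is 2.2, the version htslib writes. */
    public static boolean isBcf22(int[] version) {
        return version[0] == 2 && version[1] == 2;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the first bytes of a decompressed BCF 2.2 file.
        byte[] fake = { 'B', 'C', 'F', 2, 2 };
        int[] v = readBcfVersion(new ByteArrayInputStream(fake));
        System.out.println("BCF version " + v[0] + "." + v[1]); // prints "BCF version 2.2"
    }
}
```

A check like this only reports the declared version. It says nothing about whether a given reader can decode the records that follow.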