samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Remove usages of FakeReferenceSequenceFile #1342

Closed cmnbroad closed 4 years ago

cmnbroad commented 5 years ago

Many of the CRAM index tests use references that are generated "on the fly" from sequence dictionaries provided to FakeReferenceSequenceFile. The resulting references consist of long synthesized strings (in some cases hundreds of megabases) of nothing but "N"s; they are large, slow to process, and unrealistic. These tests should be rewritten against a more realistic data set.

cmnbroad commented 4 years ago

Closing as obsolete, since there are many tests that depend on FakeReferenceSequenceFile. If you supply it with a sequence dictionary with long reference contigs, it will generate long reference strings that can be slow to process, but it does eliminate the need to keep the actual references in the repo.
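
For illustration only, here is a minimal plain-Java sketch of the trade-off described above. It is not htsjdk's implementation and uses no htsjdk classes: `SyntheticReferenceSketch`, `allNBases`, and the contig names and lengths are all hypothetical. It assumes only that a fake reference materializes one base per position of each contig in the dictionary, so memory and fill time grow with contig length.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class SyntheticReferenceSketch {

    // Hypothetical helper: builds a synthetic contig of the requested
    // length in which every base is 'N'.
    static byte[] allNBases(int length) {
        byte[] bases = new byte[length];
        Arrays.fill(bases, (byte) 'N');
        return bases;
    }

    public static void main(String[] args) {
        // Stand-in for a sequence dictionary: contig name -> contig length.
        // The names and lengths are made up for this sketch.
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        dictionary.put("smallContig", 1_000);
        dictionary.put("largeContig", 10_000_000);

        for (Map.Entry<String, Integer> contig : dictionary.entrySet()) {
            long start = System.nanoTime();
            byte[] bases = allNBases(contig.getValue());
            long micros = (System.nanoTime() - start) / 1_000;

            // Show the first few bases to confirm the content is all 'N'.
            String prefix = new String(bases, 0, Math.min(10, bases.length),
                    StandardCharsets.US_ASCII);
            System.out.println(contig.getKey() + ": " + bases.length
                    + " bases, starts with " + prefix
                    + ", filled in " + micros + " us");
        }
    }
}
```

Nothing needs to be checked into the repo, but the cost is paid at test time: a contig of hundreds of megabases would allocate an array of the same size before any CRAM code runs against it.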