samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Calling vcfreader.query in parallel gives strange, unpredictable behavior. #1336

Closed wavefancy closed 5 years ago

wavefancy commented 5 years ago

Hi Developers,

It seems vcfreader.query cannot be used in parallel. Both of the approaches below cause strange, unpredictable behavior. Any ideas on how to make this work when called in parallel?

 vcfreader.query(chr, min_pos, max_pos).stream()
           .parallel()
           .forEach()

OR

.parallel()
.forEach(group->{
     vcfreader.query(chr, min_pos, max_pos).stream()
           .parallel()
           .forEach();
});

Best regards, Wallace

cmnbroad commented 5 years ago

@wavefancy The iterators returned from VCF queries are backed by an underlying input stream, and generally maintain state that precludes being able to split them for use in parallel.
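
One possible workaround, sketched here under assumptions rather than taken from htsjdk itself: read the query results sequentially on a single thread into a list, then run the per-record work in parallel over that list. The class and method names below (`DrainThenParallel`, `drain`, `processInParallel`) are hypothetical, and a plain `java.util.Iterator` stands in for the iterator returned by the query so the example runs without htsjdk on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of htsjdk.
public class DrainThenParallel {

    // Read every element from a stateful, single-pass iterator on the
    // calling thread, so the underlying input stream is never shared.
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Materialize first, then parallelize over the in-memory list, which
    // (unlike the iterator) can be split safely across worker threads.
    static <T, R> List<R> processInParallel(Iterator<T> it, Function<T, R> work) {
        List<T> records = drain(it);
        return records.parallelStream()
                      .map(work)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the query result (assumption: the real iterator
        // would come from vcfreader.query(chr, min_pos, max_pos)).
        Iterator<Integer> fakeQuery = Arrays.asList(100, 200, 300, 400).iterator();
        List<Integer> result = processInParallel(fakeQuery, pos -> pos + 1);
        System.out.println(result); // prints [101, 201, 301, 401]
    }
}
```

This trades memory for safety, since all records in the queried interval are held in the list at once. For the nested case in the second snippet, where several queries run at the same time, it would presumably be necessary for each worker to open and close its own reader instead of sharing one, because the shared reader's stream state is what the parallel calls corrupt.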