samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

Race conditions accessing CRAM's reference data through ReferenceSource #1643

Open vruano opened 1 year ago

vruano commented 1 year ago

Description of the issue:

Problem spotted in GATK, caused by inconsistent (and in places missing) use of synchronized in CRAM's ReferenceSource: https://github.com/broadinstitute/gatk/issues/8139

.../cram/.../ReferenceSource.java mixes synchronized and non-synchronized access to its cache member fields, which results in a race condition when at least one tool in GATK accesses it in parallel.

This class already uses the synchronized modifier, which implies that it is supposed to support multi-threaded access.
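
To illustrate the kind of pattern described above, here is a minimal sketch. It is not the htsjdk code: the class `MixedLockingCache` and all of its method names are hypothetical, and the cache is assumed to be a plain `HashMap` guarded by the object's monitor in some methods only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only; NOT the actual ReferenceSource implementation.
class MixedLockingCache {
    // A plain HashMap is not safe for concurrent access without external locking.
    private final Map<String, byte[]> cache = new HashMap<>();

    // The writer holds the monitor on `this` ...
    public synchronized void put(String name, byte[] bases) {
        cache.put(name, bases);
    }

    // ... but this reader does not, so it can run while put() is resizing the map
    // and is not guaranteed to observe a completed write.
    public byte[] getUnsafe(String name) {
        return cache.get(name);
    }

    // Unsynchronized mutation: can interleave with put() and corrupt the map.
    public void clearUnsafe() {
        cache.clear();
    }

    // Guarded counterparts: every access to `cache` takes the same lock.
    public synchronized byte[] get(String name) {
        return cache.get(name);
    }

    public synchronized void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        MixedLockingCache c = new MixedLockingCache();
        c.put("chr1", new byte[] {'A', 'C', 'G', 'T'});
        System.out.println(c.get("chr1").length);
    }
}
```

The `synchronized` keyword on `put` only protects against other code that takes the same lock. Any method that touches the map without it, such as `getUnsafe` or `clearUnsafe` here, reintroduces the race.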

Your environment:

Steps to reproduce

See the original post. No concrete test input data is provided, but the problem becomes clear when inspecting the code.

Expected behaviour

Either (a) this class is made thread-safe, or (b) it is clearly documented as not thread-safe and the multi-threading elements (e.g. synchronized modifiers) are removed.
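
As a rough idea of what option (a) could look like, the sketch below uses a `ConcurrentHashMap` so that lookups and loads go through one atomic operation. This is only an assumption about one possible design, not a proposed patch: `ThreadSafeReferenceCache`, `getBases`, and the `loader` function are hypothetical names that do not exist in htsjdk.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative and not part of htsjdk.
final class ThreadSafeReferenceCache {
    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    // Assumed callback that fetches the bases for a sequence name (e.g. from a FASTA file).
    private final Function<String, byte[]> loader;

    ThreadSafeReferenceCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    // computeIfAbsent is atomic per key on ConcurrentHashMap: concurrent callers
    // asking for the same sequence see the loader applied at most once.
    byte[] getBases(String sequenceName) {
        return cache.computeIfAbsent(sequenceName, loader);
    }

    // Safe to call concurrently with getBases().
    void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        ThreadSafeReferenceCache c =
                new ThreadSafeReferenceCache(seq -> new byte[] {'A', 'C', 'G', 'T'});
        System.out.println(c.getBases("chr1").length);
    }
}
```

Note that this sketch holds strong references to every loaded sequence, so a real cache would still need its own eviction policy. For option (b), the equivalent change would be to drop the `synchronized` modifiers and state in the class documentation that callers must provide their own locking.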

Actual behaviour

There is at least one way to cause a race condition when this class is used from multiple threads. There could be more, so the code should be reviewed for other possible issues.