Hi @marcelm (CC @Itolstoganov)
Currently we shuffle based on `chunk_ID`, so a read's mapping can change with the number of threads or with which chunk the read ends up in.
IIRC, BWA-MEM derives the pseudo-random placement from the read name. Could we do the same instead of using the chunk ID, without noticeable computational overhead? I don't think it's worth implementing if it makes the code complex or increases runtime.
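A minimal sketch of the idea, assuming the shuffle is applied to a vector of equally scoring hits and that the code is C++: seed a small, portable PRNG from a hash of the read name, so the order depends only on the name and not on chunk or thread. `shuffle_by_read_name`, `fnv1a_64` and `splitmix64` are hypothetical helper names, not existing functions in the codebase.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FNV-1a 64-bit hash. Unlike std::hash, its output is fixed by definition,
// so the same read name gives the same seed on every platform and compiler.
uint64_t fnv1a_64(const std::string& s) {
    uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// splitmix64: a tiny PRNG step; advances `state` and returns the next value.
uint64_t splitmix64(uint64_t& state) {
    uint64_t z = (state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical helper: Fisher-Yates shuffle of tied hits, seeded only by the
// read name. The modulo introduces a negligible bias, fine for tie-breaking.
template <typename T>
void shuffle_by_read_name(std::vector<T>& hits, const std::string& read_name) {
    uint64_t state = fnv1a_64(read_name);
    for (std::size_t i = hits.size(); i > 1; --i) {
        std::size_t j = static_cast<std::size_t>(splitmix64(state) % i);
        std::swap(hits[i - 1], hits[j]);
    }
}
```

The extra cost is one pass over the read name per read, which should be small next to alignment. `std::shuffle` is deliberately not used here, since its exact output can differ between standard library implementations, which would reintroduce nondeterminism across builds.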
I noticed this when running an experiment comparing symmetric and asymmetric seeds, with reads simulated from either chrX or chrY and mapped only to chrX and chrY from CHM13.
When using asymmetric seeds (`2*hash_s1 - hash_s2`), the read below aligns to position `29094803` on chrY when aligned as the only read, but to position `29091249` on chrY when aligned as part of a file of 100k reads (using `-t 2`). In both cases it has CIGAR `114=1X161=1X3=1X42=1X8=1X64=1X30=1X24=1X16=1X8=1X3=1X1=1X14=` and alignment score 900. The full simulated file is too large to attach here; I can provide it elsewhere if needed.

Btw, for symmetric seeds (as currently used), the read aligns to position `44832808` on chrY with alignment score 1000 and CIGAR `223=1X38=1X237=`.
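To make the difference between the two seed types concrete, here is a small C++ sketch of the asymmetric combination from above next to a hypothetical symmetric one. The symmetric formula used here (a plain sum) is an assumption for illustration only; the point is that `2*hash_s1 - hash_s2` changes when the two strobes are swapped, while a symmetric combination does not.

```cpp
#include <cstdint>

// Asymmetric combination from the issue: 2*hash_s1 - hash_s2.
// Unsigned 64-bit arithmetic wraps modulo 2^64, so overflow is well defined.
uint64_t asymmetric_seed(uint64_t hash_s1, uint64_t hash_s2) {
    return 2 * hash_s1 - hash_s2;
}

// Hypothetical symmetric combination (plain sum), assumed only for contrast;
// the actual symmetric formula is not stated in this issue.
uint64_t symmetric_seed(uint64_t hash_s1, uint64_t hash_s2) {
    return hash_s1 + hash_s2;
}
```

For example, with `hash_s1 = 5` and `hash_s2 = 3` the asymmetric value is 7, but swapping the strobes gives 1, whereas the sum is 8 either way.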