Closed: johannesgeibel closed this issue 2 years ago
Hi @johannesgeibel, I think you are probably correct; the error looks like it is caused by those short reference contigs. I have modified the function so that it skips the re-mapping step for reference contigs shorter than 1 kb:
    if ref_genome.get_reference_length(e.chrA) < 1000 or ref_genome.get_reference_length(e.chrB) < 1000:
        ....
        continue
Is this a satisfactory fix for you? I guess the remapping could be tried for short reference contigs as well, but it's unclear how robust it would be for very short contigs anyway. I have uploaded the current fix, but you will have to build from source while I get the patch uploaded to PyPI.
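For anyone reading along, the skip logic described above can be sketched in isolation roughly as follows. This is only an illustration, not the actual dysgu code: `Event`, `contig_lengths`, `split_by_contig_length` and the contig names are hypothetical stand-ins, and a plain dict replaces the `ref_genome.get_reference_length(...)` lookup from the snippet.

```python
from collections import namedtuple

# Hypothetical stand-in for a structural-variant event; only the two
# contig names matter for this sketch.
Event = namedtuple("Event", ["chrA", "chrB"])

MIN_CONTIG_LEN = 1000  # the 1 kb threshold mentioned in the comment above


def split_by_contig_length(events, contig_lengths, min_len=MIN_CONTIG_LEN):
    """Separate events that may be re-mapped from those that are skipped.

    contig_lengths is an assumed dict of contig name -> length in bp,
    standing in for the reference genome lookup.
    """
    to_remap, skipped = [], []
    for e in events:
        # Skip re-mapping if either breakpoint lies on a short contig.
        if contig_lengths[e.chrA] < min_len or contig_lengths[e.chrB] < min_len:
            skipped.append(e)
            continue
        to_remap.append(e)
    return to_remap, skipped


# Illustrative values only, not taken from the chicken reference.
contig_lengths = {"chr1": 5_000_000, "chr2": 3_000_000, "tiny_contig": 450}
events = [
    Event("chr1", "chr1"),
    Event("chr1", "tiny_contig"),
    Event("tiny_contig", "chr2"),
]
to_remap, skipped = split_by_contig_length(events, contig_lengths)
print(len(to_remap), "event(s) kept for re-mapping,", len(skipped), "skipped")
```

Note that an event is skipped as soon as one of its two contigs is short, which matches the `or` in the condition above.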
Hi @kcleal, thanks. The fix seems to solve the issue. Excluding the very small contigs should not cause any major problems, as they are disregarded in most cases anyway. It's just common practice to keep them in the mapping process to reduce mismappings.
Hi, I am currently testing dysgu on some chicken samples. For some samples, sequenced with 2*151bp paired-end Illumina reads, I encounter an error if --remap=True is set. The chicken reference genome actually has some small contigs < 500bp, but I'm not sure whether ref_seq_clipped holds the reference contig. Further, I would then expect the error for all samples when force-calling, but it appears in only 4 out of 6 test samples. Do you have any clue whether this could cause the problem? Thanks, Johannes
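One way to check which contigs in a reference are short enough to be affected is to read the FASTA index (`.fai`), whose first two tab-separated columns are the contig name and its length. The sketch below assumes such an index exists (for example from `samtools faidx`); the function name `short_contigs` and the example contig names and lengths are made up for illustration.

```python
import io


def short_contigs(fai_handle, max_len=1000):
    """Return {contig name: length} for contigs shorter than max_len.

    fai_handle is any iterable of lines in FASTA index (.fai) format:
    name, length, offset, line bases, line width (tab-separated).
    """
    short = {}
    for line in fai_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length < max_len:
            short[name] = length
    return short


# Illustrative index content; real use would pass open("reference.fa.fai").
example_fai = io.StringIO(
    "chr1\t5000000\t6\t60\t61\n"
    "scaffold_a\t412\t5083346\t60\t61\n"
    "scaffold_b\t2300\t5083776\t60\t61\n"
)
print(short_contigs(example_fai))
```

Passing `max_len=500` instead would list only the contigs below the 500 bp size mentioned above.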