quinlan-lab / STRling

Detect novel (and reference) STR expansions from short-read data
MIT License

Assertion error for reads without sequence #103

Open christopher-schroeder opened 2 years ago

christopher-schroeder commented 2 years ago

I get the following assertion error:

strling version: 0.5.0
[strling] using existing file resources/genome.dna.homo_sapiens.GRCh38.100.fasta.str for genome repeats
[strling] got STR repeats from genome into an interval tree
[strling] collecting str-like reads
[strling] extracting chromosome:1
[strling] extracting chromosome:10
[strling] extracting chromosome:11
[strling] extracting chromosome:12
[strling] extracting chromosome:13
[strling] extracting chromosome:14
[strling] extracting chromosome:15
[strling] extracting chromosome:16
[strling] extracting chromosome:17
[strling] extracting chromosome:18
[strling] extracting chromosome:19
[strling] extracting chromosome:2
[strling] extracting chromosome:20
[strling] extracting chromosome:21
[strling] extracting chromosome:22
[strling] extracting chromosome:3
[strling] extracting chromosome:4
[strling] extracting chromosome:5
[strling] extracting chromosome:6
[strling] extracting chromosome:7
[strling] extracting chromosome:8
[strling] extracting chromosome:9
[strling] extracting chromosome:X
[strling] extracting chromosome:Y
/opt/conda/conda-bld/strling_1622157642620/work/src/strling.nim(44) strling
/opt/conda/conda-bld/strling_1622157642620/work/src/strling.nim(41) main
/opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(319) extract_main
/opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(200) add
/opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(67) to_tread
/opt/conda/conda-bld/strling_1622157642620/_build_env/nim/lib/system/assertions.nim(30) failedAssertImpl
/opt/conda/conda-bld/strling_1622157642620/_build_env/nim/lib/system/assertions.nim(23) raiseAssert
/opt/conda/conda-bld/strling_1622157642620/_build_env/nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: /opt/conda/conda-bld/strling_1622157642620/work/src/strpkg/extract.nim(67, 12) `align_length > 0` K00276:107:HHYWGBBXX:8:1125:32309:38451   141     *       0       0       *       *       0       0       *       *       AS:i:0  XS:i:0  RG:Z:LUEB0077G [AssertionDefect]

This is probably caused by the * in the SEQ and QUAL fields. The SAM specification allows * in both fields, for example when adapter trimming has removed the entire sequence. Even though such a read is useless on its own (it is also unmapped), it is still worth keeping in the alignment file so that the file remains a complete paired-end file.
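For illustration, here is a minimal sketch (plain Python, not STRling's Nim code) of how such records can be recognised in SAM text. It assumes, as suggested above, that the assertion fires because a read stored with SEQ `*` has zero length. The helper names `decode_flag`, `has_no_sequence` and `records_with_sequence` are hypothetical. It also decodes the record's flag 141 (1 + 4 + 8 + 128), which shows the read is paired, unmapped, has an unmapped mate, and is the second read of the pair.

```python
# Hypothetical helpers, not part of STRling: inspect raw SAM text lines and
# skip records that store no sequence (SEQ == "*"), which a length-based
# check such as `align_length > 0` would reject.

# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def has_no_sequence(sam_line):
    """True if the record's SEQ field (10th column) is '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record has at least 11 mandatory fields")
    return fields[9] == "*"


def records_with_sequence(lines):
    """Yield alignment records that actually carry a sequence."""
    for line in lines:
        if line.startswith("@"):
            continue  # header lines are not alignment records
        if has_no_sequence(line):
            continue  # zero-length read: skip instead of asserting
        yield line


# The record from the error message above, with its tab separators restored.
RECORD = "\t".join([
    "K00276:107:HHYWGBBXX:8:1125:32309:38451", "141", "*", "0", "0", "*",
    "*", "0", "0", "*", "*", "AS:i:0", "XS:i:0", "RG:Z:LUEB0077G",
])

if __name__ == "__main__":
    print(decode_flag(141))
    print(has_no_sequence(RECORD))
```

Skipping such reads rather than asserting keeps the rest of the file usable. Whether the zero-len branch linked below handles them this way is not stated in this thread.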

christopher-schroeder commented 2 years ago

I have the same problem :-/

hdashnow commented 2 years ago

Does this work on your data? https://github.com/quinlan-lab/STRling/tree/zero-len

christopher-schroeder commented 2 years ago

Yes

hdashnow commented 2 years ago

@brentp mentioned memory concerns. Any chance you could check the memory usage with the two different versions (your PR, vs. the fix above)?
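One way to run the requested comparison is to record the peak resident set size of each build on the same input. The following is a minimal sketch, assuming a Unix system and Python 3.9 or later. `peak_rss` is a hypothetical helper, and the STRling command lines to pass to it are left to the user. It waits on each child with `os.wait4`, so each run reports its own peak instead of a cumulative maximum across all children.

```python
import os
import sys


def peak_rss(argv):
    """Run argv as a child process; return (exit_code, peak RSS of that child).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4 returns resource usage for this specific child only.
    _, status, usage = os.wait4(pid, 0)
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss


if __name__ == "__main__":
    # Usage: python peak_rss.py <full command line of one STRling build>
    code, rss = peak_rss(sys.argv[1:])
    print(f"exit={code} max_rss={rss}")
```

Running it once per build with otherwise identical arguments gives two directly comparable numbers.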

christopher-schroeder commented 2 years ago

I am a bit busy at the moment, and to be honest I don't see the point of this test. Only a couple of read pairs out of several million are affected, so I don't think I could detect any memory leak. But even so, there are two possible outcomes: either I detect something, and then you have to look at the code, or I can't detect anything, in which case you should still check the code for anything fishy!