quinlan-lab / STRling

Detect novel (and reference) STR expansions from short-read data
MIT License

STRling warning in [Joint call str loci across all samples] #117

Closed chanhee22kim closed 7 months ago

chanhee22kim commented 8 months ago

Hello,

I ran the command below for joint calling, which merges several .bin files:

cat ../AD_WGS_batch1-7_STRling.txt | xargs -L 2000 strling merge -f ../../resources/chm13v2.0.fa --output-prefix ~/WGS/AD_STR/outputs/joint_bin/ > ~/WGS/AD_STR/outputs/joint_bin/str_joint_log.txt 2>&1
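The pipeline above passes every path in the list file to `strling merge` through `xargs -L 2000`. That option starts one `strling merge` process per 2,000 nonblank input lines. With 1,824 samples this should be a single joint run, but a longer list would be split into separate, independent merges writing to the same output prefix. Below is a minimal Python sketch of an explicit single-run equivalent. It assumes the list file holds one `.bin` path per line and reuses only the flags from the original command (`-f`, `--output-prefix`). The helper names are hypothetical and are not part of STRling.

```python
import os
import subprocess


def parse_bin_list(text: str) -> list[str]:
    """Return the non-blank lines of a list file, one .bin path per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_merge_cmd(bins: list[str], reference: str, output_prefix: str) -> list[str]:
    """Build a single `strling merge` argv with every bin as a positional argument.

    Hypothetical helper: it uses only the -f and --output-prefix flags from the
    original command and appends the bins, as the xargs pipeline does.
    """
    if not bins:
        raise ValueError("no .bin files to merge")
    # The shell expands "~" in the original command; Python has to do it explicitly.
    # os.path.expanduser keeps any trailing slash, so the prefix matches the original.
    return ["strling", "merge", "-f", reference,
            "--output-prefix", os.path.expanduser(output_prefix), *bins]


def run_merge(cmd: list[str], log_path: str) -> int:
    """Run the merge once, sending stdout and stderr to one log (like `> log 2>&1`)."""
    with open(os.path.expanduser(log_path), "w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode
```

For example, read the list with `parse_bin_list(open("../AD_WGS_batch1-7_STRling.txt").read())`. Then pass the result to `build_merge_cmd(...)` with the same reference and output prefix as above, and hand that command to `run_merge(...)` together with the log path.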

The xargs command finished with the following log output:

More than 65535 reads in cluster with first read:(tid: 3, position: 169960672, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2,DUP, split: right, mapping_quality: 47, repeat_count: 70, align_length: 70, qname: "20") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 10263, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ2, split: none, mapping_quality: 43, repeat_count: 150, align_length: 150, qname: "428") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 15757, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ2, split: none, mapping_quality: 60, repeat_count: 150, align_length: 150, qname: "113") skipping
More than 65535 reads in cluster with first read:(tid: 1, position: 181396416, repeat: ['A', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none_right, mapping_quality: 0, repeat_count: 124, align_length: 150, qname: "49") skipping
More than 65535 reads in cluster with first read:(tid: 3, position: 169960503, repeat: ['A', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,PROPER_PAIR,MREVERSE,READ1, split: none, mapping_quality: 54, repeat_count: 128, align_length: 150, qname: "1138") skipping
More than 65535 reads in cluster with first read:(tid: 22, position: 149830784, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,PROPER_PAIR,REVERSE,READ1, split: none, mapping_quality: 60, repeat_count: 142, align_length: 150, qname: "754") skipping
More than 65535 reads in cluster with first read:(tid: 6, position: 86499722, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ2, split: none, mapping_quality: 57, repeat_count: 144, align_length: 150, qname: "233") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 15398, repeat: ['C', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none, mapping_quality: 60, repeat_count: 150, align_length: 150, qname: "97") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 15965, repeat: ['C', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none, mapping_quality: 60, repeat_count: 150, align_length: 150, qname: "71") skipping
More than 65535 reads in cluster with first read:(tid: 6, position: 86500001, repeat: ['C', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none, mapping_quality: 60, repeat_count: 122, align_length: 150, qname: "1715") skipping

Is it okay to ignore these skipping warnings, or should I try other options to solve this problem?

I ran STRling on 1,824 samples.

Thank you for providing a great tool.

Best regards, Chan


[log]

strling version: 0.5.2
[strling] read 815645 STR reads from file: WGS_0001.bin
[strling] read 501777 STR reads from file: WGS_0002.bin
...
[strling] read 102666 STR reads from file: WGS_1836.bin
[strling] read 113928 STR reads from file: WGS_1837.bin
[strling] read 123117 STR reads from file: WGS_1838.bin
More than 65535 reads in cluster with first read:(tid: 3, position: 169960672, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2,DUP, split: right, mapping_quality: 47, repeat_count: 70, align_length: 70, qname: "20") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 10263, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ2, split: none, mapping_quality: 43, repeat_count: 150, align_length: 150, qname: "428") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 15757, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ2, split: none, mapping_quality: 60, repeat_count: 150, align_length: 150, qname: "113") skipping
More than 65535 reads in cluster with first read:(tid: 1, position: 181396416, repeat: ['A', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none_right, mapping_quality: 0, repeat_count: 124, align_length: 150, qname: "49") skipping
More than 65535 reads in cluster with first read:(tid: 3, position: 169960503, repeat: ['A', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,PROPER_PAIR,MREVERSE,READ1, split: none, mapping_quality: 54, repeat_count: 128, align_length: 150, qname: "1138") skipping
More than 65535 reads in cluster with first read:(tid: 22, position: 149830784, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,PROPER_PAIR,REVERSE,READ1, split: none, mapping_quality: 60, repeat_count: 142, align_length: 150, qname: "754") skipping
More than 65535 reads in cluster with first read:(tid: 6, position: 86499722, repeat: ['G', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,MREVERSE,READ2, split: none, mapping_quality: 57, repeat_count: 144, align_length: 150, qname: "233") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 15398, repeat: ['C', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none, mapping_quality: 60, repeat_count: 150, align_length: 150, qname: "97") skipping
More than 65535 reads in cluster with first read:(tid: 24, position: 15965, repeat: ['C', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none, mapping_quality: 60, repeat_count: 150, align_length: 150, qname: "71") skipping
More than 65535 reads in cluster with first read:(tid: 6, position: 86500001, repeat: ['C', '\x00', '\x00', '\x00', '\x00', '\x00'], flag: PAIRED,READ2, split: none, mapping_quality: 60, repeat_count: 122, align_length: 150, qname: "1715") skipping
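To see which loci were skipped, you can pull the warnings out of a log like this one. The following is a rough Python sketch that assumes the message format shown in this log stays stable; it is not a documented STRling output format. `tid` is assumed to be a 0-based index into the alignment header's contig list. That order usually matches the `.fai` index of the FASTA passed to `-f` when the reads were aligned to the same reference, but you should check it against your BAM/CRAM header. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches one "More than 65535 reads in cluster ..." warning. The fields are taken
# from the log lines in this issue, not from a documented STRling format.
WARNING_RE = re.compile(
    r"More than 65535 reads in cluster with first read:\(tid: (?P<tid>\d+), "
    r"position: (?P<pos>\d+), repeat: \[(?P<repeat>[^\]]*)\]"
)


@dataclass(frozen=True)
class SkippedCluster:
    tid: int
    position: int
    repeat: str


def parse_skipped(log_text: str) -> list[SkippedCluster]:
    """Return every skipped cluster reported in the log, in order of appearance."""
    hits = []
    for m in WARNING_RE.finditer(log_text):
        # The repeat unit is printed as a padded array, e.g. ['G', '\x00', ...];
        # keep only the quoted single bases and drop the padding entries.
        bases = re.findall(r"'([ACGTN])'", m.group("repeat"))
        hits.append(SkippedCluster(int(m.group("tid")), int(m.group("pos")), "".join(bases)))
    return hits


def contig_names(fai_text: str) -> list[str]:
    """Contig names from a samtools .fai index (first tab-separated column)."""
    return [line.split("\t", 1)[0] for line in fai_text.splitlines() if line]
```

Because `finditer` scans the whole text, this works on both the line-broken log and the flattened copy. Indexing `contig_names(...)` with a hit's `tid` then gives a contig name, subject to the ordering assumption above.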

hdashnow commented 7 months ago

Hi @chanhee22kim,

tl;dr: Yes, it's safe to ignore these warnings unless your sequencing data is VERY deep. The limit is there to reduce memory requirements.

STRling is set to skip tandem repeat regions supported by more than 65,535 reads (uint16.high.int). This prevents a memory blowout. Unless you are doing incredibly high-depth sequencing, most of the genome should be well under this threshold, but we do expect to see some of these warnings in most samples. It is common for some regions to have very high depth, for example segmental duplications (seg dups), transposable elements, telomeres and centromeres. Any variant call in an extremely high-depth region like these is typically suspect, so STRling doesn't try to make calls there, and it should be safe to skip them for most applications.
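The cap is simply the largest value an unsigned 16-bit integer can hold: 2^16 - 1 = 65,535. The `uint16.high.int` expression quoted above is Nim syntax for that value. The Python sketch below only illustrates the skip-and-warn behaviour described here. The cluster representation (a list of read names) and the function name are hypothetical, and this is not STRling's actual code.

```python
import sys
from typing import Iterable, Iterator, Sequence

# Largest value of an unsigned 16-bit integer: 2**16 - 1 == 65535.
MAX_CLUSTER_READS = (1 << 16) - 1


def keep_callable_clusters(clusters: Iterable[Sequence[str]]) -> Iterator[Sequence[str]]:
    """Yield clusters small enough to call; warn about and skip the rest.

    Hypothetical sketch: each cluster is a sequence of read names, and the first
    element stands in for the "first read" printed in the warning.
    """
    for reads in clusters:
        if len(reads) > MAX_CLUSTER_READS:
            print(f"More than {MAX_CLUSTER_READS} reads in cluster with first read: "
                  f"{reads[0]} skipping", file=sys.stderr)
            continue  # no call is attempted for this oversized cluster
        yield reads
```

A cluster with exactly 65,535 reads is still kept, and only larger ones are dropped. Skipping instead of tracking arbitrarily large clusters keeps memory bounded. The trade-off is that no call is made at these loci, which, as noted above, are usually unreliable anyway.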

chanhee22kim commented 7 months ago

I understand what you mean. Thank you for your kind response.

Thank you very much!