Closed distilledchild closed 1 year ago
Hi @theshowmustgolangon, thanks for reporting this. It looks like there is a problem interpreting the SA tag. Which aligner did you use? If you are able to share a few example reads with SA tags, that would be very helpful for debugging.
@kcleal Thank you for your support and help! The aligner I used is longranger (a linked-read-specific aligner), and a few example lines are below.
A00735:93:HLHKJDSXX:1:1366:10312:23797 321 chr1 5 0 64M64H chr17 63605199 0 CAATCAAACACAGCATCCTTTTCAACAGAAGCAGAAGCTCATCTGAATATGCTCAAGGATGCTG FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF RX:Z:TCTGTCGCAGAAGCAC QX:Z:FFFFFFFFFFFFFFFF TR:Z:CGGACCA TQ:Z:FFFFFFF XS:i:-73 AS:i:-41 XM:A:0 AM:A:0 XT:i:0 SA:Z:chr17,63605286,-,54S74M,2,0; BX:Z:TCTGTCGCAGAAGCAC-1 RG:Z:SHR_OlaIpcv:LibraryNotSpecified:1:unknown_fc:0 OM:i:0
A00735:93:HLHKJDSXX:3:2517:6723:1579 353 chr1 5 0 64M64H chr11 3713885 0 CAATCAAACACAGCATCCTTTTCAACAGAAGCAGAAGCTCATCTGAATATGCTCAAGGATGCTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,F RX:Z:TCTGTCGCAGAAGCAC QX:Z:FFFFFFFFFFFFFFFF TR:Z:CGGACCA TQ:Z:FFFFFFF XS:i:-75 AS:i:-41 XM:A:0 AM:A:0 XT:i:0 SA:Z:chr11,3713873,+,54S74M,1,1; BX:Z:TCTGTCGCAGAAGCAC-1 RG:Z:SHR_OlaIpcv:LibraryNotSpecified:1:unknown_fc:0 OM:i:0
A00735:93:HLHKJDSXX:3:2162:1226:36839 385 chr1 5 0 64M85H chr10 90727447 0 CAATCAAACACAGCATCCTTTTCAACAGAAGCAGAAGCTCATCTGAATATGCTCAAGGATGCTG FFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF RX:Z:CCGTTACAGGTAGTCG QX:Z:F:FFFFFFFFFFFFFF XS:i:-83 AS:i:-54 XM:A:0 AM:A:0 XT:i:0 SA:Z:chr10,90727562,-,54S95M,0,0; BX:Z:CCGTTACAGGTAGTCG-1 RG:Z:SHR_OlaIpcv:LibraryNotSpecified:1:unknown_fc:0 OM:i:0
A00735:93:HLHKJDSXX:4:2644:6958:19930 321 chr1 5 0 65M63H chr16 32793676 0 CAATCAAACACAGCATCCTTTTCAACAGAAGCAGAAGCTCATCTGAATATGCTCAAGGATGCTGA FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, RX:Z:TCAGGTAGTCCACTCT QX:Z:FFFFFFFFFFFFFFFF TR:Z:CGGACCA TQ:Z:FFFFFFF XS:i:-77 AS:i:-40 XM:A:0 AM:A:0 XT:i:0 SA:Z:chr16,32793830,-,54S63M1D11M,13,2; BX:Z:TCAGGTAGTCCACTCT-1 RG:Z:SHR_OlaIpcv:LibraryNotSpecified:1:unknown_fc:0 OM:i:0
A00735:93:HLHKJDSXX:2:1435:2817:4554 417 chr1 5 0 65M86H chr8 51900376 0 CAATCAAACACAGCATCCTTTTCAACAGAAGCAGAAGCTCATCTGAATATGCTCAAGGATGCTGA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF RX:Z:CTGCGGACAGGGAAGG QX:Z:FFFFFFFFFFFFFFFF XS:i:-94 AS:i:-52 XM:A:0 AM:A:0 XT:i:0 SA:Z:chr8,51900223,+,74S77M,0,0; BX:Z:CTGCGGACAGGGAAGG-1 RG:Z:SHR_OlaIpcv:LibraryNotSpecified:1:unknown_fc:0 OM:i:0
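For readers checking SA tags by hand, here is a minimal Python sketch that splits an SA:Z value into its fields. It follows the layout given in the SAM optional-fields specification (rname, pos, strand, CIGAR, mapQ, NM, each entry terminated by a semicolon). The names `SAEntry` and `parse_sa_tag` are made up for this illustration; this is not how dysgu parses the tag internally.

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    """One supplementary-alignment entry from an SA:Z tag (hypothetical helper type)."""
    rname: str
    pos: int      # 1-based leftmost position
    strand: str   # "+" or "-"
    cigar: str
    mapq: int
    nm: int       # edit distance


def parse_sa_tag(value: str) -> List[SAEntry]:
    """Parse the value of an SA:Z tag, given without the 'SA:Z:' prefix."""
    entries = []
    for chunk in value.split(";"):
        if not chunk:
            # The tag ends with ';', so the final chunk is empty.
            continue
        rname, pos, strand, cigar, mapq, nm = chunk.split(",")
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand {strand!r} in SA entry {chunk!r}")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


# One of the SA values from the reads posted above.
entries = parse_sa_tag("chr16,32793830,-,54S63M1D11M,13,2;")
```

A malformed entry (wrong number of fields, non-numeric position) raises a `ValueError` here, which makes it easy to spot records that do not follow the expected layout.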
I think the SA tags look fine. I'm a bit confused by the error at the moment. Does this error pop up only a handful of times, or does it seem to occur for every read with an SA tag?
@kcleal Not every read, only some reads, I think. The run stops partway through because of a memory issue (out of memory). I am running it again now and will let you know.
@kcleal Hi, after using plenty of computational resources, I got this error. Could you take a look at it, please?
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
[E::bgzf_read_block] Failed to read BGZF header at offset 61488496640
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes
OSError: [Errno 5] Input/output error
Exception ignored in: 'pysam.libcalignmentfile.AlignmentFile.dealloc'
Traceback (most recent call last):
File "/dysgu/lib/python3.10/site-packages/dysgu/main.py", line 280, in run_pipeline
cluster.cluster_reads(ctx.obj)
OSError: [Errno 5] Input/output error
Traceback (most recent call last):
File "/dysgu/bin/dysgu", line 8, in
I think these errors are a bit weird:
[E::bgzf_read_block] Failed to read BGZF header at offset 61488496640
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes
This indicates a problem reading the alignment file (using pysam). The error suggests that the bam file may be corrupted.
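One quick check for a truncated bam is to look for the BGZF end-of-file marker, the 28-byte empty block that the SAM/BAM specification places at the end of a complete file. The sketch below uses only the Python standard library; `has_bgzf_eof` is a hypothetical helper name. A missing marker suggests truncation, but a present marker does not prove that every block in the file is intact.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file
# (as listed in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file ends with the BGZF end-of-file marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the last 28 bytes, so this is cheap even for large files.
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

`samtools quickcheck` performs a similar end-of-file check from the command line.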
@kcleal I will re-install pysam again, and re-run it. Thank you!
You can also validate your bam using picardtools https://gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard-
@kcleal Thank you. I will try that one now!
I got this error. Could it be the cause of the problem?
ValidateSamFile Value was put into PairInfoMap more than once. 0: A00735:93:HLHKJDSXX:4:2513:24478:4366 [Mon Dec 05 14:56:05 EST 2022] picard.sam.ValidateSamFile done. Elapsed time: 95.50 minutes.
It's possible. Perhaps this biostars post will help: https://www.biostars.org/p/60263/
@kcleal Hi, I removed duplicates following this post, https://www.biostars.org/p/365882/, and ran it again. First, the error I originally reported has not gone away, but I feel I can just ignore it. Second, I now get a core dump error like this:
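The Picard message above is commonly reported when the same read name appears more than once as a primary alignment for the same end of a pair. As a rough way to look for such records, the following Python sketch counts primary records per (read name, read end) from SAM text lines. The function name is hypothetical, and the assumption that this is the cause in this particular file is not confirmed anywhere in the thread.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

FLAG_FIRST = 0x40           # first read in pair
FLAG_SECOND = 0x80          # second read in pair
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def duplicated_primary_ends(sam_lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Return {(qname, end): count} for primary records seen more than once.

    `end` is 1 for first-in-pair, 2 for second-in-pair, 0 if neither flag is set.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # blank or malformed line
        qname, flag = fields[0], int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # only primary records are counted
        end = 1 if flag & FLAG_FIRST else 2 if flag & FLAG_SECOND else 0
        counts[(qname, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

The lines could come from `samtools view` piped into a script that passes `sys.stdin` to this function. The counter holds one entry per read end, so for a whole-genome bam it is more practical to run it on one region at a time.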
var/spool/slurm/spool/job302969/slurm_script: line 24: 2867650 Bus error (core dumped) dysgu run .......
I read a few posts about bus errors and changed the number of cores to 1, so my command is now:
dysgu run -p1 --clean \
  --mode pe \
  --min-support 3 \
  --min-size 50 \
  --max-cov auto \
  --contigs False \
  --low-mem \
  --exclude ${BASE_DIR}/input/gap_region_rn7chr_ucsc.bed \
  -x \
  REF, output, input... > vcf
I will let you know how it works, and please tell me if you have any solutions for it.
Thank you for your support.
The command looks fine. Possibly the --low-mem option is causing an issue. Other than that, I am happy to try and debug it for you if you don't mind sending me a small region from your data (as long as I can reproduce the error). A bus error is a bit surprising; I am not really sure what could be causing it.
Would it be OK if I send you a bam file? If so, I can give you a link to download it from cloud storage. Please let me know!
Yes. I will also need to know which reference genome you used. If possible, could you send me a subset, e.g. chr21, rather than the whole bam?
@kcleal I extracted bams for chr12 and chr2 and found that the chr12 bam causes the core dump. If it is OK to run dysgu chromosome by chromosome, I will try that for all of them. Please let me know your email address so that I can send you a link!
@kcleal Here is the download link. Could you check it, please? I am not sure why, but I was not able to reproduce the core dump (probably because I increased the RAM). You can still see the TypeError, though. Could you also check this log, please?
UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.1.3. This might lead to breaking code or invalid results. Use at your own risk.
An additional question: can I use the results from bams split by chromosome? I think that, apart from translocations, the other SV types should still be valid.
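To get a feel for how much inter-chromosomal evidence a per-chromosome split would leave behind, one could count reads whose mate maps to a different chromosome. This is a minimal sketch under that assumption, using the RNAME and RNEXT columns of a SAM line. `mate_on_other_chromosome` is a hypothetical helper, and the thread does not confirm how dysgu treats such reads.

```python
def mate_on_other_chromosome(sam_line: str) -> bool:
    """Return True if the mate (RNEXT) maps to a different reference than the read (RNAME)."""
    # SAM is tab-delimited; splitting on any whitespace also accepts
    # records pasted with spaces, as in this thread.
    fields = sam_line.split(None, 7)
    rname, rnext = fields[2], fields[6]
    if rname == "*" or rnext == "*":
        return False  # read or mate is unplaced
    # "=" means the mate is on the same reference as the read.
    return rnext != "=" and rnext != rname


# Leading columns of the first example read posted above: chr1 read, chr17 mate.
example = "A00735:93:HLHKJDSXX:1:1366:10312:23797 321 chr1 5 0 64M64H chr17 63605199 0 SEQ QUAL"
spans_chromosomes = mate_on_other_chromosome(example)
```

All five example reads in this thread have RNAME chr1 and a mate on another chromosome, so a chr1-only bam would contain these reads without their mates.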
Hi @theshowmustgolangon,
I've fixed the TypeError bug. It was caused by a read having a supplementary alignment identical to its primary alignment. This looks like a rare occurrence, so it is unlikely to affect the output. The UserWarning can be safely ignored; it is left in there to flag potential issues in the future.
You can build dysgu 1.3.14 from source if you want to run again, or you can wait a few days while I get a release onto PyPI.
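For anyone who wants to look for such reads in their own data, here is a minimal sketch that tests whether any SA entry repeats the record's own alignment. It assumes that "identical" means the same reference name, position, strand and CIGAR string; that definition, and the function name `sa_matches_primary`, are assumptions for illustration and are not taken from the dysgu source.

```python
FLAG_REVERSE = 0x10  # read is mapped to the reverse strand


def sa_matches_primary(rname: str, pos: int, flag: int, cigar: str, sa_value: str) -> bool:
    """Return True if any SA entry has the same rname, pos, strand and CIGAR as the record.

    `sa_value` is the SA:Z value without the 'SA:Z:' prefix.
    """
    strand = "-" if flag & FLAG_REVERSE else "+"
    own = (rname, pos, strand, cigar)
    for chunk in sa_value.split(";"):
        if not chunk:
            continue  # the tag ends with ';'
        sa_rname, sa_pos, sa_strand, sa_cigar = chunk.split(",")[:4]
        if (sa_rname, int(sa_pos), sa_strand, sa_cigar) == own:
            return True
    return False
```

Comparing CIGAR strings exactly is a simplification: a record may use hard clips (H) where its SA entry uses soft clips (S), so a stricter or looser comparison may be needed in practice.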
@kcleal I will compile the source code. Thank you for your support and help; I really appreciate it.
@kcleal Hi, I am trying to run the newer version, 1.3.14, but could not complete the run because of a core dump error. The resources available were ~50 CPUs and 2T of memory. (I also ran the tool with 1 CPU, based on your comment in this issue board, but that was not successful either.) Could you give me some advice, please?
Can you post the full output log from running dysgu? I will see if I can help
This is an example of my log.
2022-12-13 14:02:16,095 [INFO ] [dysgu-run] Version: 1.3.14
2022-12-13 14:02:16,095 [INFO ] run -p22 --mode pe --min-support 3 --min-size 50 --merge-within True --drop-gaps True --max-cov auto --low-mem --contigs False --exclude /dysgu/input/gap_region_rn7chr_ucsc.bed -x /refs/rn7_ucsc/rn7chr.fa /dysgu/output/SHR_OlaIpcv /dysgu/input/SHR_OlaIpcv/SHR_OlaIpcv_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.bam
2022-12-13 14:02:16,095 [INFO ] Destination: /dysgu/output/SHR_OlaIpcv
2022-12-13 14:02:16,097 [INFO ] Excluding /dysgu/input/gap_region_rn7chr_ucsc.bed from search
2022-12-13 14:02:16,194 [INFO ] Auto max-cov estimated 294x
2022-12-13 15:08:56,965 [INFO ] dysgu fetch /dysgu/input/SHR_OlaIpcv/SHR_OlaIpcv_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.bam written to /dysgu/output/SHR_OlaIpcv/SHR_OlaIpcv_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.dysgu_reads.bam, n=153333031, time=1:06:40 h:m:s
2022-12-13 15:08:57,003 [INFO ] Input file is: /dysgu/output/SHR_OlaIpcv/SHR_OlaIpcv_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.dysgu_reads.bam
2022-12-13 15:08:57,158 [INFO ] Sample name: SHR_OlaIpcv
2022-12-13 15:08:57,159 [INFO ] Writing vcf to stdout
2022-12-13 15:08:57,159 [INFO ] Running pipeline
2022-12-13 15:08:57,799 [INFO ] Calculating insert size. Removed 799 outliers with insert size >= 1040.0
2022-12-13 15:08:57,811 [INFO ] Inferred read length 148.0, insert median 301, insert stdev 128
2022-12-13 15:08:57,813 [INFO ] Max clustering dist 941
2022-12-13 15:08:57,815 [INFO ] Minimum support 3
2022-12-13 15:08:57,821 [INFO ] Building graph with clustering 941 bp
2022-12-13 16:33:38,105 [INFO ] Total input reads 152447351
2022-12-13 16:34:59,418 [INFO ] Graph constructed
/var/spool/slurm/spool/job309515/slurm_script: line 43: 4192427 Bus error (core dumped) dysgu run -p${CPU} --mode pe --min-support 3 --min-size 50 --merge-within True --drop-gaps True --max-cov auto --low-mem --contigs False --exclude ${BASE_DIR}/input/gap_region_rn7chr_ucsc.bed -x ${REF}/rn7chr.fa ${BASE_DIR}/output/${SAMPLE} ${BASE_DIR}/input/${SAMPLE}/${SAMPLE}_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.bam > ${BASE_DIR}/output/${SAMPLE}/${SAMPLE}_dedup_dysgu_sv.1.3.14.vcf
Thanks.
I've just been looking over the data you sent me. There appear to be a lot of reads with soft-clips, which might be the source of the high memory usage. I recommend trying adapter trimming if you have not already done so. You might also be able to bypass the issue by increasing --clip-length to e.g. 50.
If you are able to send me the SHR_OlaIpcv_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.dysgu_reads.bam file, I will be able to investigate further.
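To estimate how many reads carry long soft-clips before changing the threshold, something like the following Python sketch could be run over the CIGAR column. It assumes that the relevant quantity is the longest soft-clip (S) operation in each CIGAR string and that a read counts as clipped when that length is at least the chosen threshold. How dysgu applies --clip-length internally is not described in this thread, and both function names are hypothetical.

```python
import re
from typing import Iterable

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_soft_clip(cigar: str) -> int:
    """Return the length of the longest soft-clip (S) in a CIGAR string, or 0 if none."""
    return max((int(n) for n, op in _CIGAR_OP.findall(cigar) if op == "S"), default=0)


def fraction_clipped(cigars: Iterable[str], min_clip: int) -> float:
    """Return the fraction of CIGAR strings with a soft-clip of at least `min_clip` bases."""
    total = clipped = 0
    for cigar in cigars:
        total += 1
        if max_soft_clip(cigar) >= min_clip:
            clipped += 1
    return clipped / total if total else 0.0


# CIGAR strings taken from the records and SA tags posted earlier in the thread.
share = fraction_clipped(["54S74M", "64M64H", "54S63M1D11M", "74S77M"], min_clip=50)
```

Hard clips (H) are deliberately not counted here, since the clipped bases are absent from the record's sequence.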
@kcleal Thank you for your advice! I am uploading it now, but it's pretty big, 74G. I am also running it again with --clip-length set to 50. I will let you know once the upload and the run have finished.
Thanks. Just to make sure you're uploading the right file: I don't need the original bam file, just the "dysgu_reads" file from the working directory, SHR_OlaIpcv_phased_possorted_bam.nmsorted.fixmate.possorted.dedup.dysgu_reads.bam
I wasn't able to download the file. I'll close this for now, as the issue is probably related either to the number of soft-clipped reads or to the fact that they are linked-reads.
Hi, first of all, thank you for the great tool, dysgu! I am running it on my bam file from 10x Genomics linked-read sequencing, and the following error keeps popping up. Could you give me some advice, please?
2022-12-04 15:31:26,573 [INFO ] Sample name: SHR_OlaIpcv
2022-12-04 15:31:26,575 [INFO ] Writing vcf to stdout
2022-12-04 15:31:26,575 [INFO ] Running pipeline
2022-12-04 15:31:27,337 [INFO ] Calculating insert size. Removed 735 outliers with insert size >= 1033.0
2022-12-04 15:31:27,345 [INFO ] Inferred read length 148.0, insert median 302, insert stdev 128
2022-12-04 15:31:27,362 [INFO ] Max clustering dist 942
2022-12-04 15:31:27,362 [INFO ] Minimum support 3
2022-12-04 15:31:27,372 [INFO ] Building graph with clustering 942 bp
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Exception ignored in: 'dysgu.graph.process_alignment'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Exception ignored in: 'dysgu.graph.process_alignment'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Exception ignored in: 'dysgu.graph.process_alignment'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'