wtsi-hpag / Scaff10X

Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
MIT License

segmentation fault #16

Open BFeldmeyer opened 4 years ago

BFeldmeyer commented 4 years ago

Hello, I am running Scaff10x V.4.2 with the following command:

    scaff10x -nodes 40 -longread 1 -plot 10x_coverage.png PacBioScaffolds.fasta 19-Bp_S1_L001_R1_001.fastq.gz 19-Bp_S1_L001_R2_001.fastq.gz Result_scaff10x.fasta

Scaff10x is running and also produces a couple of output files, of which align2.dat and try.out are empty:

       0 Apr 30 17:40 align2.dat
     28G Apr 29 22:56 align.dat
    1.3G Apr 29 11:40 cleaN.fasta
    2.9M Apr 29 22:56 core.61997
    2.5G Apr 29 11:40 tarseq.fastq
      19 Apr 29 11:58 tarseq.fastq.amb
    1.6M Apr 29 11:58 tarseq.fastq.ann
    1.3G Apr 29 11:58 tarseq.fastq.bwt
    312M Apr 29 11:58 tarseq.fastq.pac
    624M Apr 29 12:06 tarseq.fastq.sa
    1.9M Apr 29 11:40 tarseq.tag
       0 Apr 29 22:56 try.out

and I obtain the following error message:

    ...
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 545)
    [M::mem_pestat] mean and std.dev: (192.88, 98.24)
    [M::mem_pestat] low and high boundaries for proper pairs: (1, 685)
    [M::mem_pestat] skip orientation FF
    [M::mem_pestat] skip orientation RF
    [M::mem_pestat] skip orientation RR
    [M::mem_process_seqs] Processed 434416 reads in 662.155 CPU sec, 18.441 real sec
    [main] Version: 0.7.17-r1198-dirty
    [main] CMD: /cluster/software/scaff10x/Scaff10X-4.2/src/scaff-bin/bwa mem -t 40 tarseq.fastq R1_001.fastq.gz 19-Bp_S1_L001_R2_001.fastq.gz
    [main] Real time: 38988.280 sec; CPU: 1509159.415 sec
    sh: line 1: 61997 Segmentation fault (core dumped) /cluster/software/scaff10x/Scaff10X-4.2/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out
    Error running command: /cluster/software/scaff10x/Scaff10X-4.2/src/scaff-bin/scaff_bwa -edge 50000 tarseq.tag align.dat align2.dat > try.out

zning-sanger commented 4 years ago

Thanks for your email. It looks like scaff_bwa failed. Could you send me a few lines of text from align.dat? 10 or two lines should be fine. I just want to check that the data format is what the pipeline expects.

Best regards,

Zemin

zning-sanger commented 4 years ago

Sorry, I meant 10 or 20 lines from align.dat.

zning-sanger commented 4 years ago

It is also possible that BWA did not finish, so the last line of align.dat has a different number of columns from the other lines. If that is the case, you can simply repeat the run.
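One way to test the truncated-line idea without opening a 28G file in an editor is to stream it and compare column counts. The sketch below is only an illustration: the function name `find_odd_lines` and the `max_report` parameter are made up here and are not part of Scaff10X, and it assumes align.dat is whitespace-delimited with the first line being well formed.

```python
def find_odd_lines(path, max_report=20):
    """Return (expected_columns, odd_lines) for a whitespace-delimited file.

    expected_columns is the column count of the first line (assumed to be
    well formed). odd_lines holds (line_number, column_count) for up to
    max_report lines whose column count differs, including empty lines.
    """
    expected = None
    odd = []
    # errors="replace" keeps the scan going if a truncated write left stray bytes
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            columns = len(line.split())
            if expected is None:
                expected = columns  # first line sets the reference count
            elif columns != expected and len(odd) < max_report:
                odd.append((number, columns))
    return expected, odd
```

Calling `find_odd_lines("align.dat")` returns the reference column count and a list of suspicious lines. An empty list means every line has the same number of columns as the first, which would rule out this particular cause.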

BFeldmeyer commented 4 years ago

Hello,

here are the "head" and "tail" of align.dat:

[s_bfeldmeyer@claudius tmp_rununik_14814]$ head align.dat

    GWNJ-0957:377:GW1902281905:7:1101:17797:1344 83 tarseq_2701 533 23
    GWNJ-0957:377:GW1902281905:7:1101:7111:1362 99 tarseq_754 52659 0
    GWNJ-0957:377:GW1902281905:7:1101:9546:1362 83 tarseq_24652 18239 0
    GWNJ-0957:377:GW1902281905:7:1101:12550:1362 99 tarseq_25146 5604 60
    GWNJ-0957:377:GW1902281905:7:1101:14539:1362 83 tarseq_9461 14547 60
    GWNJ-0957:377:GW1902281905:7:1101:24383:1362 83 tarseq_7742 42968 14
    GWNJ-0957:377:GW1902281905:7:1101:25723:1362 99 tarseq_2070 54217 4
    GWNJ-0957:377:GW1902281905:7:1101:8339:1379 99 tarseq_5021 2992 11
    GWNJ-0957:377:GW1902281905:7:1101:10064:1379 83 tarseq_1911 54985 17
    GWNJ-0957:377:GW1902281905:7:1101:12581:1379 83 tarseq_11715 3540 0

    GWNJ-0957:377:GW1902281905:8:2224:27570:73053 97 tarseq_12548 2460 12
    GWNJ-0957:377:GW1902281905:8:2224:28036:73053 83 tarseq_5239 53038 60
    GWNJ-0957:377:GW1902281905:8:2224:1336:73071 99 tarseq_12935 14989 60
    GWNJ-0957:377:GW1902281905:8:2224:2250:73071 83 tarseq_11559 34264 0
    GWNJ-0957:377:GW1902281905:8:2224:2331:73071 99 tarseq_33815 759 40
    GWNJ-0957:377:GW1902281905:8:2224:2615:73071 99 tarseq_20916 2188 60
    GWNJ-0957:377:GW1902281905:8:2224:2838:73071 99 tarseq_18027 4480 15
    GWNJ-0957:377:GW1902281905:8:2224:3143:73071 99 tarseq_20467 16653 60
    GWNJ-0957:377:GW1902281905:8:2224:3386:73071 99 tarseq_38403 8267 17
    GWNJ-0957:377:GW1902281905:8:2224:3711:73071 99 tarseq_37695 4797 0

On 04.05.2020 at 16:20, Zemin Ning wrote:

Sorry 10 or 20 lines from align.dat.


zning-sanger commented 4 years ago

Thanks. The file align.dat seems to be fine. How much RAM does your computer have? scaff_bwa does not use much RAM at this stage, maybe 5-6 GB.

zhangzhiyangcs commented 3 years ago

I hit the same problem when the run reached scaff_bwa. I have 3 TB of RAM and my genome is 500 Mb, so I don't think RAM can be the problem.

jensbast commented 3 years ago

Hi all, was a solution found in the end? I am encountering the same problem. I have 400 GB of memory, so it should not be a memory issue.

The align.dat lines:

head align.dat

    A00685:102:HWLNMDRXX:1:2101:1217:1000 99 tarseq_73 587650 56
    A00685:102:HWLNMDRXX:1:2101:1235:1000 99 tarseq_37 491377 60
    A00685:102:HWLNMDRXX:1:2101:1307:1000 83 tarseq_16 526825 60
    A00685:102:HWLNMDRXX:1:2101:1976:1000 83 tarseq_44 103907 60
    A00685:102:HWLNMDRXX:1:2101:2211:1000 99 tarseq_31 2014651 60
    A00685:102:HWLNMDRXX:1:2101:2230:1000 83 tarseq_12 1873831 60
    A00685:102:HWLNMDRXX:1:2101:3296:1000 99 tarseq_43 628817 9
    A00685:102:HWLNMDRXX:1:2101:3351:1000 83 tarseq_56 766484 60
    A00685:102:HWLNMDRXX:1:2101:3477:1000 83 tarseq_76 662403 60
    A00685:102:HWLNMDRXX:1:2101:3586:1000 99 tarseq_90 226154 60

tail align.dat

    A00685:102:HWLNMDRXX:2:2278:30752:37059 99 tarseq_16 1050079 60
    A00685:102:HWLNMDRXX:2:2278:30843:37059 83 tarseq_14 2815439 34
    A00685:102:HWLNMDRXX:2:2278:31186:37059 83 tarseq_14 1183768 60
    A00685:102:HWLNMDRXX:2:2278:31222:37059 81 tarseq_7 4039605 0
    A00685:102:HWLNMDRXX:2:2278:31656:37059 99 tarseq_113 391255 60
    A00685:102:HWLNMDRXX:2:2278:31982:37059 83 tarseq_29 451249 60
    A00685:102:HWLNMDRXX:2:2278:32072:37059 83 tarseq_93 58721 40
    A00685:102:HWLNMDRXX:2:2278:32145:37059 99 tarseq_7 3602339 60
    A00685:102:HWLNMDRXX:2:2278:32452:37059 99 tarseq_41 197280 60
    A00685:102:HWLNMDRXX:2:2278:32633:37059 65 tarseq_9 2755947 21

fredjaya commented 3 years ago

I encountered the same segfault issue mentioned here and in #18 when using the raw R1 and R2 linked reads as inputs (in this case, TELL-seq reads with barcodes converted to be 10X compatible) for Scaff10X v4.2.

Running scaff_reads first to generate debarcoded *.fastq.gz R1 and R2 files, and then using those files as inputs for scaff10x v4.2, seemed to fix it, as alluded to here.

Note that there may be a path issue with scaff_reads as previously raised (#5) - quick fix here.