drk0311 opened this issue 11 months ago
Hi, @drk0311. It may be out of memory. Can you show the size of the files in the directory `1-correct/0`? Blocks can be divided into smaller ones to reduce memory usage, for example `corr_block_size=1000000000`. You need to delete the directory `1-correct/0` first before running `pecat` again.
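A minimal sketch of that restart (the cfg file name is taken from later in this thread; adjust to your own):

```bash
# Remove the stale correction block, then re-run with a smaller
# corr_block_size (e.g. corr_block_size=1000000000) set in the cfg
rm -r 1-correct/0
pecat.pl correct cfgfile
```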
Thank you! The PAF files in `1-correct/0` are more than 300 GB, maybe too big, right? I'll try a smaller `corr_block_size` and see what happens.
Hello - I had a similar problem, also with a sample with a large genome (6 Gb), and using a smaller `corr_block_size` helped me. However, after solving this first problem I made it to `crr0_correct` after 187 hours of running and 16 TB of intermediate files, but I ran out of memory again. Is there any way to restart from where I'm at? In the scripts folder it looks like I've completed 86/212 of the `crr0_correct_*.sh` scripts. If the program doesn't have a built-in way to restart, could I just kick off each of the remaining `crr0_correct_*.sh` scripts manually? I'm only interested in these precorrection steps from PECAT and it looks like I'm almost done. I really hope there is a way I can restart.
Hi @stubbsrl. PECAT can continue to run from the breakpoint using the same command, `pecat.pl correct cfg`. Using `cleanup=1` can reduce the intermediate files.
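For reference, both knobs live in the same cfg file passed to `pecat.pl` (values taken from this thread; shown bare since I haven't confirmed whether the cfg format accepts comments):

```
cleanup=1
corr_block_size=1000000000
```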
Thank you @lemene! I used both `cleanup=1` and added the parameter `-f 0.005` to `corr_rd2rd_options` and `align_rd2rd_options`, and at the peak I had 16 TB of files. It's now down to 12 TB as the program uses those large files and deletes them.

The analysis made it through 211 of the 212 blocks, but ran out of memory on one block (block #210). I copied rd2rd.txt.sub.210.paf, prepared_reads.fasta, readname.core.210, and the crr0_correct_210.sh script over to a computer with 500 GB of memory. I changed the paths in the script and kicked off crr0_correct_210.sh. It runs for 1 hour and 45 minutes and still runs out of memory. I looked at the help flag for `fsa_rd_correct` and, in separate attempts, tried adding the `--use_cache` flag, reducing `--thread_size`, and increasing `--min_coverage`. Any suggestions on how to get this final file corrected? Any other changes I can make to the crr0_correct_210.sh script? Could I split the paf somehow? My plan is, once this paf turns into a fasta, to move all of the #210 files back with the other corrected fastas and finish the analysis. Thanks in advance for any suggestions!
Hi @stubbsrl. Reducing `--thread_size` should be helpful. Moving the files back is feasible, but you need to create the file `crr0_correct_210.sh.done` and write `0` in it.
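Concretely, something like this (assuming the default `scripts` directory layout mentioned above):

```bash
# Mark block 210 as finished (exit status 0) so pecat.pl skips it on restart
echo 0 > scripts/crr0_correct_210.sh.done
```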
Hi @lemene, thanks for the feedback. I've tried various thread sizes, all the way down to `--thread_size 20`, and I am still getting segmentation faults/core dumps. Any other suggestions?
Hi @stubbsrl. You can use `/usr/bin/time -v` to check whether the segmentation fault is caused by a lack of memory.

There is another method to reduce memory usage: divide the file `--read_name_fname=xxx/1-correct/0/readname.core.210` into multiple files. Then you can manually correct the reads separately and merge the results. PECAT only corrects the reads whose names are in `readname.core.xxx`; in this way, PECAT doesn't load irrelevant overlaps.
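A sketch of the splitting step, assuming `readname.core.210` holds one read name per line (the chunk count is arbitrary; `split -n l/6` is GNU split syntax):

```bash
# Split the read-name list into 6 chunks without breaking lines
split -n l/6 readname.core.210 readname.core.210.part.
# Then run a copy of crr0_correct_210.sh for each chunk, changing
# --read_name_fname (and the output file names) to point at that part.
```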
Hi! I encountered a similar error here. I'm trying to test this out on a bacterium with a genome of just 5.5 Mb but got aborted (core dumped) errors. I'd be very happy to receive your input on this, especially since I'm interested in trying this program on a plant with a ~400 Mb genome.
Here are my options for correction:

```
project=smarcescens
reads=smarcescens_simplex.filtered.fastq
genome_size=5500000
threads=4
cleanup=1
grid=local

prep_min_length=3000
prep_output_coverage=80

corr_iterate_number=1
corr_block_size=4000000000
corr_filter_options=--filter0=l=5000:al=2500:alr=0.5:aal=5000:oh=1000:ohr=0.1
corr_correct_options=--score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01
corr_rd2rd_options=-x ava-ont
corr_output_coverage=80
```
Here's the terminal log:

```
2024-05-05 01:32:40 [INFO] Start Correcting
2024-05-05 01:32:40 [INFO] thread size 4, totalsize 12211
2024-05-05 01:32:40 [INFO] done 0, all 12211
free(): double free detected in tcache 2
/data/user/test_assemblers/smarcescens/scripts/crr0_correct_0.sh: line 16: 1473021 Aborted (core dumped) /data/user/miniconda3/envs/pecat/share/pecat-0.0.1-1/bin/fsa_rd_correct /data/user/test_assemblers/smarcescens/1-correct/0/rd2rd.txt.sub.0.paf /data/user/test_assemblers/smarcescens/0-prepare/prepared_reads.fasta /data/user/test_assemblers/smarcescens/1-correct/0/corrected_reads_0.fasta.0 --output_directory=/data/user/test_assemblers/smarcescens/1-correct/0 --thread_size=4 --read_name_fname=/data/user/test_assemblers/smarcescens/1-correct/0/readname.core.0 --infos_fname /data/user/test_assemblers/smarcescens/1-correct/0/corrected_reads_0.fasta.0.infos --score=weight:lc=10 --aligner edlib --filter1 oh=1000:ohr=0.01
Plgd script end: 2024-05-05 01:32:41
2024-05-05 01:32:43 [Error] Failed to run correcting reads 0, crr0
```
Here's the `/usr/bin/time -v` output:

```
Command exited with non-zero status 1
Command being timed: "pecat.pl correct cfgfile_1"
User time (seconds): 506.21
System time (seconds): 9.50
Percent of CPU this job got: 322%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:40.15
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1534148
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 4461422
Voluntary context switches: 67738
Involuntary context switches: 5650
Swaps: 0
File system inputs: 3696
File system outputs: 1768432
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
```
Hi @jp-jong. From the message `free(): double free detected in tcache 2`, it seems this is not a lack of memory but a bug in PECAT. However, I cannot locate it. I suggest you try version v0.0.3.
I checked and I am using v0.0.3. Here is the output using `/usr/bin/time -v`. Also note I'm just running the shell script for the one paf where I keep getting a segmentation fault, and I made separate readname files as you suggested (you'll see I'm using `readname.core1.210`). The machine I'm using has 500 GB of memory.
```
/usr/bin/time -v bash crr0_correct_210_1.sh
Plgd script start: 2024-05-09 21:55:06
2024-05-09 21:55:06 [INFO] Start
2024-05-09 21:55:06 [INFO] Arguments:
ol_fname = /mydir/pecat/rd2rd.txt.sub.210.paf
rr_fname = /mydir/pecat/prepared_reads.fasta
cr_fname = /mydir/pecat/corrected_reads_0.fasta.210
filter0 = l=2000:i=0.00:al=2000:alr=0.50:oh=10000000:ohr=1.00:aal=10000000:aalr=1.00:ilid=0
filter1 = l=0:i=0.00:al=0:alr=1.00:oh=1000:ohr=0.01:aal=10000000:aalr=1.00:ilid=0
graph_fname =
infos_fname = /mydir/pecat/corrected_reads_0.fasta.210.infos
output_directory = /mydir/pecat
read_name =
read_name_fname = /mydir/pecat/readname.core1.210
aligner = edlib:bs=1000:mc=6
score = weight:lc=10
candidate = c=80:n=600:f=30:p=0.95:ohwt=0.1
min_identity = 90
min_local_identity = 80
thread_size = 70
min_coverage = 4
variants =
use_cache = 0
debug = 0
debug_flag = 0
skip_branch_check = 0
2024-05-09 21:55:06 [INFO] Memory: 9124
2024-05-09 22:10:49 [INFO] Load 1896657424 overlaps from file /mydir/pecat/rd2rd.txt.sub.210.paf
2024-05-09 22:16:16 [INFO] Load 18244710 reads from file: /mydir/pecat/prepared_reads.fasta
2024-05-09 22:16:17 [INFO] Memory: 194866120
2024-05-09 22:57:00 [INFO] BuildIndex: memory=224501524
2024-05-09 22:57:00 [INFO] Memory: 224501524
2024-05-09 22:57:00 [INFO] Start Correcting
2024-05-09 22:57:00 [INFO] thread size 70, totalsize 33965
2024-05-09 22:57:00 crr0_correct_210_1.sh: line 16: 1544632 Segmentation fault (core dumped) /mydir/miniconda3/envs/pecat-env/share/pecat-0.0.3-0/bin/fsa_rd_correct /mydir/pecat/rd2rd.txt.sub.210.paf /mydir/pecat/prepared_reads.fasta /mydir/pecat/corrected_reads_0.fasta.210 --output_directory=/mydir/pecat --thread_size=70 --read_name_fname=/mydir/pecat/readname.core1.210 --infos_fname /mydir/pecat/corrected_reads_0.fasta.210.infos --score=weight:lc=10 --aligner edlib:bs=1000:mc=6 --min_coverage 4 --filter1 oh=1000:ohr=0.01 --candidate n=600:f=30 --min_identity 90 --min_local_identity 80
Plgd script end: 2024-05-09 22:57:15
Command being timed: "bash crr0_correct_210_1.sh"
User time (seconds): 10257.79
System time (seconds): 1073.89
Percent of CPU this job got: 303%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02:08
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 224502344
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 139
Minor (reclaiming a frame) page faults: 662291363
Voluntary context switches: 48761
Involuntary context switches: 1138676
Swaps: 0
File system inputs: 1874387864
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
@stubbsrl I still think this error is caused by a lack of memory. Is this machine running other programs simultaneously? You can try using fewer threads, such as `--thread_size=10`, and reducing the number of reads in the file `readname.core1.210`, for example to only 100 reads. If it still fails, please add the parameter `--debug`, keep only one read in `readname.core1.210`, and run `bash crr0_correct_210_1.sh > debug_log`. `debug_log` will contain more information to help locate the bug.
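Put together, that debug run might look like this (the one-read file name is hypothetical):

```bash
# Keep a single read name so the failing case is small and reproducible
head -n 1 readname.core1.210 > readname.core1.210.one
# Point --read_name_fname in crr0_correct_210_1.sh at the one-read file,
# add --debug to the fsa_rd_correct command, then capture the output:
bash crr0_correct_210_1.sh > debug_log
```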
@lemene the v0.0.3 seems to work without errors. Thanks!!
I set `thread_size=10` and also set my script to occupy the entire node. I broke `readname.core.210` into 6 different files and ran each one separately. This worked! Thank you for your help!

After it finished, I concatenated the 6 `corrected_reads_0.fasta.210` files into one and did the same for the `.infos` files. These concatenated files were moved back to `1-correct/0` with the other corrected reads. I also made a `crr0_correct_210.sh.done` file in the `scripts` dir and put a `0` in it.
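For anyone repeating this, the merge-back is plain concatenation (the per-chunk output names here are hypothetical; the `[1-6]` glob keeps the fasta and `.infos` files separate):

```bash
# Rebuild the per-block outputs PECAT expects for block 210
cat corrected_reads_0.fasta.210.[1-6] > corrected_reads_0.fasta.210
cat corrected_reads_0.fasta.210.[1-6].infos > corrected_reads_0.fasta.210.infos
```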
I ran `pecat.pl correct cfgfile` and this is the output and error. I must have missed or deleted something, because I think it's gone back a step to splitting read names.
```
(pecat-env) rstubbs/projects/pecat$ pecat.pl correct cfgfile
2024-05-29 21:53:12 [Info] Start running job prp_job, preparing reads
2024-05-29 21:53:12 [Info] Skip preparing reads for outputs are newer.
2024-05-29 21:53:12 [Info] Start running job crr_job, correcting reads, crr
2024-05-29 21:53:12 [Info] Start correcting reads, crr.
2024-05-29 21:53:12 [Info] Job::Serial::submit_core crr_job
2024-05-29 21:53:12 [Info] Start running job crr0_correct, correcting rawreads, crr0
2024-05-29 21:53:12 [Info] Start correcting rawreads, crr0.
2024-05-29 21:53:12 [Info] Job::Serial::submit_core crr0_correct
2024-05-29 21:53:12 [Info] Start running job crr0_rd2rd_job, aligning reads to reads, crr0
2024-05-29 21:53:12 [Info] Skip aligning reads to reads, crr0 for outputs are newer.
2024-05-29 21:53:12 [Info] Start running job crr0_split, spliting read names, crr0
2024-05-29 21:53:12 [Info] Start spliting read names, crr0.
Argument "" isn't numeric in addition (+) at /rstubbs/my_conda/envs/pecat-env/share/pecat-0.0.3-0/bin/Plgd/Job/Script.pm line 47, <F> line 3.
2024-05-29 21:53:12 [Info] Run script /rstubbs/projects/genome/pecat/genome/scripts/crr0_split.sh
2024-05-29 21:53:12 [Info] Sumbit command: /rstubbs/projects/genome/pecat/genome/scripts/crr0_split.sh 2>&1 | tee /rstubbs/projects/genome/pecat/genome/scripts/crr0_split.sh.log
Plgd script start: 2024-05-29 21:53:12
2024-05-29 21:53:12 [INFO] Start
2024-05-29 21:53:12 [INFO] Arguments:
reads = /rstubbs/projects/genome/pecat/genome/0-prepare/prepared_reads.fasta
sub_reads = /rstubbs/projects/genome/pecat/genome/1-correct/0/readname.core.{}
block_size = 1000000000
base_size = 234000000000
thread_size = 70
method = plain
overlaps = /rstubbs/projects/genome/pecat/genome/1-correct/0/rd2rd.txt
sub_overlaps = /rstubbs/projects/genome/pecat/genome/1-correct/0/rd2rd.txt.sub.{}.paf
2024-05-29 21:53:12 [INFO] load read names
2024-05-29 21:55:50 [INFO] minimum length = 3000
2024-05-29 21:55:52 [INFO] read size = 18244710
2024-05-29 21:55:52 [INFO] Save read names
2024-05-29 21:55:58 [INFO] Save overlaps
2024-05-29 21:55:58 [ERROR] Failed to open file: /rstubbs/projects/genome/pecat/genome/1-correct/0/cc0.fasta.paf
/rstubbs/projects/genome/pecat/genome/scripts/crr0_split.sh: line 28: 414344 Aborted /rstubbs/my_conda/envs/pecat-env/share/pecat-0.0.3-0/bin/fsa_rd_tools split_name /rstubbs/projects/genome/pecat/genome/0-prepare/prepared_reads.fasta /rstubbs/projects/genome/pecat/genome/1-correct/0/readname.core.{} --block_size 1000000000 --base_size 234000000000 --overlaps /rstubbs/projects/genome/pecat/genome/1-correct/0/rd2rd.txt --sub_overlaps /rstubbs/projects/genome/pecat/genome/1-correct/0/rd2rd.txt.sub.{}.paf --thread_size 70
Plgd script end: 2024-05-29 21:55:58
2024-05-29 21:56:02 [Error] Failed to run spliting read names, crr0
```
@stubbsrl Have the files `1-correct/0/corrected_reads_X.fasta` been deleted? Can you show me what other files are in the directory `1-correct/0/`?
No, I still have the corrected reads fastas. Inside the `1-correct/0/` dir I have: `corrected_reads_0.fasta.X`, `corrected_reads_0.fasta.X.infos`, and `rd2rd.txt`. Yesterday when I tried to restart pecat it made `readname.core.X` and `rd2rd.txt.sub.X.paf` in that dir.
`crr0_split.sh` is the previous step and should not be executed again; otherwise the corrected reads (`corrected_reads_0.fasta.X`) will be deleted. The timestamps of the files may have been incorrect, so PECAT did not accurately recognize the current status. If you don't want to discard the corrected reads, it's best to execute the scripts manually.
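One workaround, under the assumption that PECAT's skip logic compares modification times (the "Skip ... for outputs are newer" messages above suggest this, but it is not confirmed here), is to re-stamp the completed outputs so they register as newer than their inputs:

```bash
# Assumption: a step is skipped when its outputs are newer than its inputs.
# Refresh the already-corrected outputs so crr0_split.sh is not re-run.
touch 1-correct/0/corrected_reads_0.fasta.*
```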
Oh wow, I'm very glad the corrected reads didn't get deleted. So what is the next script I should execute manually? Everything in the `scripts` dir has a done file with it, so nothing jumps out at me.
Each file `crr0_correct_X.sh` should be executed, and the corresponding files `corrected_reads_0.fasta.X` and `corrected_reads_0.fasta.X.infos` will be generated. The `corrected_reads_0.fasta.X` files are the corrected reads; they are concatenated into a single file as the output of the correction step. `crr0_correct_X.sh` requires the file `rd2rd.txt.sub.X.paf`. The `rd2rd.txt.sub.X.paf` files are generated from `ccX.fasta.paf` by `crr0_split.sh`. Has `rd2rd.txt.sub.X.paf` been deleted? It's a bit cumbersome to regenerate them.
Following are the scripts in execution order with their associated input and output files:

- `crr0_rd2rd_split.sh`: `../../0-prepare/prepared_reads.fasta` -> `ccX.fasta`
- `crr0_rd2rd_index.sh`: `../../0-prepare/prepared_reads.fasta` -> `reads.index`
- `crr0_rd2rd_align_X.sh`: `ccX.fasta`, `reads.index` -> `ccX.fasta.paf`
- `crr0_split.sh`: `ccX.fasta.paf` -> `rd2rd.txt.sub.X.paf`, `readname.core.X`
- `crr0_correct_X.sh`: `rd2rd.txt.sub.X.paf`, `readname.core.X` -> `corrected_reads_0.fasta.X`, `corrected_reads_0.fasta.X.infos`
- `crr0_cat.sh`: `corrected_reads_0.fasta.X` -> `../corrected_reads.fasta`

To regenerate `rd2rd.txt.sub.X.paf`, all `ccX.fasta.paf` files are required, which means you need to redo the alignment between the reads.
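Under those dependencies, a manual finish of the correction stage might look like this (block count per this thread, 0 through 211; script paths and the `.done` check follow the conventions above, so adjust to your project):

```bash
# Run any per-block correction that has not finished, then concatenate
for i in $(seq 0 211); do
    [ -e scripts/crr0_correct_${i}.sh.done ] || bash scripts/crr0_correct_${i}.sh
done
bash scripts/crr0_cat.sh
```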
Hi, I am the guy who opened this issue. The cause of core dump or segmentation fault issues often comes from the organelle genomes (chloroplast and mitochondrion). So if you are working with a plant genome of quite a large size (>3 Gbp), you'd better remove the reads derived from organelle genomes from your fastq files. Your fastq files may contain more than 10,000x coverage of the chloroplast genome!!

So, if reference organelle genomes are available, map your reads to them, extract the unmapped reads or those with less than 50% of their read length mapped, and then run PECAT.
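A sketch of that filter with minimap2 and samtools (the tool choice is mine, not from this thread, and this version keeps only fully unmapped reads; the <50%-aligned criterion would need an extra scripted check):

```bash
# Map reads to the organelle references and keep only unmapped reads.
# organelles.fasta is a hypothetical file holding the chloroplast and
# mitochondrial reference sequences.
minimap2 -ax map-ont organelles.fasta reads.fastq \
    | samtools fastq -f 4 - > reads.no_organelle.fastq
```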
I tried with a 700 Mbp genome and got fantastic results! Thank you!!

Now I'm working on a plant genome of ~4 Gbp and got a segmentation fault problem. So if it is using too much memory, is there a way to save it?
The error message is as below;

The cfgfile is as below.