Open enriquepola1996 opened 1 month ago
Hi @enriquepola1996
The most likely cause is insufficient memory. You can verify this conclusion by running /var/bin/time -v asm1_assemble.sh
.
Hi @lemene , the process finish with this:
Plgd script end: 2024-08-01 11:37:05
Command being timed: "./plant_manu/scripts/asm1_assemble.sh"
User time (seconds): 4444.19
System time (seconds): 254.80
Percent of CPU this job got: 299%
Elapsed (wall clock) time (h:mm:ss or m:ss): 26:07.18
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 209713668
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1215
Minor (reclaiming a frame) page faults: 52462642
Voluntary context switches: 1493451
Involuntary context switches: 23421
Swaps: 0
File system inputs: 378741728
File system outputs: 13480
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
(pecat-env) villalobos@alnitak:/data/car_manu/PECAT/demo$ free -h
total used free shared buff/cache available
Mem: 251Gi 48Gi 200Gi 706Mi 1.9Gi 199Gi
Swap: 2.0Gi 2.0Gi 0B
Is my memory? Thanks so much.
I have another question, it is regarding the prep_output_coverage parameter. In my sequencing report it indicates that a coverage of approximately 60X was achieved, so should I adjust the prep_output_coverage parameter based on this? The same for corr_output_coverage.
Thank you very much.
Hi, @enriquepola1996
corr_output_coverage
and prep_output_coverage
is used to reduce the size of datasets. In both error correction and assembly steps, PECAT implements all-vs-all comparisons of the reads. If the dataset coverage is too high, it can significantly impact the program’s running efficiency, with only marginal improvements in assembly quality. When corr_output_coverage
and prep_output_coverage
are greater than the actual coverage of the dataset, they will be ignored.
Hello @lemene , the process finish with more memory but now I have two questions, the alternate.fasta is much smaller than the primary.fasta, is that normal? The script I ran is cfg_cattle_clr but I'm still waiting for cfg_arab_clr to finish. In both I set the genome size to 400Mb. In this first result I obtained a genome of 322Mb and an N50 of 18Mb (3-assemble) and 330Mb and an N50 of 16Mb (5-assemble) for primary assembly.
[epola@mazorka test_car_catle]$ ls
0-prepare 1-correct 2-align 3-assemble 4-phase 5-assemble 6-polish scripts
[epola@mazorka test_car_catle]$ ls -lh 5-assemble/
total 641M
-rw-rw-r-- 1 epola epola 4.1M Aug 2 03:39 alternate.fasta
-rw-rw-r-- 1 epola epola 19K Aug 2 03:39 alternate_tiles
-rw-rw-r-- 1 epola epola 315M Aug 2 03:39 haplotype_1.fasta
-rw-rw-r-- 1 epola epola 1.7M Aug 2 03:39 haplotype_1_tiles
-rw-rw-r-- 1 epola epola 3.9M Aug 2 03:39 haplotype_2.fasta
-rw-rw-r-- 1 epola epola 20K Aug 2 03:39 haplotype_2_tiles
-rw-rw-r-- 1 epola epola 315M Aug 2 03:39 primary.fasta
-rw-rw-r-- 1 epola epola 1.7M Aug 2 03:39 primary_tiles
-rw-rw-r-- 1 epola epola 887K Aug 2 03:39 rest.fasta
-rw-rw-r-- 1 epola epola 5.4K Aug 2 03:39 rest_tiles
I am comparing these results with Canu, where I ran the following:
canu \
useGrid=false \
-p assembly \
-d canu_plant_400 \
genomeSize=400m \
-pacbio-raw test.fasta.gz \
corOutCoverage=200 \
"batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50"
And I got an assembly of 322Mb and an N50 of 12Mb.
The other question is that the polishing process was not successful, what is the probable reason?
[epola@mazorka test_car_catle]$ ls -lh 6-polish/racon/
total 155M
-rw-rw-r-- 1 epola epola 0 Aug 2 07:11 alternate.fasta
-rw-rw-r-- 1 epola epola 0 Aug 2 07:11 haplotype_1.fasta
-rw-rw-r-- 1 epola epola 0 Aug 2 07:11 haplotype_2.fasta
-rw-rw-r-- 1 epola epola 0 Aug 2 07:11 primary.fasta
-rw-rw-r-- 1 epola epola 2.8M Aug 2 07:07 rd_2_alt_names
-rw-rw-r-- 1 epola epola 76M Aug 2 07:11 rd_2_hap1_names
-rw-rw-r-- 1 epola epola 1.9M Aug 2 07:11 rd_2_hap2_names
-rw-rw-r-- 1 epola epola 75M Aug 2 07:07 rd_2_pri_names
The final of the error.log file:
2024-08-02 03:15:03 [Info] Start assembling overlaps, asm2.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 40.
2024-08-02 03:15:03 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/asm2_assemble.sh
2024-08-02 03:15:03 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/asm2_assemble.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/asm2_assemble.sh.log
2024-08-02 03:42:08 [Info] End assembling overlaps, asm2.
2024-08-02 03:42:08 [Info] Start running job pol_job, polishing contigs
2024-08-02 03:42:08 [Info] Start polishing contigs.
2024-08-02 03:42:08 [Info] Job::Serial::submit_core pol_job
2024-08-02 03:42:08 [Info] Start running job pol_job_racon, polishing contigs using racon
2024-08-02 03:42:08 [Info] Start polishing contigs using racon.
2024-08-02 03:42:08 [Info] Job::Parallel::submit_core pol_job_racon
2024-08-02 03:42:08 [Info] Start running job pol_job, polishing job for pri/alt contigs
2024-08-02 03:42:08 [Info] Start polishing job for pri/alt contigs.
2024-08-02 03:42:08 [Info] Job::Serial::submit_core pol_job
2024-08-02 03:42:08 [Info] Start running job pol_map, mapping reads to contigs
2024-08-02 03:42:08 [Info] Start mapping reads to contigs.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 41.
2024-08-02 03:42:08 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_map.sh
2024-08-02 03:42:08 [Info] Start running job pol_job_hap, polishing dual format contigs
2024-08-02 03:42:08 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_map.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_map.sh.log
2024-08-02 03:42:08 [Info] Start polishing dual format contigs.
2024-08-02 03:42:08 [Info] Job::Serial::submit_core pol_job_hap
2024-08-02 03:42:08 [Info] Start running job pol_map_hap, mapping reads to contigs
2024-08-02 03:42:08 [Info] Start mapping reads to contigs.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 41.
2024-08-02 05:23:13 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_map_hap.sh
2024-08-02 05:23:13 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_map_hap.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_map_hap.sh.log
2024-08-02 05:23:13 [Info] End mapping reads to contigs.
2024-08-02 05:23:13 [Info] Start running job pol_filter, filter overlaps in which the reads are different haplotype
2024-08-02 05:23:13 [Info] Start filter overlaps in which the reads are different haplotype.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 42.
2024-08-02 07:03:39 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_filter.sh
2024-08-02 07:03:39 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_filter.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_filter.sh.log
2024-08-02 07:03:39 [Info] End mapping reads to contigs.
2024-08-02 07:03:39 [Info] Start running job pol_filter_hap, filter overlaps in which the reads are different haplotype
2024-08-02 07:03:39 [Info] Start filter overlaps in which the reads are different haplotype.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 43.
2024-08-02 07:07:34 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_filter_hap.sh
2024-08-02 07:07:34 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_filter_hap.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_filter_hap.sh.log
2024-08-02 07:07:34 [Info] End filter overlaps in which the reads are different haplotype.
2024-08-02 07:07:34 [Info] Start running job pol_polish, polishing pri/alt format contigs
2024-08-02 07:07:34 [Info] Start polishing pri/alt format contigs.
2024-08-02 07:07:34 [Info] Job::Parallel::submit_core pol_polish
2024-08-02 07:07:34 [Info] Start running job pol_polish_prj, polishing primary contigs
2024-08-02 07:07:34 [Info] Start polishing primary contigs.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 44.
2024-08-02 07:07:34 [Info] Start running job pol_polish_alt, polishing alternate contigs
2024-08-02 07:07:34 [Info] Start polishing alternate contigs.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 44.
2024-08-02 07:11:29 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_prj.sh
2024-08-02 07:11:29 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_prj.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_prj.sh.log
2024-08-02 07:11:29 [Info] End filter overlaps in which the reads are different haplotype.
2024-08-02 07:11:29 [Info] Start running job pol_polish_hap, polishing dual format contigs
2024-08-02 07:11:29 [Info] Start polishing dual format contigs.
2024-08-02 07:11:29 [Info] Job::Parallel::submit_core pol_polish_hap
2024-08-02 07:11:29 [Info] Start running job pol_polish_hap_1, polishing dual_1 contigs
2024-08-02 07:11:29 [Info] Start polishing dual_1 contigs.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 45.
2024-08-02 07:11:29 [Info] Start running job pol_polish_hap_2, polishing dual_2 contigs
2024-08-02 07:11:29 [Info] Start polishing dual_2 contigs.
Argument "" isn't numeric in addition (+) at /LUSTRE/storage/data/software/PECAT/build/bin/Plgd/Job/Script.pm line 47, <F> line 45.
2024-08-02 07:11:34 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_hap_2.sh
2024-08-02 07:11:34 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_hap_2.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_hap_2.sh.log
2024-08-02 07:11:34 [Info] End polishing primary contigs.
2024-08-02 07:11:39 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_hap_1.sh
2024-08-02 07:11:39 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_hap_1.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_hap_1.sh.log
2024-08-02 07:11:39 [Info] End polishing dual_2 contigs.
2024-08-02 07:11:44 [Info] Run script /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_alt.sh
2024-08-02 07:11:44 [Info] Sumbit command: /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_alt.sh 2>&1 | tee /LUSTRE/usuario/epola/pecat_assembly/test_car_catle/scripts/pol_polish_alt.sh.log
2024-08-02 07:11:44 [Info] End polishing dual_1 contigs.
2024-08-02 07:11:44 [Info] End polishing dual format contigs.
2024-08-02 07:11:44 [Info] End polishing dual format contigs.
2024-08-02 07:11:49 [Info] End polishing alternate contigs.
2024-08-02 07:11:49 [Info] End polishing pri/alt format contigs.
2024-08-02 07:11:49 [Info] End polishing job for pri/alt contigs.
2024-08-02 07:11:49 [Info] End polishing contigs using racon.
2024-08-02 07:11:49 [Info] End polishing contigs.
Thanks so much.
Hi @enriquepola1996
@lemene Thank you very much for your valuable answers, I have two last question, do you recommend running purge haplotigs on the primary assembly (5-assemble)? Do you recommend polishing with raw or corrected reads?
Thanks so much.
@enriquepola1996
5-assemble/primary.fasta
does not require purging of haplotigs. Ideally, different haplotypes should be output in 5-assemble/alternate.fasta
.@lemene Thanks so much for you support, you can close this issue as resolved.
Hello dear developers,
I am using PECAT to assemble a diploid genome (SMRT Cell, PacBio), however, the process is stuck at the assembly stage, the following error appears:
My script:
Can you help me please? Thanks.