lemene / PECAT

PECAT, a phased error correct and assembly tool
BSD 2-Clause "Simplified" License

killed fsa_ol_assemble #28

Open melop opened 4 months ago

melop commented 4 months ago

Hello, the pipeline reached the following step and was then killed. Is this a RAM problem again? I ran it twice, and on the second run the server had ~600 GB of free RAM.

2024-05-28 20:45:37 [INFO] Load 8563002 reads from file: /fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/1-correct/corrected_reads.fasta
2024-05-28 20:45:53 [INFO] Load overlap file
2024-05-28 20:56:18 [INFO] Overlap size: 2843837037/3466723677
2024-05-28 20:57:22 [INFO] Group overlaps and remove duplicated
/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/scripts/asm1_assemble.sh: line 16: 2789903 Killed /public/software/conda_envs/pecat0.0.3/share/pecat-0.0.3-0/bin/fsa_ol_assemble /fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/2-align/overlaps.txt --thread_size=24 --output_directory=/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/3-assemble --read_file=/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/1-correct/corrected_reads.fasta --max_trivial_length 10000
Plgd script end: 2024-05-28 21:40:01
2024-05-28 21:40:01 [Error] Failed to run assembling overlaps, asm1

lemene commented 4 months ago

Hi @melop, this may be caused by a lack of memory, although in my experience 600 GB should be enough. You can set asm1_assemble_options = --filter0 l=10000:al=5000 and asm2_assemble_options = --filter0 l=10000:al=5000 to filter out trivial overlaps and reduce memory usage. Here l is the minimum read length and al is the minimum alignment length.
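To make the two thresholds concrete, here is a minimal Python sketch of what such a filter could look like. It is an illustration only, not PECAT's implementation: the Overlap record, parse_filter, and keep are hypothetical names, and the sketch assumes that both reads in an overlap must reach the minimum read length, which the thread does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """Hypothetical overlap record; not PECAT's internal structure."""
    a_len: int    # length of the first read
    b_len: int    # length of the second read
    aln_len: int  # length of the aligned region


def parse_filter(spec: str) -> dict:
    """Parse a 'key=value:key=value' string such as 'l=10000:al=5000'."""
    return {k: int(v) for k, v in (item.split("=") for item in spec.split(":"))}


def keep(ov: Overlap, opts: dict) -> bool:
    """Keep an overlap only if it passes both thresholds.

    Assumption: 'l' applies to the shorter of the two reads.
    """
    long_enough_reads = min(ov.a_len, ov.b_len) >= opts.get("l", 0)
    long_enough_alignment = ov.aln_len >= opts.get("al", 0)
    return long_enough_reads and long_enough_alignment


opts = parse_filter("l=10000:al=5000")
overlaps = [
    Overlap(12000, 15000, 6000),  # passes both thresholds
    Overlap(8000, 20000, 7000),   # one read is shorter than l
    Overlap(11000, 13000, 3000),  # alignment is shorter than al
]
kept = [ov for ov in overlaps if keep(ov, opts)]
print(len(kept), "of", len(overlaps), "overlaps kept")
```

Dropping short reads and short alignments before the overlaps are grouped shrinks the set that has to be held in memory, which is the intent of the suggestion above.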

melop commented 3 months ago

I reran the previous step with more RAM and it completed. But now the pipeline hits another error. Could this also be a memory issue?

2024-06-13 22:23:49 [INFO] Preload reads and contigs
2024-06-13 22:26:55 [INFO] Load 11220303 reads from file: /fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/0-prepare/prepared_reads.fasta
2024-06-13 22:26:57 [INFO] Load 1439 reads from file: /fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/3-assemble/primary.fasta
2024-06-13 22:26:57 [INFO] Load VCF
2024-06-13 22:27:18 [INFO] Load SNPs: count = 21261469
2024-06-13 22:27:18 [INFO] Set using vcf
2024-06-13 22:27:18 [INFO] Load overlaps
fsa_rd_haplotype: overlap_store.cpp:365: static int fsa::OverlapStore::FromSamLine(const string&, fsa::Overlap&, fsa::StringPool::NameId&): Assertion `n2l != loadinginfos.end()' failed.
/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/scripts/phs_clair3_phase.sh: line 16: 1810386 Aborted (core dumped) /public/software/conda_envs/pecat0.0.3/share/pecat-0.0.3-0/bin/fsa_rd_haplotype --coverage lc=30 --phase_options icr=0.1:icc=3:sc=10 --filter i=70 --vcf_fname /fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/4-phase/clair3/merge_output.vcf.gz --thread_size=24 --ctg_fname=/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/3-assemble/primary.fasta --rd_fname=/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/0-prepare/prepared_reads.fasta --ol_fname=/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/4-phase/clair3/rd_2_ctg.sam --output_directory=/fast3/group_crf/home/cuirf/Camellia/Camellia_achrysantha/pecat/Camellia_achrysantha/4-phase/clair3
Plgd script end: 2024-06-13 22:27:18
2024-06-13 22:27:19 [Error] Failed to run phasing reads with contigs, phs_clair3

lemene commented 3 months ago

Hi @melop, the error in the log means that the contig names in 4-phase/clair3/rd_2_ctg.sam are inconsistent with the contigs in 3-assemble/primary.fasta. Was the program interrupted? One possible reason is that primary.fasta was regenerated after the SAM file was written.

melop commented 3 months ago

Yes, the program failed while generating the BAM file, so I ran the SAM-to-BAM conversion myself and then continued the pipeline. The default command line generated by PECAT doesn't appear to work with my samtools version. Which samtools version is required?

melop commented 3 months ago

primary.fasta has a modification date earlier than that of rd_2_ctg.sam, so regeneration probably doesn't explain it?

lemene commented 3 months ago

The samtools version should be 1.7 or later. The assertion means that PECAT found contigs in primary.fasta that are not listed in the header of the SAM file.

melop commented 3 months ago

My samtools version is 1.15.1, which satisfies the 1.7+ requirement. Strangely, the SAM file produced by the pipeline had no header, which was one of the reasons it could not be converted to BAM format.

lemene commented 3 months ago

@melop That is most likely the cause. PECAT needs to scan the header to obtain the contig length information that it uses when loading alignments.
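For anyone hitting the same assertion, a quick way to check a SAM file before rerunning the phasing step is to compare its header against the contig FASTA. The following is a minimal Python sketch, not part of PECAT: the function names are hypothetical, and it assumes an uncompressed text SAM file whose @SQ header lines carry the standard SN (name) and LN (length) fields.

```python
def fasta_names(path):
    """Collect sequence names (first word after '>') from a FASTA file."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def sam_header_lengths(path):
    """Map contig name to length using the @SQ lines of a SAM header."""
    lengths = {}
    with open(path) as fh:
        for line in fh:
            if not line.startswith("@"):
                break  # header lines always precede alignment records
            if line.startswith("@SQ"):
                tags = line.rstrip("\n").split("\t")[1:]
                fields = dict(tag.split(":", 1) for tag in tags)
                lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def missing_contigs(fasta_path, sam_path):
    """Return contigs present in the FASTA but absent from the SAM header."""
    return sorted(fasta_names(fasta_path) - set(sam_header_lengths(sam_path)))


# Example call with the file names from this thread (paths shortened):
# print(missing_contigs("3-assemble/primary.fasta", "4-phase/clair3/rd_2_ctg.sam"))
```

A SAM file without any header, as described above, makes missing_contigs return every contig in primary.fasta, which matches the situation the assertion reports.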