Hello,
I'm trying to assemble a 500 Mb genome with PacBio reads. Our reads are relatively short (7 kbp on average), and coverage is poor at about 10x. Flye produces an assembly of only around 170 Mb. I presume this is due to our poor coverage? Is there anything in the log output below that might be problematic, or do you have any suggestions for improvement? Many thanks for your help.
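For reference, the coverage figure can be checked by hand from the numbers Flye reports in the log. This is only a minimal Python sketch: the function and variable names are my own, and the 170 Mb assembly size is the approximate figure mentioned above, not a value taken from the log.

```python
# Values copied from the Flye log in this post.
TOTAL_READ_LENGTH = 5_425_409_572  # "Total read length"
INDEXED_SEQUENCE = 4_996_915_438   # "Total sequence" reported while building the index
GENOME_SIZE = 509_000_000          # "Input genome size"

# Approximate assembly size from the post (assumption: roughly 170 Mb).
ASSEMBLY_SIZE = 170_000_000


def coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


raw_cov = coverage(TOTAL_READ_LENGTH, GENOME_SIZE)
indexed_cov = coverage(INDEXED_SEQUENCE, GENOME_SIZE)
assembled_fraction = ASSEMBLY_SIZE / GENOME_SIZE

print(f"Coverage from total read length: {raw_cov:.2f}x")
print(f"Coverage from indexed sequence:  {indexed_cov:.2f}x")
print(f"Assembled fraction of genome:    {assembled_fraction:.1%}")
```

The first figure truncates to the "Estimated coverage: 10" line in the log; the "Overlap-based coverage: 5" that Flye reports later is lower than either estimate.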
[2023-01-28 14:57:26] root: INFO: >>>STAGE: configure
[2023-01-28 14:57:26] root: INFO: Configuring run
[2023-01-28 14:57:47] root: INFO: Total read length: 5425409572
[2023-01-28 14:57:47] root: INFO: Input genome size: 509000000
[2023-01-28 14:57:47] root: INFO: Estimated coverage: 10
[2023-01-28 14:57:47] root: INFO: Reads N50/N90: 10492 / 3449
[2023-01-28 14:57:47] root: INFO: Minimum overlap set to 3000
[2023-01-28 14:57:47] root: INFO: >>>STAGE: assembly
[2023-01-28 14:57:47] root: INFO: Assembling disjointigs
[2023-01-28 14:57:47] root: DEBUG: -----Begin assembly log------
[2023-01-28 14:57:47] root: DEBUG: Running: flye-modules assemble --reads /mainfs/scratch/acj1n18/echium_genome/working_data/all_pacbio_echium.fasta --out-asm /scratch/acj1n18/echium_genome/flye/assembly_alldata/00-assembly/draft_assembly.fasta --config /mainfs/scratch/acj1n18/echium_genome/flye/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /scratch/acj1n18/echium_genome/flye/assembly_alldata/flye.log --threads 20 --genome-size 509000000 --min-ovlp 3000
[2023-01-28 14:57:47] DEBUG: Build date: Jan 26 2023 11:20:18
[2023-01-28 14:57:47] DEBUG: Total RAM: 1511 Gb
[2023-01-28 14:57:47] DEBUG: Available RAM: 1410 Gb
[2023-01-28 14:57:47] DEBUG: Total CPUs: 64
[2023-01-28 14:57:47] DEBUG: Loading /mainfs/scratch/acj1n18/echium_genome/flye/Flye/flye/config/bin_cfg/asm_raw_reads.cfg
[2023-01-28 14:57:47] DEBUG: Loading /mainfs/scratch/acj1n18/echium_genome/flye/Flye/flye/config/bin_cfg/asm_defaults.cfg
[2023-01-28 14:57:47] DEBUG: big_genome_threshold=29000000
[2023-01-28 14:57:47] DEBUG: meta_read_filter_kmer_freq=100
[2023-01-28 14:57:47] DEBUG: chain_large_gap_penalty=2
[2023-01-28 14:57:47] DEBUG: chain_small_gap_penalty=0.5
[2023-01-28 14:57:47] DEBUG: chain_gap_jump_threshold=100
[2023-01-28 14:57:47] DEBUG: max_coverage_drop_rate=5
[2023-01-28 14:57:47] DEBUG: max_extensions_drop_rate=5
[2023-01-28 14:57:47] DEBUG: chimera_window=100
[2023-01-28 14:57:47] DEBUG: chimera_overhang=1000
[2023-01-28 14:57:47] DEBUG: min_reads_in_disjointig=4
[2023-01-28 14:57:47] DEBUG: max_inner_reads=10
[2023-01-28 14:57:47] DEBUG: max_inner_fraction=0.25
[2023-01-28 14:57:47] DEBUG: max_separation=500
[2023-01-28 14:57:47] DEBUG: unique_edge_length=50000
[2023-01-28 14:57:47] DEBUG: min_repeat_res_support=0.51
[2023-01-28 14:57:47] DEBUG: out_paths_ratio=5
[2023-01-28 14:57:47] DEBUG: graph_cov_drop_rate=5
[2023-01-28 14:57:47] DEBUG: coverage_estimate_window=100
[2023-01-28 14:57:47] DEBUG: max_bubble_length=50000
[2023-01-28 14:57:47] DEBUG: loop_coverage_rate=1.5
[2023-01-28 14:57:47] DEBUG: repeat_edge_cov_mult=1.75
[2023-01-28 14:57:47] DEBUG: weak_detach_rate=5
[2023-01-28 14:57:47] DEBUG: tip_coverage_rate=2
[2023-01-28 14:57:47] DEBUG: tip_length_rate=2
[2023-01-28 14:57:47] DEBUG: output_gfa_before_rr=0
[2023-01-28 14:57:47] DEBUG: remove_alt_edges=0
[2023-01-28 14:57:47] DEBUG: low_cutoff_warning=1
[2023-01-28 14:57:47] DEBUG: kmer_size=17
[2023-01-28 14:57:47] DEBUG: use_minimizers=0
[2023-01-28 14:57:47] DEBUG: reads_base_alignment=0
[2023-01-28 14:57:47] DEBUG: meta_read_top_kmer_rate=0.40
[2023-01-28 14:57:47] DEBUG: maximum_jump=1500
[2023-01-28 14:57:47] DEBUG: maximum_overhang=1500
[2023-01-28 14:57:47] DEBUG: repeat_kmer_rate=100
[2023-01-28 14:57:47] DEBUG: assemble_ovlp_divergence=0.10
[2023-01-28 14:57:47] DEBUG: assemble_divergence_relative=1
[2023-01-28 14:57:47] DEBUG: repeat_graph_ovlp_divergence=0.08
[2023-01-28 14:57:47] DEBUG: read_align_ovlp_divergence=0.25
[2023-01-28 14:57:47] DEBUG: hpc_scoring_on=0
[2023-01-28 14:57:47] DEBUG: add_unassembled_reads=0
[2023-01-28 14:57:47] DEBUG: extend_contigs_with_repeats=0
[2023-01-28 14:57:47] DEBUG: min_read_cov_cutoff=3
[2023-01-28 14:57:47] DEBUG: short_tip_length=20000
[2023-01-28 14:57:47] DEBUG: long_tip_length=100000
[2023-01-28 14:57:47] DEBUG: Running with k-mer size: 17
[2023-01-28 14:57:47] DEBUG: Running with minimum overlap 3000
[2023-01-28 14:57:47] DEBUG: Metagenome mode: N
[2023-01-28 14:57:47] DEBUG: Short mode: N
[2023-01-28 14:57:47] INFO: Reading sequences
[2023-01-28 14:58:40] DEBUG: Building positional index
[2023-01-28 14:58:40] DEBUG: Total sequence: 4996915438 bp
[2023-01-28 14:58:43] INFO: Counting k-mers:
[2023-01-28 15:01:23] DEBUG: Updating k-mer histogram
[2023-01-28 15:03:52] DEBUG: Hash size: 19925530
[2023-01-28 15:03:52] DEBUG: Total k-mers 1723494274
[2023-01-28 15:03:54] INFO: Filling index table (1/2)
[2023-01-28 15:06:31] DEBUG: Mean k-mer frequency: 8.74894
[2023-01-28 15:06:31] DEBUG: Repetitive k-mer frequency: 874
[2023-01-28 15:06:31] DEBUG: Filtered 466131580 repetitive k-mers (0.2558)
[2023-01-28 15:06:43] INFO: Filling index table (2/2)
[2023-01-28 15:08:56] DEBUG: Sorting k-mer index
[2023-01-28 15:09:23] DEBUG: Selected k-mers: 339346939
[2023-01-28 15:09:23] DEBUG: Index size: 1452625486
[2023-01-28 15:09:23] DEBUG: Mean k-mer index frequency: 4.28065
[2023-01-28 15:09:23] DEBUG: Peak RAM usage: 31 Gb
[2023-01-28 15:09:23] DEBUG: Estimating k-mer identity bias
[2023-01-28 15:09:29] DEBUG: Initial divergence estimate : 0.192152
[2023-01-28 15:09:29] DEBUG: Relative threshold: Y
[2023-01-28 15:09:29] DEBUG: Max divergence threshold set to 0.292152
[2023-01-28 15:09:29] INFO: Extending reads
[2023-01-28 15:09:29] DEBUG: Estimating overlap coverage
[2023-01-28 15:10:44] INFO: Overlap-based coverage: 5
[2023-01-28 15:10:44] INFO: Median overlap divergence: 0.193695
[2023-01-28 15:10:44] DEBUG: Sequence divergence distribution ...........
[2023-01-28 20:03:23] root: INFO: Assembly statistics: