mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Error "Segmentation fault" #305

Closed biowackysci closed 3 years ago

biowackysci commented 3 years ago

I am trying to assemble a 2.8 Gb plant genome with ONT sequencing data. I have been getting a segmentation fault when I run the following command:

python bin/flye --nano-raw /group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta --out-dir /group/pasture/Saila/Flye/Flye/cleaned_40kb_fasta_HSandSV_outcome/ --genome-size 2.8g --threads 64

with 1000G of memory requested in Slurm.

The error log file is [2020-09-17 13:26:45] root: INFO: Starting Flye 2.7-b1587 [2020-09-17 13:26:45] root: DEBUG: Cmd: bin/flye --nano-raw /group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta --out-dir /group/pasture/Saila/Flye/Flye/cleaned_40kb_fasta_HSandSV_outcome/ --genome-size 2.8g --threads 64 [2020-09-17 13:26:45] root: DEBUG: Python version: 2.7.5 (default, Jun 20 2019, 20:27:34) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] [2020-09-17 13:26:45] root: INFO: >>>STAGE: configure [2020-09-17 13:26:45] root: INFO: Configuring run [2020-09-17 13:28:32] root: WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs [2020-09-17 13:38:45] root: INFO: Total read length: 122599120798 [2020-09-17 13:38:45] root: INFO: Input genome size: 2800000000 [2020-09-17 13:38:45] root: INFO: Estimated coverage: 43 [2020-09-17 13:38:45] root: INFO: Reads N50/N90: 21593227754 / 10553578087 [2020-09-17 13:38:45] root: INFO: Minimum overlap set to 5000 [2020-09-17 13:38:45] root: INFO: Selected k-mer size: 17 [2020-09-17 13:38:45] root: INFO: >>>STAGE: assembly [2020-09-17 13:38:45] root: INFO: Assembling disjointigs [2020-09-17 13:38:45] root: DEBUG: -----Begin assembly log------ [2020-09-17 13:38:45] root: DEBUG: Running: flye-modules assemble --reads /group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta --out-asm /group/pasture/Saila/Flye/Flye/cleaned_40kb_fasta_HSandSV_outcome/00-assembly/draft_assembly.fasta --genome-size 2800000000 --config /group/pasture/Saila/Flye/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /group/pasture/Saila/Flye/Flye/cleaned_40kb_fasta_HSandSV_outcome/flye.log --threads 64 --min-ovlp 5000 --kmer 17 [2020-09-17 13:38:47] DEBUG: Build date: Apr 23 2020 11:10:00 [2020-09-17 13:38:47] DEBUG: Total RAM: 1510 Gb [2020-09-17 13:38:47] DEBUG: Available RAM: 1428 Gb [2020-09-17 13:38:47] DEBUG: Total CPUs: 48 [2020-09-17 13:38:47] DEBUG: Loading 
/group/pasture/Saila/Flye/Flye/flye/config/bin_cfg/asm_raw_reads.cfg [2020-09-17 13:38:47] DEBUG: Loading /group/pasture/Saila/Flye/Flye/flye/config/bin_cfg/asm_defaults.cfg [2020-09-17 13:38:47] DEBUG: big_genome_threshold=29000000 [2020-09-17 13:38:47] DEBUG: max_coverage_drop_rate=5 [2020-09-17 13:38:47] DEBUG: chimera_window=100 [2020-09-17 13:38:47] DEBUG: min_reads_in_disjointig=4 [2020-09-17 13:38:47] DEBUG: max_inner_reads=10 [2020-09-17 13:38:47] DEBUG: max_inner_fraction=0.25 [2020-09-17 13:38:47] DEBUG: max_separation=500 [2020-09-17 13:38:47] DEBUG: unique_edge_length=50000 [2020-09-17 13:38:47] DEBUG: min_repeat_res_support=0.51 [2020-09-17 13:38:47] DEBUG: out_paths_ratio=5 [2020-09-17 13:38:47] DEBUG: graph_cov_drop_rate=5 [2020-09-17 13:38:47] DEBUG: coverage_estimate_window=100 [2020-09-17 13:38:47] DEBUG: max_bubble_length=50000 [2020-09-17 13:38:47] DEBUG: loop_coverage_rate=1.5 [2020-09-17 13:38:47] DEBUG: repeat_edge_cov_mult=1.75 [2020-09-17 13:38:47] DEBUG: weak_detach_rate=5 [2020-09-17 13:38:47] DEBUG: tip_coverage_rate=2 [2020-09-17 13:38:47] DEBUG: tip_length_rate=2 [2020-09-17 13:38:47] DEBUG: low_cutoff_warning=1 [2020-09-17 13:38:47] DEBUG: hard_min_coverage_rate=10 [2020-09-17 13:38:47] DEBUG: assemble_kmer_sample=1 [2020-09-17 13:38:47] DEBUG: repeat_graph_kmer_sample=1 [2020-09-17 13:38:47] DEBUG: read_align_kmer_sample=1 [2020-09-17 13:38:47] DEBUG: meta_read_top_kmer_rate=0.25 [2020-09-17 13:38:47] DEBUG: meta_read_filter_kmer_freq=10 [2020-09-17 13:38:47] DEBUG: maximum_jump=1500 [2020-09-17 13:38:47] DEBUG: maximum_overhang=1500 [2020-09-17 13:38:47] DEBUG: repeat_kmer_rate=100 [2020-09-17 13:38:47] DEBUG: assemble_ovlp_relative_divergence=0.10 [2020-09-17 13:38:47] DEBUG: repeat_graph_ovlp_divergence=0.10 [2020-09-17 13:38:47] DEBUG: read_align_ovlp_divergence=0.25 [2020-09-17 13:38:47] DEBUG: add_unassembled_reads=0 [2020-09-17 13:38:47] DEBUG: extend_contigs_with_repeats=1 [2020-09-17 13:38:47] DEBUG: min_read_cov_cutoff=3 
[2020-09-17 13:38:47] DEBUG: short_tip_length=20000 [2020-09-17 13:38:47] DEBUG: long_tip_length=100000 [2020-09-17 13:38:47] DEBUG: Running with k-mer size: 17 [2020-09-17 13:38:47] DEBUG: Running with minimum overlap 5000 [2020-09-17 13:38:47] DEBUG: Metagenome mode: N [2020-09-17 13:38:47] INFO: Reading sequences [2020-09-17 13:52:44] DEBUG: Building positional index [2020-09-17 13:52:48] DEBUG: Total sequence: 122599120798 bp [2020-09-17 13:52:48] DEBUG: Expected read coverage: 43 [2020-09-17 13:52:48] INFO: Generating solid k-mer index [2020-09-17 13:52:48] DEBUG: Hard threshold set to 4 [2020-09-17 13:52:48] DEBUG: Started k-mer counting [2020-09-17 13:53:04] INFO: Counting k-mers (1/2): [2020-09-17 14:40:43] INFO: Counting k-mers (2/2): [2020-09-17 18:51:44] DEBUG: Estimated minimum kmer coverage: 3 [2020-09-17 18:51:45] DEBUG: Filtered 3234675957 erroneous k-mers [2020-09-17 18:51:45] DEBUG: Repetitive k-mer frequency: 1292 [2020-09-17 18:51:45] DEBUG: Filtered 1511763 repetitive k-mers (0.000512349) [2020-09-17 18:51:45] INFO: Filling index table [2020-09-17 18:53:57] DEBUG: Sampling rate: 1 [2020-09-17 18:53:57] DEBUG: Solid k-mers: 2949138114 [2020-09-17 18:53:57] DEBUG: K-mer index size: 29237875078 [2020-09-17 18:53:57] DEBUG: Mean k-mer frequency: 9.91404 [2020-09-17 23:02:51] DEBUG: Sorting k-mer index [2020-09-17 23:14:12] DEBUG: Peak RAM usage: 284 Gb [2020-09-17 23:14:12] DEBUG: Estimating k-mer identity bias [2020-09-17 23:14:12] ERROR: Segmentation fault! 
Backtrace: [2020-09-17 23:14:12] ERROR: flye-modules(_Z15segfaultHandleri+0x1e) [0x44841e] [2020-09-17 23:14:12] ERROR: /lib64/libc.so.6(+0x36340) [0x7ff27661c340] [2020-09-17 23:14:12] ERROR: flye-modules(_ZNK15OverlapDetector14getSeqOverlapsERK11FastaRecordRbR12OvlpDivStatsi+0xa9b) [0x486efb] [2020-09-17 23:14:12] ERROR: flye-modules() [0x489da5] [2020-09-17 23:14:12] ERROR: flye-modules(_ZNSt6thread5_ImplISt12_Bind_simpleIFZ17processInParallelIN11FastaRecord2IdEEvRKSt6vectorIT_SaIS6_EESt8functionIFvRKS6_EEmbEUlvE_vEEE6_M_runEv+0x78) [0x45d7b8] [2020-09-17 23:14:12] ERROR: /lib64/libstdc++.so.6(+0xb5070) [0x7ff27719c070] [2020-09-17 23:14:12] ERROR: /lib64/libpthread.so.0(+0x7dd5) [0x7ff2769badd5] [2020-09-17 23:14:12] ERROR: /lib64/libc.so.6(clone+0x6d) [0x7ff2766e402d] [2020-09-17 23:18:49] root: ERROR: Command '['flye-modules', 'assemble', '--reads', '/group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta', '--out-asm', '/group/pasture/Saila/Flye/Flye/cleaned_40kb_fasta_HSandSV_outcome/00-assembly/draft_assembly.fasta', '--genome-size', '2800000000', '--config', '/group/pasture/Saila/Flye/Flye/flye/config/bin_cfg/asm_raw_reads.cfg', '--log', '/group/pasture/Saila/Flye/Flye/cleaned_40kb_fasta_HSandSV_outcome/flye.log', '--threads', '64', '--min-ovlp', '5000', '--kmer', '17']' returned non-zero exit status -6 [2020-09-17 23:18:49] root: ERROR: Pipeline aborted

Is there something I need to look into to get this pipeline going? Thanks S

mikolmogorov commented 3 years ago

Hi,

Looks like a bug, but I'm not exactly sure where. Could you try the latest release (2.8) and see if it gives you the same error? If so, please attach the log from the new run as well.

Mikhail

biowackysci commented 3 years ago

Hello I tried to do the assembly with Flye2.8 and the pipeline aborted This is the log file [2020-10-03 13:05:44] INFO: Starting Flye 2.8.1-b1676 [2020-10-03 13:05:44] INFO: >>>STAGE: configure [2020-10-03 13:05:44] INFO: Configuring run [2020-10-03 13:07:30] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs [2020-10-03 13:17:36] INFO: Total read length: 122599120798 [2020-10-03 13:17:36] INFO: Input genome size: 2800000000 [2020-10-03 13:17:36] INFO: Estimated coverage: 43 [2020-10-03 13:17:36] INFO: Reads N50/N90: 21593227754 / 10553578087 [2020-10-03 13:17:36] INFO: Minimum overlap set to 5000 [2020-10-03 13:17:36] INFO: >>>STAGE: assembly [2020-10-03 13:17:36] INFO: Assembling disjointigs [2020-10-03 13:17:38] INFO: Reading sequences [2020-10-03 13:31:40] INFO: Counting k-mers: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-03 17:09:42] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% [2020-10-03 17:09:42] ERROR: Caught unhandled exception: Cavector::reserve avector::reserve%% [2020-10-03 17:09:42] ERROR: flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930] [2020-10-03 17:09:42] ERROR: /lib64/libstdc++.so.6(+0x5e746) [0x7f12da370746] [2020-10-03 17:09:42] ERROR: /lib64/libstdc++.so.6(+0x5e773) [0x7f12da370773] [2020-10-03 17:09:42] ERROR: /lib64/libstdc++.so.6(+0xb5105) [0x7f12da3c7105] [2020-10-03 17:09:42] ERROR:-10-03 17:09:42] RROR: E RROR: / lib64/libpthread.so.0(+0x7dd5) [0x7f12d9be5dd5] flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930]flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930]

[2020-10-03 17:09:42] ERROR:-10-03 17:09:42] RROR: E [2020 /lib64/libc.so.6(clone lib64/libc.so.6(clone+0x6d) [0x7f12d990f02d] E/lib64 lib64/ lib64/libstdc++.so.6(+0x5e746) [0x7f12da370746] lib64/libstdc++.so.6(+0x5e746) [0x7f12da370746][2020-10-03 17:09:42]

[2020-10-03 17:10:55] ERROR: Command '['flye-modules', 'assemble', '--reads', '/group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta', '--out-asm', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/00-assembly/draft_assembly.fasta', '--config', '/group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg', '--log', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/flye.log', '--threads', '64', '--genome-size', '2800000000', '--min-ovlp', '5000']' returned non-zero exit status -6 [2020-10-03 17:10:55] ERROR: Pipeline aborted

Could you please look into this and suggest what to try next? Thanks S

mikolmogorov commented 3 years ago

Hi,

This looks like an out-of-memory error to me. Are you using the same machine with 1.5 TB of RAM? (That should be more than enough.) Maybe some other memory-intensive processes are interfering? Also, are you using a cluster engine of some kind (e.g. Slurm)? Finally, even if there is enough free RAM, some machine configurations impose a hard limit on RAM usage per process or user, so I would check whether you have any policies like this in place.
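
One quick way to look for a per-process cap of the kind mentioned above is to print the address-space limit from inside the same job environment. This is only a minimal sketch using Python's standard resource module (Unix only); the helper names describe_limit and address_space_limits are made up for illustration, and it does not show limits enforced elsewhere, such as scheduler-level memory controls.

```python
import resource


def describe_limit(value):
    """Format an rlimit value in GiB, or 'unlimited' if no cap is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 1024**3:.1f} GiB"


def address_space_limits():
    """Return the (soft, hard) RLIMIT_AS limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = address_space_limits()
    print(f"RLIMIT_AS soft={soft} hard={hard}")
```

Running this inside the batch job rather than on the login node matters, because the two environments may be configured differently.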

biowackysci commented 3 years ago

Thanks, let me try with increased memory and I will get back to you about the run. I have previously used Flye for other assemblies and they worked just fine. Thanks S

biowackysci commented 3 years ago

Hello I tried with higher memory and looks like there is still this issue. Here is the log file : [2020-10-08 21:30:51] root: INFO: Starting Flye 2.8.1-b1676 [2020-10-08 21:30:51] root: DEBUG: Cmd: Flye/bin/flye --nano-raw /group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta --out-dir /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/ --genome-size 2.8g --threads 64 [2020-10-08 21:30:51] root: DEBUG: Python version: 2.7.5 (default, Apr 2 2020, 13:16:51) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] [2020-10-08 21:30:51] root: INFO: >>>STAGE: configure [2020-10-08 21:30:51] root: INFO: Configuring run [2020-10-08 21:32:35] root: WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs [2020-10-08 21:42:45] root: INFO: Total read length: 122599120798 [2020-10-08 21:42:45] root: INFO: Input genome size: 2800000000 [2020-10-08 21:42:45] root: INFO: Estimated coverage: 43 [2020-10-08 21:42:45] root: INFO: Reads N50/N90: 21593227754 / 10553578087 [2020-10-08 21:42:45] root: INFO: Minimum overlap set to 5000 [2020-10-08 21:42:45] root: INFO: >>>STAGE: assembly [2020-10-08 21:42:45] root: INFO: Assembling disjointigs [2020-10-08 21:42:45] root: DEBUG: -----Begin assembly log------ [2020-10-08 21:42:45] root: DEBUG: Running: flye-modules assemble --reads /group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta --out-asm /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/00-assembly/draft_assembly.fasta --config /group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/flye.log --threads 64 --genome-size 2800000000 --min-ovlp 5000 [2020-10-08 21:42:46] DEBUG: Build date: Sep 30 2020 10:33:41 [2020-10-08 21:42:46] DEBUG: Total RAM: 1510 Gb [2020-10-08 21:42:46] DEBUG: Available RAM: 1442 Gb [2020-10-08 21:42:46] DEBUG: Total CPUs: 48 [2020-10-08 21:42:46] DEBUG: Loading 
/group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg [2020-10-08 21:42:46] DEBUG: Loading /group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_defaults.cfg [2020-10-08 21:42:46] DEBUG: big_genome_threshold=29000000 [2020-10-08 21:42:46] DEBUG: meta_read_filter_kmer_freq=100 [2020-10-08 21:42:46] DEBUG: max_coverage_drop_rate=5 [2020-10-08 21:42:46] DEBUG: max_extensions_drop_rate=5 [2020-10-08 21:42:46] DEBUG: chimera_window=100 [2020-10-08 21:42:46] DEBUG: min_reads_in_disjointig=4 [2020-10-08 21:42:46] DEBUG: max_inner_reads=10 [2020-10-08 21:42:46] DEBUG: max_inner_fraction=0.25 [2020-10-08 21:42:46] DEBUG: max_separation=500 [2020-10-08 21:42:46] DEBUG: unique_edge_length=50000 [2020-10-08 21:42:46] DEBUG: min_repeat_res_support=0.51 [2020-10-08 21:42:46] DEBUG: out_paths_ratio=5 [2020-10-08 21:42:46] DEBUG: graph_cov_drop_rate=5 [2020-10-08 21:42:46] DEBUG: coverage_estimate_window=100 [2020-10-08 21:42:46] DEBUG: max_bubble_length=50000 [2020-10-08 21:42:46] DEBUG: loop_coverage_rate=1.5 [2020-10-08 21:42:46] DEBUG: repeat_edge_cov_mult=1.75 [2020-10-08 21:42:46] DEBUG: weak_detach_rate=5 [2020-10-08 21:42:46] DEBUG: tip_coverage_rate=2 [2020-10-08 21:42:46] DEBUG: tip_length_rate=2 [2020-10-08 21:42:46] DEBUG: low_cutoff_warning=1 [2020-10-08 21:42:46] DEBUG: hard_min_coverage_rate=10 [2020-10-08 21:42:46] DEBUG: kmer_size=17 [2020-10-08 21:42:46] DEBUG: use_minimizers=0 [2020-10-08 21:42:46] DEBUG: reads_base_alignment=0 [2020-10-08 21:42:46] DEBUG: assemble_kmer_sample=1 [2020-10-08 21:42:46] DEBUG: repeat_graph_kmer_sample=1 [2020-10-08 21:42:46] DEBUG: read_align_kmer_sample=1 [2020-10-08 21:42:46] DEBUG: meta_read_top_kmer_rate=0.40 [2020-10-08 21:42:46] DEBUG: maximum_jump=1500 [2020-10-08 21:42:46] DEBUG: maximum_overhang=1500 [2020-10-08 21:42:46] DEBUG: repeat_kmer_rate=100 [2020-10-08 21:42:46] DEBUG: assemble_ovlp_divergence=0.10 [2020-10-08 21:42:46] DEBUG: assemble_divergence_relative=1 [2020-10-08 21:42:46] DEBUG: 
repeat_graph_ovlp_divergence=0.10 [2020-10-08 21:42:46] DEBUG: read_align_ovlp_divergence=0.25 [2020-10-08 21:42:46] DEBUG: hpc_scoring_on=0 [2020-10-08 21:42:46] DEBUG: add_unassembled_reads=0 [2020-10-08 21:42:46] DEBUG: extend_contigs_with_repeats=0 [2020-10-08 21:42:46] DEBUG: min_read_cov_cutoff=3 [2020-10-08 21:42:46] DEBUG: short_tip_length=20000 [2020-10-08 21:42:46] DEBUG: long_tip_length=100000 [2020-10-08 21:42:46] DEBUG: Running with k-mer size: 17 [2020-10-08 21:42:46] DEBUG: Running with minimum overlap 5000 [2020-10-08 21:42:46] DEBUG: Metagenome mode: N [2020-10-08 21:42:46] INFO: Reading sequences [2020-10-08 21:56:33] DEBUG: Building positional index [2020-10-08 21:56:36] DEBUG: Total sequence: 122599120798 bp [2020-10-08 21:56:41] INFO: Counting k-mers: [2020-10-09 01:36:07] DEBUG: Updating k-mer histogram [2020-10-09 01:42:55] DEBUG: Hash size: 307087488 [2020-10-09 01:42:55] DEBUG: Total k-mers 6233394128 [2020-10-09 01:43:10] INFO: Filling index table (1/2) [2020-10-09 01:43:10] ERROR: Caught unhandled exception: vector::reserve [2020-10-09 01:43:10] ERROR: flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930] [2020-10-09 01:43:10] ERROR: /lib64/libstdc++.so.6(+0x5e746) [0x7f7fcd54f746] [2020-10-09 01:43:10] ERROR: [2020-10-09 01:43:10] ERROR: /lib64/libstdc++.so.6(+0x5e773) [0x7f7fcd54f773] [2020-10-09 01:43:10] ERROR: flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930] [2020-10-09 01:43:10] ERROR: /lib64/libstdc++.so.6(+0xb5105) [0x7f7fcd5a6105] [2020-10-09 01:43:10] ERROR: /lib64/libstdc++.so.6(+0x5e746) [0x7f7fcd54f746] [2020-10-09 01:43:10] ERROR: /lib64/libpthread.so.0(+0x7ea5) [0x7f7fccdc4ea5][2020-10-09 01:43:10] ERROR: [2020-10-09 01:43:10] ERROR: /lib64/libstdc++.so.6(+0x5e773) [0x7f7fcd54f773]flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930]

[2020-10-09 01:43:10] ERROR: [2020-10-09 01:44:24] root: ERROR: Command '['flye-modules', 'assemble', '--reads', '/group/pasture/Saila/cleanfasta/fixed_cleaned_40kbabove_HSandSVreads.fasta', '--out-asm', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/00-assembly/draft_assembly.fasta', '--config', '/group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg', '--log', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome/flye.log', '--threads', '64', '--genome-size', '2800000000', '--min-ovlp', '5000']' returned non-zero exit status -6 [2020-10-09 01:44:24] root: ERROR: Pipeline aborted

I am not sure what to do now. S

mikolmogorov commented 3 years ago

Hi,

I've noticed that the log reports reads with N50/N90 = 21593227754 / 10553578087. This can't be true for ONT reads. It is likely that something is wrong with the read file format, which eventually causes the error.

Mikhail

biowackysci commented 3 years ago

Hello,

The read lengths are correct: these are PromethION reads, filtered to keep only extremely long reads. We usually get reads of that quality. But as you suggested, I will check the file format and give it another try. I ran a similar job with the same reads but a different length filter, and Flye worked just fine.

S

mikolmogorov commented 3 years ago

@biowackysci I have not heard about PromethION reads longer than 10 Gb yet. Something must be wrong.
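
For context on why these numbers are implausible: N50 is the length L such that reads of length at least L together contain at least half of all bases. With a total of 122599120798 bp, an N50 of about 21.6 Gb would mean a handful of records hold half of the data. Below is a minimal sketch of the calculation, assuming a plain list of read lengths; nx is a hypothetical helper, not part of Flye, and Flye's own implementation may differ in details.

```python
def nx(lengths, percent):
    """Return the Nx value (e.g. percent=50 for N50) of a list of read lengths.

    Reads are sorted from longest to shortest; the result is the length of
    the read at which the running total first reaches percent% of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding on large totals.
        if running * 100 >= total * percent:
            return length
    return 0  # empty input


if __name__ == "__main__":
    example = [2, 3, 4, 5, 6]
    print("N50:", nx(example, 50), "N90:", nx(example, 90))
```

A single abnormally long record is enough to push N50 far above any realistic read length, which is why the statistic is a useful first check on the input file.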

biowackysci commented 3 years ago

Hello again, I checked the file format, made changes, and ran Flye again. Now it aborted with an error in the command. I have pasted the log file below:

[2020-10-16 11:18:44] root: INFO: Starting Flye 2.8.1-b1676 [2020-10-16 11:18:44] root: DEBUG: Cmd: Flye/bin/flye --nano-raw /group/pasture/Saila/Hiroshi_reads/out_nodupli_cleaned_40kbabove_HSandSVallreadscombine.fasta --out-dir /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/ --genome-size 2.8g --threads 64 [2020-10-16 11:18:44] root: DEBUG: Python version: 2.7.5 (default, Apr 2 2020, 13:16:51) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] [2020-10-16 11:18:44] root: INFO: >>>STAGE: configure [2020-10-16 11:18:44] root: INFO: Configuring run [2020-10-16 11:19:54] root: WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs [2020-10-16 11:22:24] root: INFO: Total read length: 48607860504 [2020-10-16 11:22:24] root: INFO: Input genome size: 2800000000 [2020-10-16 11:22:24] root: INFO: Estimated coverage: 17 [2020-10-16 11:22:24] root: INFO: Reads N50/N90: 65335 / 42840 [2020-10-16 11:22:24] root: INFO: Minimum overlap set to 5000 [2020-10-16 11:22:24] root: INFO: >>>STAGE: assembly [2020-10-16 11:22:24] root: INFO: Assembling disjointigs [2020-10-16 11:22:24] root: DEBUG: -----Begin assembly log------ [2020-10-16 11:22:24] root: DEBUG: Running: flye-modules assemble --reads /group/pasture/Saila/Hiroshi_reads/out_nodupli_cleaned_40kbabove_HSandSVallreadscombine.fasta --out-asm /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/00-assembly/draft_assembly.fasta --config /group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/flye.log --threads 64 --genome-size 2800000000 --min-ovlp 5000 [2020-10-16 11:22:24] DEBUG: Build date: Sep 30 2020 10:33:41 [2020-10-16 11:22:24] DEBUG: Total RAM: 754 Gb [2020-10-16 11:22:24] DEBUG: Available RAM: 721 Gb [2020-10-16 11:22:24] DEBUG: Total CPUs: 48 [2020-10-16 11:22:24] DEBUG: Loading /group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg [2020-10-16 11:22:24] 
DEBUG: Loading /group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_defaults.cfg [2020-10-16 11:22:24] DEBUG: big_genome_threshold=29000000 [2020-10-16 11:22:24] DEBUG: meta_read_filter_kmer_freq=100 [2020-10-16 11:22:24] DEBUG: max_coverage_drop_rate=5 [2020-10-16 11:22:24] DEBUG: max_extensions_drop_rate=5 [2020-10-16 11:22:24] DEBUG: chimera_window=100 [2020-10-16 11:22:24] DEBUG: min_reads_in_disjointig=4 [2020-10-16 11:22:24] DEBUG: max_inner_reads=10 [2020-10-16 11:22:24] DEBUG: max_inner_fraction=0.25 [2020-10-16 11:22:24] DEBUG: max_separation=500 [2020-10-16 11:22:24] DEBUG: unique_edge_length=50000 [2020-10-16 11:22:24] DEBUG: min_repeat_res_support=0.51 [2020-10-16 11:22:24] DEBUG: out_paths_ratio=5 [2020-10-16 11:22:24] DEBUG: graph_cov_drop_rate=5 [2020-10-16 11:22:24] DEBUG: coverage_estimate_window=100 [2020-10-16 11:22:24] DEBUG: max_bubble_length=50000 [2020-10-16 11:22:24] DEBUG: loop_coverage_rate=1.5 [2020-10-16 11:22:24] DEBUG: repeat_edge_cov_mult=1.75 [2020-10-16 11:22:24] DEBUG: weak_detach_rate=5 [2020-10-16 11:22:24] DEBUG: tip_coverage_rate=2 [2020-10-16 11:22:24] DEBUG: tip_length_rate=2 [2020-10-16 11:22:24] DEBUG: low_cutoff_warning=1 [2020-10-16 11:22:24] DEBUG: hard_min_coverage_rate=10 [2020-10-16 11:22:24] DEBUG: kmer_size=17 [2020-10-16 11:22:24] DEBUG: use_minimizers=0 [2020-10-16 11:22:24] DEBUG: reads_base_alignment=0 [2020-10-16 11:22:24] DEBUG: assemble_kmer_sample=1 [2020-10-16 11:22:24] DEBUG: repeat_graph_kmer_sample=1 [2020-10-16 11:22:24] DEBUG: read_align_kmer_sample=1 [2020-10-16 11:22:24] DEBUG: meta_read_top_kmer_rate=0.40 [2020-10-16 11:22:24] DEBUG: maximum_jump=1500 [2020-10-16 11:22:24] DEBUG: maximum_overhang=1500 [2020-10-16 11:22:24] DEBUG: repeat_kmer_rate=100 [2020-10-16 11:22:24] DEBUG: assemble_ovlp_divergence=0.10 [2020-10-16 11:22:24] DEBUG: assemble_divergence_relative=1 [2020-10-16 11:22:24] DEBUG: repeat_graph_ovlp_divergence=0.10 [2020-10-16 11:22:24] DEBUG: read_align_ovlp_divergence=0.25 
[2020-10-16 11:22:24] DEBUG: hpc_scoring_on=0 [2020-10-16 11:22:24] DEBUG: add_unassembled_reads=0 [2020-10-16 11:22:24] DEBUG: extend_contigs_with_repeats=0 [2020-10-16 11:22:24] DEBUG: min_read_cov_cutoff=3 [2020-10-16 11:22:24] DEBUG: short_tip_length=20000 [2020-10-16 11:22:24] DEBUG: long_tip_length=100000 [2020-10-16 11:22:24] DEBUG: Running with k-mer size: 17 [2020-10-16 11:22:24] DEBUG: Running with minimum overlap 5000 [2020-10-16 11:22:24] DEBUG: Metagenome mode: N [2020-10-16 11:22:24] INFO: Reading sequences [2020-10-16 11:28:22] DEBUG: Building positional index [2020-10-16 11:28:23] DEBUG: Total sequence: 48607860504 bp [2020-10-16 11:28:28] INFO: Counting k-mers: [2020-10-16 13:22:31] DEBUG: Updating k-mer histogram [2020-10-16 13:28:04] DEBUG: Hash size: 172187083 [2020-10-16 13:28:04] DEBUG: Total k-mers 5339225455 [2020-10-16 13:28:12] INFO: Filling index table (1/2) [2020-10-16 14:14:49] ERROR: Caught unhandled exception: vector::reserve [2020-10-16 14:14:49] ERROR: flye-modules(_Z16exceptionHandlerv+0xd0) [0x44c930] [2020-10-16 14:14:49] ERROR: /lib64/libstdc++.so.6(+0x5e746) [0x7f89da9ac746] [2020-10-16 14:14:49] ERROR: /lib64/libstdc++.so.6(+0x5e773) [0x7f89da9ac773] [2020-10-16 14:14:49] ERROR: /lib64/libstdc++.so.6(+0xb5105) [0x7f89daa03105] [2020-10-16 14:14:49] ERROR: /lib64/libpthread.so.0(+0x7ea5) [0x7f89da221ea5] [2020-10-16 14:14:49] ERROR: /lib64/libc.so.6(clone+0x6d) [0x7f89d9f4a8dd] [2020-10-16 14:15:58] root: ERROR: Command '['flye-modules', 'assemble', '--reads', '/group/pasture/Saila/Hiroshi_reads/out_nodupli_cleaned_40kbabove_HSandSVallreadscombine.fasta', '--out-asm', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/00-assembly/draft_assembly.fasta', '--config', '/group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg', '--log', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/flye.log', '--threads', '64', '--genome-size', '2800000000', '--min-ovlp', '5000']' returned 
non-zero exit status -6 [2020-10-16 14:15:58] root: ERROR: Pipeline aborted

Could you please give your opinion? Thanks S

mikolmogorov commented 3 years ago

@biowackysci very likely there is still at least one read of abnormal length that makes the vector::reserve() call fail, as it attempts to allocate a contiguous chunk of memory equal to the read length. I suggest looking at the longest reads in your fasta file to see if that's the case.
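
The check suggested above can be sketched in a few lines of Python that stream through a FASTA file and report the longest records. This is an illustrative sketch only: read_lengths and longest_reads are hypothetical helper names, and the parser assumes plain uncompressed FASTA with headers starting with ">".

```python
import heapq


def read_lengths(lines):
    """Yield (read_name, length) for each FASTA record in an iterable of lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            # Sequence may be wrapped over several lines; sum them all.
            length += len(line)
    if name is not None:
        yield name, length


def longest_reads(lines, k=10):
    """Return the k longest records as (read_name, length), longest first."""
    return heapq.nlargest(k, read_lengths(lines), key=lambda rec: rec[1])


# Usage (the file name is a placeholder):
# with open("reads.fasta") as handle:
#     for name, length in longest_reads(handle, k=20):
#         print(name, length)
```

Because the file is read line by line and only the top k records are kept, memory use stays small even for very large inputs.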

biowackysci commented 3 years ago

Hello @fenderglass I corrected the reads and it stopped at the contig stage and has different error now here is the log file [2020-10-21 19:20:05] INFO: Starting Flye 2.8.1-b1676 [2020-10-21 19:20:05] INFO: >>>STAGE: configure [2020-10-21 19:20:05] INFO: Configuring run [2020-10-21 19:21:17] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs [2020-10-21 19:25:05] INFO: Total read length: 56829509353 [2020-10-21 19:25:05] INFO: Input genome size: 2800000000 [2020-10-21 19:25:05] INFO: Estimated coverage: 20 [2020-10-21 19:25:05] INFO: Reads N50/N90: 54039 / 42227 [2020-10-21 19:25:05] INFO: Minimum overlap set to 5000 [2020-10-21 19:25:05] INFO: >>>STAGE: assembly [2020-10-21 19:25:05] INFO: Assembling disjointigs [2020-10-21 19:25:05] INFO: Reading sequences [2020-10-21 19:34:26] INFO: Counting k-mers: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-21 20:48:15] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-21 22:46:12] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-22 00:42:40] INFO: Extending reads [2020-10-22 01:32:26] INFO: Overlap-based coverage: 5 [2020-10-22 01:32:26] INFO: Median overlap divergence: 0.179709 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-26 09:53:18] INFO: Assembled 16725 disjointigs [2020-10-26 09:53:58] INFO: Generating sequence 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-26 10:08:50] INFO: >>>STAGE: consensus [2020-10-26 10:10:03] INFO: Running Minimap2 [2020-10-26 20:10:27] INFO: Computing consensus [2020-10-26 22:28:54] INFO: Alignment error rate: 0.198925 [2020-10-26 22:29:27] INFO: >>>STAGE: repeat [2020-10-26 22:29:28] INFO: Building and resolving repeat graph [2020-10-26 22:29:29] INFO: Parsing disjointigs [2020-10-26 22:30:09] INFO: Building repeat graph 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-27 15:08:35] INFO: Median overlap divergence: 0.112086 [2020-10-27 16:14:48] INFO: Parsing 
reads [2020-10-27 16:20:36] INFO: Aligning reads to the graph 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [2020-10-28 04:31:25] INFO: Aligned read sequence: 52888213194 / 56829509353 (0.930647) [2020-10-28 04:31:25] INFO: Median overlap divergence: 0.0958366 [2020-10-28 04:32:48] INFO: Mean edge coverage: 32 [2020-10-28 04:32:52] INFO: Simplifying the graph [2020-10-28 09:09:58] INFO: >>>STAGE: contigger [2020-10-28 09:09:58] INFO: Generating contigs [2020-10-28 09:09:59] INFO: Reading sequences [2020-10-28 09:24:54] INFO: Generated 49729 contigs [2020-10-28 09:26:33] INFO: Added 105 scaffold connections [2020-10-28 14:04:40] ERROR: Command '['flye-modules', 'contigger', '--graph-edges', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/20-repeat/repeat_graph_edges.fasta', '--reads', '/group/pasture/Saila/Flye/40kbabovereads/nodupli_fixed_cleaned_HSandSV_40kbabovereads.fasta', '--out-dir', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/30-contigger', '--config', '/group/pasture/Saila/Flye2.8/Flye/flye/config/bin_cfg/asm_raw_reads.cfg', '--repeat-graph', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/20-repeat/repeat_graph_dump', '--graph-aln', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/20-repeat/read_alignment_dump', '--log', '/group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/flye.log', '--threads', '64', '--min-ovlp', '5000']' returned non-zero exit status -7 [2020-10-28 14:04:40] ERROR: Pipeline aborted

Can you please suggest what I can do to fix this? Thanks S

mikolmogorov commented 3 years ago

@biowackysci looks like dumping the final graph failed for some reason. But in fact, all the important files should already have been generated. You can try running touch flye_dir/30-contigger/graph_final.gfa, and then use the original flye command line with --resume-from polishing to restart the pipeline from the polishing stage.
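
The two steps of this workaround can be sketched as follows. This is a hedged illustration, not part of Flye: resume_command is a hypothetical helper, the file name 30-contigger/graph_final.gfa and the --resume-from polishing option are taken from the comment above, and the script only builds the command list rather than running anything.

```python
from pathlib import Path


def resume_command(out_dir, original_cmd):
    """Create the placeholder graph file and return the resume command.

    out_dir      -- the Flye output directory (flye_dir in the comment above)
    original_cmd -- the original flye command line as a list of arguments
    """
    contigger_dir = Path(out_dir) / "30-contigger"
    # The directory normally exists already; creating it here is harmless.
    contigger_dir.mkdir(parents=True, exist_ok=True)
    # Equivalent of `touch`: creates an empty file if it is missing.
    (contigger_dir / "graph_final.gfa").touch()
    return list(original_cmd) + ["--resume-from", "polishing"]


# Usage (paths are placeholders):
# cmd = resume_command("flye_out", ["flye", "--nano-raw", "reads.fasta",
#                                   "--out-dir", "flye_out"])
# print(" ".join(cmd))
```

Note that the placeholder graph file is empty, so any later step that needs the real graph contents cannot be expected to succeed.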

biowackysci commented 3 years ago

Thanks for that I tried to restart the run and now have this in the log file [2020-10-30 13:49:10] INFO: Starting Flye 2.8.1-b1676 [2020-10-30 13:49:10] INFO: Resuming previous run [2020-10-30 13:49:10] INFO: >>>STAGE: polishing [2020-10-30 13:49:10] INFO: Polishing genome (1/1) [2020-10-30 13:49:54] INFO: Running minimap2 [2020-10-30 18:24:00] INFO: Separating alignment into bubbles [2020-10-30 18:55:44] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs [2020-10-30 19:31:42] INFO: Alignment error rate: 0.166848 [2020-10-30 19:31:42] INFO: Correcting bubbles 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Traceback (most recent call last): File "Flye/bin/flye", line 25, in sys.exit(main()) File "/group/pasture/Saila/Flye2.8/Flye/flye/main.py", line 792, in main _run(args) File "/group/pasture/Saila/Flye2.8/Flye/flye/main.py", line 571, in _run jobs[i].run() File "/group/pasture/Saila/Flye2.8/Flye/flye/main.py", line 371, in run self.args.threads) File "/group/pasture/Saila/Flye2.8/Flye/flye/polishing/polish.py", line 201, in generate_polished_edges .format(seq_id, edges_dict[seq_id], coverage_tag)) KeyError: 'edge_60632'

Thanks S

mikolmogorov commented 3 years ago

@biowackysci I see - this is happening because we used a workaround on the previous assembly. Because the assembly graph was incomplete, Flye was not able to generate the polished version of the graph.

But your polished contigs are ready in 40-polishing/filtered_contigs.fasta - feel free to use them as the final assembly. The issue with slow gfa generation should be fixed in the next Flye release: https://github.com/fenderglass/Flye/issues/290

mikolmogorov commented 3 years ago

I'm assuming that the issue is now resolved. Feel free to post here if you have any follow-ups.

biowackysci commented 3 years ago

Hello Fenderglass, sorry to bother you again. The draft assembly generated in this run has all the contigs named as disjointigs. For example:

disjointig_1 AGTATGCTTCAGTTGGTTACATGTACTAGAGGAACCAGGAGTATCTAAAGGCTTACCCGAGGAAGAGACCTTTGGAGATG AGGCAACCCCCATCGGTTCTTCCATGCCCATGCTTGGAAGGACTCCCTTAGCCATTAGCAAAACCTCCTCCGTAGTACCT CTCTTGGGCTGACCCAAAGAACCATATCCACTCTGCATTACATTTAATTTCAGCATTATACTGACTTCATCGAGACAAAA GCATTGCGATCAAATGGCTGCGAAAACCATAATTTAAATCATACAATAATATTAGCATTACTATCACGTGAGACTCGAGG ACCACGTGATGGGAGACCACAGCCGATCAAGCAGTCTGGCCCCGCAGCCAATTGGCGTCGAGACAAACCTCTTGACGTAC GCTGCCCGGTATTTGTTCCCTGTGTTGAATCCTCACGTAGCCCATGACGAACCGCCCATGATGTGAAACACCTCTCCTCG GCACAAGAGATCCAGACCCGCATCTCTGCTCCCGCATCACGGCGCTTTTTTATGACAGGCGGGCACGTGAACTGCCCGGC CCGGCGTAGCGTCGCCAGGGACGGCGGCAGTATTACAAATTTGTTGCCGGCCTGGAGGCGGCTTCCCATTAGCGCATGCC GGGCTGGCGTAGCGGCGGGCAGTTCACGCCCCGTGGCAGAGTTAGAGCCATCCGCACGATCCCAAGGCCCAACAAACCAG

Now I am not sure whether the assembly is complete, or whether you would suggest that I rerun it. Thanks S

mikolmogorov commented 3 years ago

@biowackysci not sure I understand what the problem is. Could you provide more details?

biowackysci commented 3 years ago

In the /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/00-assembly folder, the draft_assembly is about 5.517 Gbp, with the contigs named as disjointigs. The head of the file looks like this: disjointig_1 AGTATGCTTCAGTTGGTTACATGTACTAGAGGAACCAGGAGTATCTAAAGGCTTACCCGAGGAAGAGACCTTTGGAGATG AGGCAACCCCCATCGGTTCTTCCATGCCCATGCTTGGAAGGACTCCCTTAGCCATTAGCAAAACCTCCTCCGTAGTACCT CTCTTGGGCTGACCCAAAGAACCATATCCACTCTGCATTACATTTAATTTCAGCATTATACTGACTTCATCGAGACAAAA GCATTGCGATCAAATGGCTGCGAAAACCATAATTTAAATCATACAATAATATTAGCATTACTATCACGTGAGACTCGAGG ACCACGTGATGGGAGACCACAGCCGATCAAGCAGTCTGGCCCCGCAGCCAATTGGCGTCGAGACAAACCTCTTGACGTAC GCTGCCCGGTATTTGTTCCCTGTGTTGAATCCTCACGTAGCCCATGACGAACCGCCCATGATGTGAAACACCTCTCCTCG GCACAAGAGATCCAGACCCGCATCTCTGCTCCCGCATCACGGCGCTTTTTTATGACAGGCGGGCACGTGAACTGCCCGGC CCGGCGTAGCGTCGCCAGGGACGGCGGCAGTATTACAAATTTGTTGCCGGCCTGGAGGCGGCTTCCCATTAGCGCATGCC GGGCTGGCGTAGCGGCGGGCAGTTCACGCCCCGTGGCAGAGTTAGAGCCATCCGCACGATCCCAAGGCCCAACAAACCAG

I was wondering if the assembly has not proceeded to make contigs out of them as /group/pasture/Saila/Flye2.8/cleaned_40kb_fasta_HSandSV_outcome_new/40-polishing/filtered_contigs.fasta has a size of 3.3 Gbps and it is not very close to the draft_assembly size.

I was thinking the assembling has stopped somewhere and not proceeded to make the contigs and therefore this mismatch of size? I was wondering if I should rerun this or is there a way to make the disjointigs into contigs ? Thanks S

mikolmogorov commented 3 years ago

@biowackysci it is not uncommon for the final assembly to be shorter than the initial set of disjointigs. This often happens for highly repetitive or heterozygous genomes, because the generated disjointigs contain a lot of duplicated sequence.
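To put numbers on the size difference described above, the total length and N50 of 00-assembly/draft_assembly.fasta and 40-polishing/filtered_contigs.fasta can be compared directly. The following is a minimal sketch in plain Python, not part of Flye; the function names `fasta_lengths` and `n50` are made up for this illustration, and the toy input stands in for a real FASTA file opened with `open(path)`.

```python
import io


def fasta_lengths(handle):
    """Return a dict mapping each FASTA record name to its sequence length."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy input for illustration; for a real file use: with open(path) as handle: ...
demo = io.StringIO(">disjointig_1\nACGTACGT\nACGT\n>disjointig_2\nACGT\n")
sizes = fasta_lengths(demo)
print("records:", len(sizes), "total bp:", sum(sizes.values()), "N50:", n50(sizes.values()))
```

Running this on both files gives a like-for-like comparison. A smaller total for the final contigs is consistent with duplicated sequence in the disjointigs having been collapsed, rather than with an incomplete run.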

biowackysci commented 3 years ago

Thanks for that. I was also wondering whether I should use the draft assembly, which contains only disjointigs, as I don't see any contigs in it. Is there a way I could get contigs from them?

Thanks S

mikolmogorov commented 3 years ago

@biowackysci currently, there is no easy way to do this. Disjointigs may contain misassemblies and duplications, and therefore should not be used for downstream analysis. If you have a good understanding of the methods, you can certainly try to recover the sequence manually, but I don't have any advice here.

felixheno commented 1 year ago

Hello @fenderglass,

I get a segmentation fault in the polishing step of my Flye run. I am not sure what the exact cause is, how to solve it, or whether Flye managed to produce an assembly that is appropriate for downstream analysis. Would you be able to help me?

The Flye log file:

cn-3 /mnt/project/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts Overlap: 7000 2.9.2-b1786
[2023-08-16 13:38:16] INFO: Starting Flye 2.9.2-b1786
[2023-08-16 13:38:16] INFO: >>>STAGE: configure
[2023-08-16 13:38:16] INFO: Configuring run
[2023-08-16 15:19:22] INFO: Total read length: 106803122689
[2023-08-16 15:19:22] INFO: Reads N50/N90: 40424 / 14766
[2023-08-16 15:19:22] INFO: Selected minimum overlap: 7000
[2023-08-16 15:19:23] INFO: >>>STAGE: assembly
[2023-08-16 15:19:23] INFO: Assembling disjointigs
[2023-08-16 15:19:23] INFO: Reading sequences
[2023-08-16 16:28:04] INFO: Building minimizer index
[2023-08-16 16:28:05] INFO: Pre-calculating index storage
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-16 16:55:21] INFO: Filling index
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-16 17:26:30] INFO: Extending reads
[2023-08-16 17:48:47] INFO: Overlap-based coverage: 41
[2023-08-16 17:48:47] INFO: Median overlap divergence: 0.096496
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-20 10:24:51] INFO: Assembled 2186 disjointigs
[2023-08-20 10:26:19] INFO: Generating sequence
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-20 10:35:23] INFO: Filtering contained disjointigs
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-20 11:17:58] INFO: Contained seqs: 1081
[2023-08-20 11:48:05] INFO: >>>STAGE: consensus
[2023-08-20 11:48:24] INFO: Running Minimap2
[2023-08-21 15:07:18] INFO: Computing consensus
[2023-08-21 18:01:08] INFO: Alignment error rate: 0.108584
[2023-08-21 18:02:16] INFO: >>>STAGE: repeat
[2023-08-21 18:02:16] INFO: Building and resolving repeat graph
[2023-08-21 18:02:17] INFO: Parsing disjointigs
[2023-08-21 18:03:59] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-22 09:50:50] INFO: Median overlap divergence: 0.122339
[2023-08-22 09:57:24] INFO: Parsing reads
[2023-08-22 11:19:56] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-22 18:59:50] INFO: Aligned read sequence: 98474417666 / 105014350544 (0.937723)
[2023-08-22 18:59:51] INFO: Median overlap divergence: 0.0562028
[2023-08-22 19:00:07] INFO: Mean edge coverage: 50
[2023-08-22 19:00:15] INFO: Simplifying the graph
[2023-08-22 19:15:00] INFO: >>>STAGE: contigger
[2023-08-22 19:15:00] INFO: Generating contigs
[2023-08-22 19:15:01] INFO: Reading sequences
[2023-08-22 20:45:56] INFO: Generated 1161 contigs
[2023-08-22 20:46:27] INFO: Added 36 scaffold connections
[2023-08-22 20:48:39] INFO: >>>STAGE: polishing
[2023-08-22 20:48:39] INFO: Polishing genome (1/1)
[2023-08-22 20:48:39] INFO: Running minimap2
[2023-08-23 13:43:01] INFO: Separating alignment into bubbles
[2023-08-23 18:12:03] INFO: Alignment error rate: 0.070841
[2023-08-23 18:12:03] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2023-08-24 19:19:18] ERROR: Error running minimap2, terminating. See the alignment error log for details: /net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/minimap.stderr
[2023-08-24 19:19:18] ERROR: Cmd: flye-minimap2 '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/filtered_contigs.fasta' '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/30-contigger/graph_final.fasta' -x map-ont -t 40 -k 17 -a -p 0.5 -N 10 --sam-hit-only -L -K 5G -z 1000 -Q --secondary-seq -I 64G | flye-samtools view -T '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/filtered_contigs.fasta' -u - | flye-samtools sort -T '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/sort_230824_110422' -O bam -@ 4 -l 1 -m 4G -o '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/edges_aln.bam'
[2023-08-24 19:19:18] ERROR: Command '['/bin/bash', '-c', "set -eo pipefail; flye-minimap2 '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/filtered_contigs.fasta' '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/30-contigger/graph_final.fasta' -x map-ont -t 40 -k 17 -a -p 0.5 -N 10 --sam-hit-only -L -K 5G -z 1000 -Q --secondary-seq -I 64G | flye-samtools view -T '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/filtered_contigs.fasta' -u - | flye-samtools sort -T '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/sort_230824_110422' -O bam -@ 4 -l 1 -m 4G -o '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/edges_aln.bam'"]' returned non-zero exit status 139.
[2023-08-24 19:19:18] ERROR: Pipeline aborted

Here is the alignment error log file minimap.stderr referenced in the Flye log file:

[samfaipath] build FASTA index...
[M::mm_idx_gen::171.287*0.73] collected minimizers
[M::mm_idx_gen::179.868*1.35] sorted minimizers
[M::main::179.871*1.35] loaded/built the index for 288 target sequence(s)
[M::mm_mapopt_update::187.301*1.34] mid_occ = 1008
[M::mm_idx_stat] kmer size: 17; skip: 10; is_hpc: 0; #seq: 288
[M::mm_idx_stat::190.625*1.33] distinct minimizers: 127826739 (78.44% are singletons); average occurrences: 3.049; average spacing: 5.494; total length: 2141381420
/bin/bash: line 1: 26254 Segmentation fault (core dumped) flye-minimap2 '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/filtered_contigs.fasta' '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/30-contigger/graph_final.fasta' -x map-ont -t 40 -k 17 -a -p 0.5 -N 10 --sam-hit-only -L -K 5G -z 1000 -Q --secondary-seq -I 64G
26255 Done | flye-samtools view -T '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/filtered_contigs.fasta' -u -
26256 Done | flye-samtools sort -T '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/sort_230824_110422' -O bam -@ 4 -l 1 -m 4G -o '/net/fs-2/scale/OrionStore/Projects/FjellheimLab/ONT_grassgenomes/Vbro/assembly_scripts/flye-2.9_7000_filt/40-polishing/edges_aln.bam'
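For anyone reading these logs: the "exit status 139" in the Flye log and the "Segmentation fault" in minimap.stderr describe the same event. By shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 139 - 128 = 11, which is SIGSEGV on Linux. The small Python sketch below decodes such a status; `describe_exit_status` is a hypothetical helper written for this illustration and is not part of Flye or minimap2.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (values above 128 mean 'killed by a signal')."""
    if status > 128:
        signum = status - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with code %d" % status


print(describe_exit_status(139))
```

Note that the status alone does not identify the root cause. A kill by the kernel's out-of-memory handler would more typically appear as status 137 (SIGKILL), but a failed allocation inside the program can also end in a segmentation fault, so memory pressure remains a plausible explanation here.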

mikolmogorov commented 1 year ago

@felixheno for some reason the last minimap2 run crashed, possibly because it ran out of memory. You only need this last step if you want a polished assembly graph. In that case, you can try rerunning on a machine with more memory or with fewer threads (e.g. 20); you can restart from the polishing stage with --resume-from polishing.

Otherwise, your polished contigs should already be in flye_output/40-polishing/filtered_contigs.fasta.
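The two options above can be sketched as a small Python helper: first check whether the polished contigs were already written, then build a command line that reruns only the polishing stage with fewer threads. This is an illustrative sketch under several assumptions: the function names, the `reads.fastq` and `flye_output` paths, and the `--nano-raw` read-type flag are placeholders (the original command line is not shown in the thread), and the rerun should use the same read files and output directory as the original run. `--resume-from` is the Flye option for restarting at a named stage; the 40-polishing directory name is taken from the log above.

```python
import os
import shlex


def polished_contigs_ready(out_dir):
    """True if 40-polishing/filtered_contigs.fasta exists in out_dir and is non-empty."""
    path = os.path.join(out_dir, "40-polishing", "filtered_contigs.fasta")
    return os.path.isfile(path) and os.path.getsize(path) > 0


def build_resume_command(reads, out_dir, threads=20, read_flag="--nano-raw"):
    """Build (but do not run) a Flye command that restarts at the polishing stage.

    read_flag is an assumption: use whichever read-type option the original run used.
    """
    return [
        "flye", read_flag, reads,
        "--out-dir", out_dir,
        "--threads", str(threads),
        "--resume-from", "polishing",
    ]


# Hypothetical paths, for illustration only.
out_dir = "flye_output"
if polished_contigs_ready(out_dir):
    print("Polished contigs found; a rerun is only needed for the polished graph.")
cmd = build_resume_command("reads.fastq", out_dir)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The command is only printed, not executed, so it can be reviewed and pasted into a job script with the memory request adjusted.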