Another error in processing mmseqs_contig_annotation

Hi, I have run into another error. Here is the log:

0.982908 k-mers per position
14994 DB matches per sequence
8 overflows
0 queries produce too many hits (truncated result)
58 sequences passed prefiltering per query sequence
39 median result list length
2 sequences with 0 size result lists
Time for merging to pref_0_tmp_0: 0h 0m 0s 53ms
Time for merging to pref_0_tmp_0_tmp: 0h 0m 0s 142ms
Process prefiltering step 2 of 3

Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=========================================================Invalid database read for database data file=hecatomb.out/processing/assembly/FLYE/mmseqs_nt_tmp/12858616815398277798/target_seqs_split, database index=hecatomb.out/processing/assembly/FLYE/mmseqs_nt_tmp/12858616815398277798/target_seqs_split.index
Size of data: 9058457689
Requested offset: 9059357982
Error: Prefilter died
Error: Search step died
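In the log above, the requested offset (9059357982) lies 900293 bytes past the end of the data file (9058457689 bytes), so the index appears to point beyond the data. Below is a minimal sketch of how that mismatch could be checked. It assumes the MMseqs2 `.index` file is tab-separated with the columns key, offset, and length. The function name `check_db_offsets` is hypothetical and is not part of MMseqs2 or hecatomb.

```shell
# Hypothetical helper: list index entries whose offset + length runs past
# the end of the data file. Assumes a tab-separated index with the columns
# key, offset, length (an assumption about the MMseqs2 .index layout).
check_db_offsets() {
    data_file="$1"
    index_file="$2"

    # Size of the data file in bytes; the arithmetic expansion strips any
    # whitespace padding that some wc implementations add.
    data_size=$(( $(wc -c < "$data_file") ))

    # Print every entry that would read past the end of the data file and
    # return a non-zero status if at least one such entry exists.
    awk -F'\t' -v size="$data_size" '
        ($2 + $3) > (size + 0) {
            bad++
            printf "key %s: offset %s + length %s exceeds data size %s\n", $1, $2, $3, size
        }
        END { exit (bad > 0) }
    ' "$index_file"
}
```

With the paths from the log, it could be called as `check_db_offsets <tmp_dir>/target_seqs_split <tmp_dir>/target_seqs_split.index`, where `<tmp_dir>` stands for `hecatomb.out/processing/assembly/FLYE/mmseqs_nt_tmp/12858616815398277798`. Any printed entry would suggest that the data file and its index do not match, for example because the data file was truncated.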

I ran the analysis with hecatomb v1.3.2, reinstalled via pip. Could you please check this?

I checked the whole log and found that the error originated in an earlier step. There were several error log files, and idx_ref.err and faidx_ref.err appeared first. The Snakemake commands are:

[Sat Jul  6 23:13:00 2024]
rule idx_ref:
    input: hecatomb.out/results/merged_assembly.fasta
    output: hecatomb.out/results/merged_assembly.fasta.idx
    log: hecatomb.out/logs/idx_ref.err
    jobid: 4
    benchmark: hecatomb.out/benchmarks/idx_ref.txt
    reason: Missing output files: hecatomb.out/results/merged_assembly.fasta.idx
    threads: 4
    resources: tmpdir=/tmp, mem_mb=16000, mem_mib=15259, mem=16000MB, time=02:00:00

awk 'BEGIN {count=-1} /^>/ { $0 = ">" ++count } 1' hecatomb.out/results/merged_assembly.fasta | minimap2 -t 4 -d hecatomb.out/results/merged_assembly.fasta.idx - 2> hecatomb.out/logs/idx_ref.err

[Sat Jul  6 23:13:00 2024]
rule sample_tsv:
    output: hecatomb.out/koverage.samples.tsv
    jobid: 62
    reason: Missing output files: hecatomb.out/koverage.samples.tsv; Params have changed since last execution
    resources: tmpdir=/tmp

[Sat Jul  6 23:13:00 2024]
rule faidx_ref:
    input: hecatomb.out/results/merged_assembly.fasta
    output: hecatomb.out/results/merged_assembly.fasta.fai
    log: hecatomb.out/logs/faidx_ref.err
    jobid: 5
    benchmark: hecatomb.out/benchmarks/faidx_ref.txt
    reason: Missing output files: hecatomb.out/results/merged_assembly.fasta.fai
    resources: tmpdir=/tmp

samtools faidx hecatomb.out/results/merged_assembly.fasta 2> hecatomb.out/logs/faidx_ref.err

And the logs are:

[M::mm_idx_gen::0.888*0.50] collected minimizers
[M::mm_idx_gen::1.044*0.68] sorted minimizers
[M::main::1.279*0.65] loaded/built the index for 6794 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 6794
[M::mm_idx_stat::1.333*0.64] distinct minimizers: 1276884 (81.50% are singletons); average occurrences: 1.414; average spacing: 5.421; total length: 9784718
[M::main] Version: 2.28-r1209
[M::main] CMD: minimap2 -t 4 -d hecatomb.out/results/merged_assembly.fasta.idx -
[M::main] Real time: 1.363 sec; CPU: 0.864 sec; Peak RSS: 0.081 GB

Actually, merged_assembly.fasta is generated in the results directory. These errors have confused me for several days. Looking forward to your reply.

shandley / hecatomb

Another error in processing mmseqs_contig_annotation #115