linzhi2013 / MitoZ

MitoZ: A toolkit for assembly, annotation, and visualization of animal mitochondrial genomes
https://doi.org/10.1093/nar/gkz173
GNU General Public License v3.0
117 stars 39 forks source link

Problems running Mitoz on test data #192

Closed marcosbarbeitos closed 1 year ago

marcosbarbeitos commented 1 year ago

Hello, I am testing Mitoz on an older Dell R900 server with 64GB of RAM and 16 cores running Scientific Linux ( = RHEL 7). After activating the conda environment, I am using the following command line:

mitoz all --fq1 test.1.fq.gz  --fq2 test.2.fq.gz --outprefix ttt --clade Arthropoda --data_size_for_mt_assembly 0 --assembler megahit --requiring_taxa Arthropoda --thread_number 8 --memory 50

However, I keep getting the following output, ending with an error message:

2023-05-09 19:12:27,149 - mitoz.utility.utility - INFO - 
Found: /mnt/Data/local/mitoz3.6/bin/spades.py

2023-05-09 19:12:27,150 - mitoz.utility.utility - INFO - 
Found: /mnt/Data/local/mitoz3.6/bin/megahit

2023-05-09 19:12:27,151 - mitoz.utility.utility - INFO - 
all.main() got args:
Namespace(assembler='megahit', clade='Arthropoda', data_size_for_mt_assembly='0', debug=False, fastq_read_length='150', filter_by_taxa=True, filter_other_para='', fq1='test.1.fq.gz', fq2='test.2.fq.gz', func=<function main at 0x7f5873d5d9d0>, genetic_code='auto', insert_size='250', kmers=[71], kmers_megahit=['43', '71', '99'], kmers_spades=['auto'], logger=<Logger mitoz.utility.utility (DEBUG)>, memory='50', min_abundance='10', outprefix='ttt', phred64=False, profiles_dir='/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/profiles', requiring_relax='0', requiring_taxa='Arthropoda', resume_assembly=False, skip_filter=False, slow_search=False, species_name='Test sp.', template_sbt='/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/annotate/script/template.sbt', thread_number='8', tmp_dir=None, workdir='./')

2023-05-09 19:12:27,151 - mitoz.utility.utility - INFO - 
all.main() got updated args:
Namespace(assembler='megahit', clade='Arthropoda', data_size_for_mt_assembly='0', debug=False, fastq_read_length='150', filter_by_taxa=True, filter_other_para='', fq1='/mnt/Data/local/mitoz3.6/test/test.1.fq.gz', fq2='/mnt/Data/local/mitoz3.6/test/test.2.fq.gz', func=<function main at 0x7f5873d5d9d0>, genetic_code='auto', insert_size='250', kmers=[71], kmers_megahit=['43', '71', '99'], kmers_spades=['auto'], logger=<Logger mitoz.utility.utility (DEBUG)>, memory='50', min_abundance='10', outprefix='ttt', phred64=False, profiles_dir='/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/profiles', requiring_relax='0', requiring_taxa='Arthropoda', resume_assembly=False, skip_filter=False, slow_search=False, species_name='Test sp.', template_sbt='/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/annotate/script/template.sbt', thread_number='8', tmp_dir=None, workdir='/mnt/Data/local/mitoz3.6/test')

2023-05-09 19:12:27,152 - mitoz.utility.utility - INFO - 
filter.main() get args: {'logger': <Logger mitoz.utility.utility (DEBUG)>, 'fq1': '/mnt/Data/local/mitoz3.6/test/test.1.fq.gz', 'fq2': '/mnt/Data/local/mitoz3.6/test/test.2.fq.gz', 'phred64': False, 'outprefix': 'ttt', 'fastq_read_length': 150, 'data_size_for_mt_assembly': '0', 'filter_other_para': '', 'thread_number': '8', 'workdir': '/mnt/Data/local/mitoz3.6/test/clean_data', 'workdir_done': '/mnt/Data/local/mitoz3.6/test/clean_data', 'workdir_log': '/mnt/Data/local/mitoz3.6/test/clean_data'}

2023-05-09 19:12:27,152 - mitoz.utility.utility - INFO - 
filter.main() get updated args: {'logger': <Logger mitoz.utility.utility (DEBUG)>, 'fq1': '/mnt/Data/local/mitoz3.6/test/test.1.fq.gz', 'fq2': '/mnt/Data/local/mitoz3.6/test/test.2.fq.gz', 'phred64': False, 'outprefix': 'ttt', 'fastq_read_length': 150, 'data_size_for_mt_assembly': '0', 'filter_other_para': '', 'thread_number': '8', 'workdir': '/mnt/Data/local/mitoz3.6/test/clean_data', 'workdir_done': '/mnt/Data/local/mitoz3.6/test/clean_data', 'workdir_log': '/mnt/Data/local/mitoz3.6/test/clean_data'}

2023-05-09 19:12:27,152 - mitoz.utility.utility - INFO - 
Not to subsample raw data!

2023-05-09 19:12:27,153 - mitoz.utility.utility - INFO - 

    # set -vex
    # set -o pipefail
    set -e

    mkdir -p /mnt/Data/local/mitoz3.6/test/clean_data
    mkdir -p /mnt/Data/local/mitoz3.6/test/clean_data
    mkdir -p /mnt/Data/local/mitoz3.6/test/clean_data
    cd /mnt/Data/local/mitoz3.6/test/clean_data

    task_id="FastpTask"
    done_file="/mnt/Data/local/mitoz3.6/test/clean_data/$task_id.done"
    log_o_file="/mnt/Data/local/mitoz3.6/test/clean_data/$task_id.o.log"
    log_e_file="/mnt/Data/local/mitoz3.6/test/clean_data/$task_id.e.log"

    if [ -f $done_file ]; then
        exit 0
    fi

    if [ -f "/mnt/Data/local/mitoz3.6/test/test.1.fq.gz" ] && [ -f "/mnt/Data/local/mitoz3.6/test/test.2.fq.gz" ];
    then
        fastp          -i /mnt/Data/local/mitoz3.6/test/test.1.fq.gz         -I /mnt/Data/local/mitoz3.6/test/test.2.fq.gz          --length_required 50         --reads_to_process 0         -o ttt.clean_R1.fq.gz         -O ttt.clean_R2.fq.gz         -R "ttt"         -j ttt.QC.json         -h ttt.QC.html         -w 8 && 1>$log_o_file 2>$log_e_file && touch $done_file   

    else
        fastp          -i /mnt/Data/local/mitoz3.6/test/test.1.fq.gz          --length_required 50         --reads_to_process 0         -o ttt.clean_R1.fq.gz         -R "ttt"         -j ttt.QC.json         -h ttt.QC.html         -w 8 && 1>$log_o_file 2>$log_e_file && touch $done_file

    fi

WARNNIG: different read numbers of the 6 pack
Read1 pack size: 245
Read2 pack size: 291

Read1 before filtering:
total reads: 12245
total bases: 1530625
Q20 bases: 1370267(89.5234%)
Q30 bases: 1267903(82.8356%)

Read2 before filtering:
total reads: 12245
total bases: 1530625
Q20 bases: 1402728(91.6441%)
Q30 bases: 1307982(85.4541%)

Read1 after filtering:
total reads: 11511
total bases: 1437918
Q20 bases: 1316232(91.5373%)
Q30 bases: 1227039(85.3344%)

Read2 after filtering:
total reads: 11511
total bases: 1437918
Q20 bases: 1330169(92.5066%)
Q30 bases: 1243651(86.4897%)

Filtering result:
reads passed filter: 23022
reads failed due to low quality: 1426
reads failed due to too many N: 34
reads failed due to too short: 8
reads with adapter trimmed: 60
bases trimmed due to adapters: 2610

Duplication rate: 0%

Insert size peak (evaluated by paired-end reads): 200

JSON report: ttt.QC.json
HTML report: ttt.QC.html

fastp -i /mnt/Data/local/mitoz3.6/test/test.1.fq.gz -I /mnt/Data/local/mitoz3.6/test/test.2.fq.gz --length_required 50 --reads_to_process 0 -o ttt.clean_R1.fq.gz -O ttt.clean_R2.fq.gz -R ttt -j ttt.QC.json -h ttt.QC.html -w 8 
fastp v0.23.2, time used: 2 seconds

2023-05-09 19:12:29,795 - mitoz.utility.utility - INFO - 

    sequencing: paired end (125 cycles + 125 cycles)

    before_filtering:
        total_reads: 24,490
        total_bases: 3,061,250

    after_filtering:
        total_reads: 23,022
        total_bases: 2,875,836

2023-05-09 19:12:29,796 - mitoz.utility.utility - INFO - 
filter.main() returns:
/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz
/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz

2023-05-09 19:12:29,797 - mitoz.utility.utility - INFO - 
assemble.main() got args:
{'logger': <Logger mitoz.utility.utility (DEBUG)>, 'workdir': '/mnt/Data/local/mitoz3.6/test/mt_assembly', 'outprefix': 'ttt', 'thread_number': '8', 'fq1': '/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz', 'fq2': '/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz', 'insert_size': '250', 'fastq_read_length': '150', 'assembler': 'megahit', 'tmp_dir': None, 'kmers': [71], 'kmers_megahit': ['43', '71', '99'], 'kmers_spades': ['auto'], 'memory': '50', 'resume_assembly': False, 'profiles_dir': '/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/profiles', 'slow_search': False, 'filter_by_taxa': True, 'requiring_taxa': 'Arthropoda', 'requiring_relax': '0', 'abundance_pattern': 'abun\\=([0-9]+\\.*[0-9]*)', 'min_abundance': '10', 'genetic_code': 'auto', 'clade': 'Arthropoda', 'skip_read_mapping': True}

2023-05-09 19:12:29,797 - mitoz.utility.utility - INFO - 
assemble.main() got updated args:
{'logger': <Logger mitoz.utility.utility (DEBUG)>, 'workdir': '/mnt/Data/local/mitoz3.6/test/mt_assembly', 'outprefix': 'ttt', 'thread_number': '8', 'fq1': '/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz', 'fq2': '/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz', 'insert_size': '250', 'fastq_read_length': '150', 'assembler': 'megahit', 'tmp_dir': None, 'kmers': [71], 'kmers_megahit': ['43', '71', '99'], 'kmers_spades': ['auto'], 'memory': '50', 'resume_assembly': False, 'profiles_dir': '/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/profiles', 'slow_search': False, 'filter_by_taxa': True, 'requiring_taxa': 'Arthropoda', 'requiring_relax': '0', 'abundance_pattern': 'abun\\=([0-9]+\\.*[0-9]*)', 'min_abundance': '10', 'genetic_code': '5', 'clade': 'Arthropoda', 'skip_read_mapping': True}

2023-05-09 19:12:29,797 - mitoz.utility.utility - INFO - 
assemble.main() got
fq1: /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz
fq2: /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz

2023-05-09 19:12:29,798 - mitoz.utility.utility - INFO - 
assemble.main() working in /mnt/Data/local/mitoz3.6/test/mt_assembly

2023-05-09 19:12:29,798 - mitoz.utility.utility - INFO - 
use_megahit() workdir: /mnt/Data/local/mitoz3.6/test/mt_assembly

2023-05-09 19:12:29,798 - mitoz.utility.utility - INFO - 
use_megahit() creating: /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit

2023-05-09 19:12:29,799 - mitoz.utility.utility - INFO - 
use_megahit() expected assembly_file: /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/final.contigs.fa

2023-05-09 19:12:29,799 - mitoz.utility.utility - INFO - 
megahit  --out-dir ./megahit_out  --num-cpu-threads 8 --k-list 43,71,99 --memory 500000000000.0 -1 /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz -2 /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz
2023-05-09 19:12:29 - MEGAHIT v1.2.9
2023-05-09 19:12:30 - Using megahit_core without POPCNT and BMI2 support, because the features not detected by CPUID 
2023-05-09 19:12:30 - Convert reads to binary library
2023-05-09 19:12:30 - b'INFO  sequence/io/sequence_lib.cpp  :   77 - Lib 0 (/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz,/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz): pe, 23022 reads, 125 max length'
2023-05-09 19:12:30 - b'INFO  utils/utils.h                 :  152 - Real: 0.0570\tuser: 0.0372\tsys: 0.0181\tmaxrss: 6568'
2023-05-09 19:12:30 - Start assembly. Number of CPU threads 8 
2023-05-09 19:12:30 - k list: 43,71,99 
2023-05-09 19:12:30 - Memory used: 500000000000
2023-05-09 19:12:30 - Extract solid (k+1)-mers for k = 43 
2023-05-09 19:12:30 - Build graph for k = 43 
2023-05-09 19:12:31 - Assemble contigs from SdBG for k = 43
2023-05-09 19:12:32 - Error occurs, please refer to /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/log for detail
2023-05-09 19:12:32 - Command: /mnt/Data/local/mitoz3.6/bin/megahit_core_popcnt assemble -s /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43 -o /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/intermediate_contigs/k43 -t 8 --min_standalone 296 --prune_level 2 --merge_len 20 --merge_similar 0.95 --cleaning_rounds 5 --disconnect_ratio 0.1 --low_local_ratio 0.2 --cleaning_rounds 5 --min_depth 2 --bubble_level 2 --max_tip_len -1 --careful_bubble; Exit code -4

2023-05-09 19:12:32,230 - mitoz.utility.utility - ERROR - 
Error occured when running command:
megahit  --out-dir ./megahit_out  --num-cpu-threads 8 --k-list 43,71,99 --memory 500000000000.0 -1 /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz -2 /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz

2023-05-09 19:12:32,231 - mitoz.utility.utility - WARNING - 
Can not find assembly_file: /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/final.contigs.fa
Traceback (most recent call last):
  File "/mnt/Data/local/mitoz3.6/bin/mitoz", line 10, in <module>
    sys.exit(main())
  File "/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/MitoZ.py", line 99, in main
    args.func(args)
  File "/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/all/all.py", line 361, in main
    args.fastafiles, assemble_all_result_wdir = mitoz.assemble.main(assemble_args)
  File "/mnt/Data/local/mitoz3.6/lib/python3.8/site-packages/mitoz/assemble/assembly.py", line 521, in main
    logger.info('assemble.main() found resulting_mt_files:\n' + '\n'.join(resulting_mt_files))
TypeError: sequence item 0: expected str instance, NoneType found

And this is MEGAHIT's log:

2023-05-09 19:12:29 - MEGAHIT v1.2.9
2023-05-09 19:12:30 - Using megahit_core without POPCNT and BMI2 support, because the features not detected by CPUID 
2023-05-09 19:12:30 - Convert reads to binary library
2023-05-09 19:12:30 - command /mnt/Data/local/mitoz3.6/bin/megahit_core_popcnt buildlib /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/reads.lib /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/reads.lib
2023-05-09 19:12:30 - b'INFO  sequence/io/sequence_lib.cpp  :   77 - Lib 0 (/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz,/mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz): pe, 23022 reads, 125 max length'
2023-05-09 19:12:30 - b'INFO  utils/utils.h                 :  152 - Real: 0.0570\tuser: 0.0372\tsys: 0.0181\tmaxrss: 6568'
2023-05-09 19:12:30 - Start assembly. Number of CPU threads 8 
2023-05-09 19:12:30 - k list: 43,71,99 
2023-05-09 19:12:30 - Memory used: 500000000000
2023-05-09 19:12:30 - Extract solid (k+1)-mers for k = 43 
2023-05-09 19:12:30 - command /mnt/Data/local/mitoz3.6/bin/megahit_core_popcnt count -k 43 -m 2 --host_mem 500000000000 --mem_flag 1 --output_prefix /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43 --num_cpu_threads 8 --read_lib_file /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/reads.lib
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  148 - Preparing data...'
2023-05-09 19:12:30 - b'INFO  sequence/io/sequence_lib.cpp  :  115 - Before reading, sizeof seq_package: 903032'
2023-05-09 19:12:30 - b'INFO  sequence/io/sequence_lib.cpp  :  117 - After reading, sizeof seq_package: 903032'
2023-05-09 19:12:30 - b'INFO  sorting/kmer_counter.cpp      :   76 - 23022 reads, 125 max read length'
2023-05-09 19:12:30 - b'INFO  sorting/kmer_counter.cpp      :   83 - 3 words per substring, 4 words per edge'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  153 - Preparing data... Done. Time elapsed: 0.0199'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  156 - Preparing partitions and calculating bucket sizes...'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :   80 - Minimum memory required: 6600912 bytes'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  138 - Lv1 items: 251401, Lv2 items: 54038'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  140 - Memory of derived class: 5304000, Memory for Lv1+Lv2: 2086364'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  168 - Preparing partitions and calculating bucket sizes... Done. Time elapsed: 0.0652'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  172 - Start main loop...'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 0 to 22'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0082'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0167'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 22 to 498'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0097'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0177'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 498 to 2592'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0123'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0234'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 2592 to 7618'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0190'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0294'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 7618 to 65536'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0436'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0598'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  204 - Main loop done. Time elapsed: 0.2416'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  207 - Postprocessing...'
2023-05-09 19:12:30 - b'INFO  sorting/kmer_counter.cpp      :  406 - Total number of candidate reads: 183 (1472)'
2023-05-09 19:12:30 - b'INFO  sorting/kmer_counter.cpp      :  408 - Total number of solid edges: 61052'
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  210 - Postprocess done. Time elapsed: 0.0632'
2023-05-09 19:12:30 - b'INFO  utils/utils.h                 :  152 - Real: 0.3934\tuser: 2.0294\tsys: 0.1180\tmaxrss: 26908'
2023-05-09 19:12:30 - Build graph for k = 43 
2023-05-09 19:12:30 - command /mnt/Data/local/mitoz3.6/bin/megahit_core_popcnt seq2sdbg --host_mem 500000000000 --mem_flag 1 --output_prefix /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43 --num_cpu_threads 8 -k 43 --kmer_from 0 --input_prefix /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43 --need_mercy
2023-05-09 19:12:30 - b'INFO  sorting/base_engine.cpp       :  148 - Preparing data...'
2023-05-09 19:12:30 - b'INFO  sorting/seq_to_sdbg.cpp       :  369 - Number edges: 61052'
2023-05-09 19:12:30 - b'INFO  sorting/seq_to_sdbg.cpp       :  415 - Bases to reserve: 3357860, number contigs: 0, number multiplicity: 76315'
2023-05-09 19:12:30 - b'INFO  sorting/seq_to_sdbg.cpp       :  422 - Before reading, sizeof seq_package: 839476, multiplicity vector: 76315'
2023-05-09 19:12:30 - b'INFO  sorting/seq_to_sdbg.cpp       :  428 - Read 61052 edges.'
2023-05-09 19:12:30 - b'INFO  sorting/seq_to_sdbg.cpp       :  433 - After reading, sizeof seq_package: 839476/61052/2686288, multiplicity vector: 61052/76315'
2023-05-09 19:12:30 - b'INFO  sorting/seq_to_sdbg.cpp       :  440 - Adding mercy edges...'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  192 - Read 183 reads to search for mercy k-mers'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  356 - Number of reads: 183, Number of mercy edges: 3379'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  444 - Done. Time elapsed: 0.7813'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  449 - After adding mercy, sizeof seq_package: 839476/64431/2834964, multiplicity vector: 64431/76315'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  508 - Finally, sizeof seq_package: 839476/64431/2834964, multiplicity vector: 64431/76315'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  153 - Preparing data... Done. Time elapsed: 0.9719'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  156 - Preparing partitions and calculating bucket sizes...'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :   80 - Minimum memory required: 1236802 bytes'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  138 - Lv1 items: 51544, Lv2 items: 11126'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  140 - Memory of derived class: 1014282, Memory for Lv1+Lv2: 384192'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  168 - Preparing partitions and calculating bucket sizes... Done. Time elapsed: 0.0687'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  172 - Start main loop...'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 0 to 4'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0020'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0047'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 4 to 510'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0027'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0041'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 510 to 3124'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0047'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0067'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 3124 to 10362'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0084'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0109'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 10362 to 32516'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0139'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0364'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  186 - Lv1 scanning from bucket 32516 to 65536'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  193 - Lv1 scanning done. Large diff: 0. Time elapsed: 0.0112'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  199 - Lv1 fetching & sorting done. Time elapsed: 0.0062'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  204 - Main loop done. Time elapsed: 0.1137'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  207 - Postprocessing...'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  793 - Number of $ A C G T A- C- G- T-:'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  794 - 1805 47313 16358 16592 49687 259 62 74 322'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  800 - Total number of edges: 132472'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  801 - Total number of ONEs: 129950'
2023-05-09 19:12:31 - b'INFO  sorting/seq_to_sdbg.cpp       :  803 - Total number of $v edges: 1805'
2023-05-09 19:12:31 - b'INFO  sorting/base_engine.cpp       :  210 - Postprocess done. Time elapsed: 0.2154'
2023-05-09 19:12:31 - b'INFO  utils/utils.h                 :  152 - Real: 1.3731\tuser: 1.7572\tsys: 0.4470\tmaxrss: 270016'
2023-05-09 19:12:31 - Assemble contigs from SdBG for k = 43
2023-05-09 19:12:31 - command /mnt/Data/local/mitoz3.6/bin/megahit_core_popcnt assemble -s /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43 -o /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/intermediate_contigs/k43 -t 8 --min_standalone 296 --prune_level 2 --merge_len 20 --merge_similar 0.95 --cleaning_rounds 5 --disconnect_ratio 0.1 --low_local_ratio 0.2 --cleaning_rounds 5 --min_depth 2 --bubble_level 2 --max_tip_len -1 --careful_bubble
2023-05-09 19:12:32 - b'INFO  main_assemble.cpp             :  129 - Loading succinct de Bruijn graph: /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43'
2023-05-09 19:12:32 - Error occurs, please refer to /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/log for detail
2023-05-09 19:12:32 - Command: /mnt/Data/local/mitoz3.6/bin/megahit_core_popcnt assemble -s /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/tmp/k43/43 -o /mnt/Data/local/mitoz3.6/test/mt_assembly/megahit/megahit_out/intermediate_contigs/k43 -t 8 --min_standalone 296 --prune_level 2 --merge_len 20 --merge_similar 0.95 --cleaning_rounds 5 --disconnect_ratio 0.1 --low_local_ratio 0.2 --cleaning_rounds 5 --min_depth 2 --bubble_level 2 --max_tip_len -1 --careful_bubble; Exit code -4

Any help will be greatly appreciated!

Cheers!

linzhi2013 commented 1 year ago

Hi there,

I just tested MitoZ 3.6 with the test data (https://github.com/linzhi2013/MitoZ/wiki/Installation#9-running-the-test-dataset) on my Linux server, and it worked.

Probably it's the problem of Megahit. You may need to install Megahit by yourself and do a test run to check if your installation works or not, e.g., using the following command:

/path/to/your/own/megahit  --out-dir ./megahit_out  --num-cpu-threads 8 --k-list 43,71,99 --memory 500000000000.0 -1 /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R1.fq.gz -2 /mnt/Data/local/mitoz3.6/test/clean_data/ttt.clean_R2.fq.gz

If that's the case, replace the megahit-related binaries in /mnt/Data/local/mitoz3.6/bin/ with your newly installed ones.

marcosbarbeitos commented 1 year ago

Hello, it was indeed an issue with megahit, but it is not solvable by installation from source. My machine does not support POPCNT or BMI2, which seemingly leads megahit to use the following command line:

megahit_core_popcnt assemble -s /mnt/Data/local/mitoz3.6/megahit_out/tmp/k43/43 -o /mnt/Data/local/mitoz3.6/megahit_out/intermediate_contigs/k43 -t 8 --min_standalone 296 --prune_level 2 --merge_len 20 --merge_similar 0.95 --cleaning_rounds 5 --disconnect_ratio 0.1 --low_local_ratio 0.2 --cleaning_rounds 5 --min_depth 2 --bubble_level 2 --max_tip_len -1 --careful_bubble

This, however, does not work either.

After trying MitoFlex, I found out that by using megahit_core_no_hw_accel instead of megahit_core_popcnt I can get megahit to do its part.

I wonder if it would be possible to resume the assembly from this point by using a MitoZ option other that all.

I imagine I'd also have to run megahit for k=71 and k = 99.

Thanks in advance!

linzhi2013 commented 1 year ago

If you want to continue the all command after your own try of the megahit command, you can add --resume_assembly option.

    if resume:
        command = "megahit --continue --out-dir ./megahit_out "
        if tmp_dir:
            command += " --tmp-dir " + tmp_dir
        runcmd(command, logger=logger)

this is because megahit supports this (--continue).

If you don't want to do that, you can find the mt_assembly/megahit/megahit_out/final.contigs.fa file, and search for mitogenome from it. It has something like this:

>k99_363402 flag=1 multi=2.0000 len=304

If you want to use the findmitoscaf command, it's better for you to reformat the mt_assembly/megahit/megahit_out/final.contigs.fa file to this:

>k99_363402 abun=2.0000