treangenlab / emu

Error with custom database #13

Closed rjain1990 closed 1 month ago

rjain1990 commented 1 month ago

Hi, I am trying to run Emu with a custom MaarjAM database. I built the custom database successfully, but when I run Emu I get the error below. I am not sure whether something is wrong with the database or with something else. Can you please help?

Processing /zfs/omics/personal/rjain/AML1_2/filtered/barcode95_filtered.fastq
[M::mm_idx_gen::0.006*1.83] collected minimizers
[M::mm_idx_gen::0.009*3.89] sorted minimizers
[M::main::0.009*3.87] loaded/built the index for 383 target sequence(s)
[M::mm_mapopt_update::0.010*3.70] mid_occ = 357
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 383
[M::mm_idx_stat::0.010*3.63] distinct minimizers: 5960 (67.67% are singletons); average occurrences: 5.438; average spacing: 5.787; total length: 187553
[M::worker_pipeline::0.097*14.26] mapped 71 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -ax map-ont -t 24 -N 50 -p .9 -K 500000000 -o ./Results-emu/barcode95_filtered_emu_alignments.sam /zfs/omics/personal/rjain/emu_maarjam/marjaam/species_taxid.fasta /zfs/omics/personal/rjain/AML1_2/filtered/barcode95_filtered.fastq
[M::main] Real time: 0.099 sec; CPU: 1.381 sec; Peak RSS: 0.053 GB
Traceback (most recent call last):
  File "/home/rjain/miniconda3/envs/py37/bin/emu", line 804, in <module>
    SAM_FILE, log_prob_cigar_op, longest_align_dict, locs_p_cigar_zero)
  File "/home/rjain/miniconda3/envs/py37/bin/emu", line 195, in log_prob_rgs_dict
    dict_longest_align, align_len)
  File "/home/rjain/miniconda3/envs/py37/bin/emu", line 167, in compute_log_prob_rgs
    species_tid = int(ref_name.split(":")[0])
ValueError: invalid literal for int() with base 10: 'Y17635'

Processing /zfs/omics/personal/rjain/AML1_2/filtered/barcode96_filtered.fastq
[M::mm_idx_gen::0.007*1.90] collected minimizers
[M::mm_idx_gen::0.009*3.76] sorted minimizers
[M::main::0.009*3.75] loaded/built the index for 383 target sequence(s)
[M::mm_mapopt_update::0.010*3.58] mid_occ = 357
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 383
[M::mm_idx_stat::0.010*3.52] distinct minimizers: 5960 (67.67% are singletons); average occurrences: 5.438; average spacing: 5.787; total length: 187553
[M::worker_pipeline::0.805*21.47] mapped 1113 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -ax map-ont -t 24 -N 50 -p .9 -K 500000000 -o ./Results-emu/barcode96_filtered_emu_alignments.sam /zfs/omics/personal/rjain/emu_maarjam/marjaam/species_taxid.fasta /zfs/omics/personal/rjain/AML1_2/filtered/barcode96_filtered.fastq
[M::main] Real time: 0.808 sec; CPU: 17.295 sec; Peak RSS: 0.070 GB
Traceback (most recent call last):
  File "/home/rjain/miniconda3/envs/py37/bin/emu", line 804, in <module>
    SAM_FILE, log_prob_cigar_op, longest_align_dict, locs_p_cigar_zero)
  File "/home/rjain/miniconda3/envs/py37/bin/emu", line 195, in log_prob_rgs_dict
    dict_longest_align, align_len)
  File "/home/rjain/miniconda3/envs/py37/bin/emu", line 167, in compute_log_prob_rgs
    species_tid = int(ref_name.split(":")[0])
ValueError: invalid literal for int() with base 10: 'AM295493'

Thanks tons!

rjain1990 commented 1 month ago

Y17635, AM295493, ... these seem to be the tax_id values in my database.
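
For anyone hitting the same traceback: the failing line takes the part of each reference name before the first ":" and passes it to int(), so any non-numeric prefix raises this ValueError. A rough way to list the offending IDs before running Emu might look like the sketch below. The function name and the sample headers (including the "emu_db" field) are made up for illustration; only the `ref_name.split(":")[0]` expression comes from the traceback.

```python
def non_integer_tax_ids(fasta_lines):
    """Return the distinct header prefixes that int() cannot parse, in order seen."""
    bad = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip an empty header
        ref_name = fields[0]  # header text up to the first whitespace
        prefix = ref_name.split(":")[0]  # same expression as in the traceback
        try:
            int(prefix)
        except ValueError:
            if prefix not in bad:
                bad.append(prefix)
    return bad


# Hypothetical sample headers, not taken from the real database.
sample = [
    ">Y17635:emu_db:1 some description",
    "ACGT",
    ">562:emu_db:2",
    "ACGT",
    ">AM295493:emu_db:3",
    ">Y17635:emu_db:4",
]
print(non_integer_tax_ids(sample))
```

To check a real file, the same function can be given an open file handle, for example `non_integer_tax_ids(open("species_taxid.fasta"))`.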

rjain1990 commented 1 month ago

I managed to solve it by replacing the non-integer tax_id values with unique integers.
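
A minimal sketch of that replacement step, assuming the tax_id values are available as a list of strings. The function name is hypothetical and this is not part of Emu; it only shows one way to give each non-integer ID a unique integer without colliding with integer IDs already in use.

```python
def remap_tax_ids(tax_ids, start=1):
    """Map each tax_id string to an integer.

    IDs that already parse as integers keep their value; every other ID
    gets the next integer (counting up from `start`) that is not yet used.
    """
    def as_int(value):
        try:
            return int(value)
        except ValueError:
            return None

    # Reserve integers that are already present so new IDs never collide.
    used = {as_int(t) for t in tax_ids if as_int(t) is not None}

    mapping = {}
    next_id = start
    for tid in tax_ids:
        if tid in mapping:
            continue  # repeated ID: reuse the earlier assignment
        existing = as_int(tid)
        if existing is not None:
            mapping[tid] = existing
            continue
        while next_id in used:
            next_id += 1
        mapping[tid] = next_id
        used.add(next_id)
    return mapping


print(remap_tax_ids(["Y17635", "2", "AM295493", "Y17635", "1"]))
```

The resulting mapping would then need to be applied consistently wherever the tax_id appears in the files used to build the database, and the database rebuilt afterwards.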